Math in Solidity (Part 2: Overflow)

This article is the second one in a series of articles about doing math in Solidity. This time the topic is: overflow.

Mikhail Vladimirov
Coinmonks
Published in
8 min readFeb 21, 2020

--

Introduction

Every time I see +, *, or ** doing audit of another Solidity smart contract, I start writing the following comment: “overflow is possible here”. I need a few seconds to write these four words, and during these seconds I observe nearby lines trying to find a reason, why overflow is not possible, or why overflow should be allowed in this particular case. If the reason is found, I delete the comment, but most often the comment remains in the final audit report.

Things aren’t meant to be this way. Arithmetic operators supposed to allow writing compact and easy to read formulas such as a**2 + 2*a*b + b**2. However, this expression would almost definitely raise a bunch of security concerns, and the real code is more likely to look like this:

Here add, mul, and pow are functions implementing “safe” versions of +, *, and ** respectively.

Concise and convenient syntax is discouraged, plain arithmetic operators are marginally used (and not more than one at a time), cumbersome and unreadable functional syntax is everywhere. In this article we analyse the problem, that made things so weird, whose infamous name is: overflow.

We Took a Wrong Turn Somewhere

One would say, that overflow was always there, and all programming languages suffer from it. But is this really true? Did you ever see something like SafeMath library implemented for C++, Python, or JavaScript? Do you really think, that every + or * is a security breach, until the opposite is proven? Most probably, your answer for both questions is “no”. So,

Why Overflow in Solidity Is So Much Painful?

Spoiler: nowhere to run, nowhere to hide.

Numbers do not overflow in pure math. One may add two arbitrary large numbers and get precise result. Numbers do not overflow in high-level programming languages such as JavaScript and Python. In some cases the result could fall into infinity, but at least adding two positive numbers may never produce negative result. In C++ and Java integer numbers do overflow, but floating-point numbers don’t.

In those languages, where integer types do overflow, plain integers are used primarily for indexes, counters, and buffer sizes, i.e. for values limited by the size of data being processed. For values, that potentially may exceed range of plain integers, there are floating-point, big integer, and big decimal data types, either built-in or implemented via libraries.

Basically, when the result of an arithmetic operation does not fit into the type of the arguments, there are a few options what compiler may do: i) use wider result type; ii) return truncated result and use side channel to notify the program about overflow; iii) throw an exception; and iv) just silently return truncated result.

The first option is implemented in Python 2 when handling int type overflows. The second option is what carry/overflow flags in CPUs are for. The third option is implemented for Solidity by SafeMath library. The fourth option is what Solidity implements by itself.

The fourth option is probably the worst one, as it makes arithmetic operations error-prone, and at the same time makes overflow detection quite expensive, especially for multiplication case. One needs to perform additional division after every multiplication to be on the safe side.

So, Solidity neither has safe types, one could run to, nor it has safe operations, one could hide behind. Having nowhere to run and nowhere to hide, developers have to meet overflows face to face and fight them all throughout the code.

Then, the next question is:

Why Doesn’t Solidity Have Safe Types Nor Operations?

Spoiler: because EVM don’t have them.

Smart contracts have to be secure. Bugs and vulnerabilities in them cost millions of dollars, as we’ve already learned the hard way. Being the primary language for smart contracts development, Solidity takes security very seriously. If has many features supposed to prevent developers from shooting themselves in the feet. We mean features like payable keyword, type cast limitations etc. Such features are added with every major release, often breaking backward compatibility, but the community tolerates this for the sake of better security.

However, basic arithmetic operations are so unsafe that almost nobody use them directly nowadays, and the situation doesn’t improve. The only operation that became a bit safer is division: division by zero used to return zero, but now it throws an exception, but even division didn’t become fully safe, as it still may overflow. Yes, in Solidity int type division overflows when -2¹²⁷ is being divided by -1, as correct answer (2¹²⁷) does not fit into int. All other operations, namely +, -, *, and ** are still prone to over- or underflow and thus are intrinsically unsafe.

Arithmetic operations in Solidity replicate the behavior of corresponding EVM opcodes, and making these operations safe at compiler level would increase gas consumption by several times. Plain ADD opcode costs 3 gas. The cheapest opcode sequence for implementing safe add the author of the article managed to find is:

Here <overflow> is the address to jump on overflow. Numbers in brackets are gas costs of the operations, and these numbers give us 28 gas in total. Almost 10 times more, than plain ADD. Too much, right? It depends on what you compare with. Say, calling add function from SafeMath library would cost about 88 gas.

So, safe arithmetic at library or compiler level costs much, but

Why Doesn’t EVM Have Safe Arithmetic Opcodes?

Spoiler: for no good reason.

One would say that arithmetic semantic in EVM replicates that of CPU for performance reasons. Yes, some modern CPUs have opcodes for 256-bit arithmetic, however mainstream EVM implementations don’t seem to use these opcodes. Geth uses big.Int type from the standard library of Go programming language. This type implements arbitrary wide big integers backed by arrays of native words. Parity uses its own library implementing fixed-width big integers on top of native 64-bit words.

For both implementations, additional cost of arithmetic overflow detection would virtually be zero. Thus, once EVM would have versions of arithmetic opcodes, that revert on overflow, their gas cost could be made the same as for existing unsafe versions, or just marginally higher.

Even more useful would be opcodes that do not overflow at all, but return the whole result instead. Such opcodes would permit efficient implementation of arbitrary wide big integers at compiler or library level.

We don’t know why EVM doesn’t have the opcodes described above. Maybe just because other mainstream virtual machines don’t have them?

So far we were telling about real overflow: a situation when calculation result is too big to fit into the result data type. Now it is time to discover the other side of the problem:

Phantom Overflows

How one would calculate 3% of x in Solidity? In mainstream languages one just writes 0.03*x, but Solidity doesn’t support fractions. What about x*3/100? Well, this will work in most cases, but what if x is so large, that x*3 will overflow? From the previous section we know what to do, right? Just use mul from SafeMath and be in the safe side: mul (x, 3) / 100… Not so fast.

The latter version is somewhat more secure, as it reverts where the former version returns incorrect result. This is good, but… Why on earth calculating 3% of something may ever overflow? 3% of something is guaranteed to be less that original value: in both, nominal and absolute terms. So, as long as x fits into 256-bit word, then 3% of x should also fit, shouldn’t it?

Well, I call this “phantom overflow”: a situation when final calculation result would fit into the result data type, but some intermediate operation overflows.

Phantom overflows are much harder to detect and address than real overflows. One solution is to use wider integer type or even floating-point type for intermediate values. Another is to refactor the expression in order to make phantom overflow impossible. Let’s try to do the latter with our expression.

Arithmetic laws tell us that the following formulas should produce the same result:

However, integer division in Solidity is not the same as division in pure math, as in Solidity it rounds the result toward zero. The first two variants are basically equivalent, and both suffer from phantom overflow. The third variant does not have phantom overflow problem, but is somewhat less precise, especially for small x. The fourth variant is more interesting, as it surprisingly leads to a compilation error:

We already described this behavior in our previous article. To make the fourth expression compile we need to change it like this:

However this does not help much, as the result of corrected expression is always zero, because 3 / 100 rounded towards zero is zero.

Via the third variant we managed to to solve phantom overflow problem at the cost of precision. Actually, precision loss is significant only for small x, while for large x it is negligible. Remember, that for the original expression, phantom overflow problem arises for large x only, so it seems that we may combine both variants like this:

Here SOME_LARGE_NUMBER could be calculated as (2²⁵⁶-1)/3 and rounding this value down. Now for small x we use original formula, while for large x we use modified formula that do no permit phantom overflow. Looks like we solved phantom overflow problem without significant loss of precision now. Great work, right?

In this particular case, probably yes. But what if we need to calculate not 3%, but rather 3.1415926535%? The formula would be:

Our SOME_LARGE_NUMBER will become (2²⁵⁶-1)/31415926535. Not so large then. And what about 3.141592653589793238462643383279%? Being good for simple cases, this approach does not seem to scale well.

Conclusion

EVM does not offer overflow-protected opcodes. Solidity doesn’t offer any overflow protection at compiler level. Thus smart contract developers have to address overflows at code level, which makes code cumbersome and less gas efficient.

Phantom overflows are even harder to detect and address. Straightforward lead to tradeoffs and often do not scale.

In our next article, we will suggest better approaches for addressing the phantom overflow problem, and the topic of our next article will be: percents and proportions.

--

--