Security Vulnerability Disclosure (Fixed!): Delegator fees could be lost due to rounding errors

Summary

A security vulnerability was recently discovered after members of the core development team observed cases where delegators did not receive any fees after their transcoder redeemed a winning ticket. The core development team analyzed the vulnerability to understand its impact, implemented a fix, validated it and completed deployment around 6:00pm EST on Tuesday 2/9/21.

The vulnerability caused delegators to lose out on fees shared by their transcoder due to a rounding error in the math calculations introduced by LIP-36. Some rounding error is expected given the fixed amount of precision that the contracts can support, but in this case the error was significant and unexpected: in practice, there were many cases where delegators received no fees despite their transcoders redeeming winning tickets worth upwards of 0.17 ETH, which is a non-trivial amount. The vulnerability has been fixed and users do not need to worry about it impacting them going forward. The technical details are shared below for informational purposes.

Technical Details

Key Changes

Add support for higher precision cumulative factor math operations

Without this change, delegators can end up earning 0 fees when:

delegatorsFees < totalStake / previousCumulativeRewardFactor

where delegatorsFees is the fees for delegators from a winning ticket, totalStake is the total stake of the transcoder and previousCumulativeRewardFactor is the transcoder’s cumulativeRewardFactor in the previous round.

Consider the following examples. We assume previousCumulativeRewardFactor = 1000000, the default value if the transcoder has never earned rewards before, and calculate the minimum delegatorsFees value for delegators to earn > 0 fees:

  • totalStake = 100k → delegatorsFees = 0.1 ETH
  • totalStake = 200k → delegatorsFees = 0.2 ETH
  • totalStake = 500k → delegatorsFees = 0.5 ETH
  • totalStake = 1m → delegatorsFees = 1 ETH

In practice, previousCumulativeRewardFactor may be greater than 1000000, which would reduce the minimum delegatorsFees in these examples, but not substantially. So, these examples that assume previousCumulativeRewardFactor = 1000000 can be considered worst-case scenarios. Given the inequality presented earlier, we can see that a substantially higher default value for previousCumulativeRewardFactor would result in a substantially lower minimum delegatorsFees if totalStake is held constant.
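The thresholds above can be reproduced with a short integer-arithmetic sketch. Note this is a simplified model of the LIP-36 cumulative fee factor update, not the exact contract code:

```python
PERC_DIVISOR = 10 ** 6  # 6 decimal places of precision (MathUtils)
WEI = 10 ** 18          # wei per ETH

def fee_factor_delta(delegators_fees, total_stake, prev_cum_reward_factor):
    # Solidity integer division truncates, so the delta is 0 whenever
    # delegators_fees < total_stake / prev_cum_reward_factor
    return delegators_fees * prev_cum_reward_factor // total_stake

total_stake = 100_000 * WEI   # 100k total stake
prev_factor = PERC_DIVISOR    # default: transcoder never earned rewards

# Below 0.1 ETH in fees, delegators are credited nothing
print(fee_factor_delta(9 * WEI // 100, total_stake, prev_factor))  # 0 (0.09 ETH)
print(fee_factor_delta(WEI // 10, total_stake, prev_factor))       # 1 (0.10 ETH)
```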

In order to substantially increase the default value for previousCumulativeRewardFactor we introduced a PreciseMathUtils library that uses PERC_DIVISOR = 10 ** 27 for all scaled math operations instead of the PERC_DIVISOR = 1000000 used by the existing MathUtils library. Usage of PreciseMathUtils results in 27 decimal places of precision instead of the 6 decimal places supported by MathUtils. PreciseMathUtils is used for all cumulative factor math operations and the default value for previousCumulativeRewardFactor is also set to PERC_DIVISOR = 10 ** 27. With these changes, the minimum delegatorsFees value for delegators to earn > 0 fees is updated to:

  • totalStake = 100k → delegatorsFees = 0.0000000000000000000001 ETH
  • totalStake = 200k → delegatorsFees = 0.0000000000000000000002 ETH
  • totalStake = 500k → delegatorsFees = 0.0000000000000000000005 ETH
  • totalStake = 1m → delegatorsFees = 0.000000000000000000001 ETH
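Running the same simplified sketch with the higher precision divisor shows the threshold collapsing to well below one wei:

```python
PERC_DIVISOR = 10 ** 27  # 27 decimal places of precision (PreciseMathUtils)
WEI = 10 ** 18           # wei per ETH

def fee_factor_delta(delegators_fees, total_stake, prev_cum_reward_factor):
    return delegators_fees * prev_cum_reward_factor // total_stake

total_stake = 100_000 * WEI
prev_factor = PERC_DIVISOR   # new default value

# Even a single wei of fees now produces a non-zero delta
assert fee_factor_delta(1, total_stake, prev_factor) > 0

# The theoretical threshold totalStake / PERC_DIVISOR is a fraction of a wei
print(total_stake / PERC_DIVISOR)  # 0.0001
```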

All math operations that deal with fractions in the BondingManager are also updated to use PreciseMathUtils. The remaining math operations in the BondingManager that still use MathUtils (and thus PERC_DIVISOR = 1000000) are the reward cut and fee share related calculations, mainly because the reward cut and fee share values are stored assuming 1000000 as the maximum percentage value.
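A rough Python sketch of the split (mirroring, not reproducing, the actual MathUtils/PreciseMathUtils contract code):

```python
MATH_PERC_DIVISOR = 10 ** 6      # MathUtils: reward cut / fee share math
PRECISE_PERC_DIVISOR = 10 ** 27  # PreciseMathUtils: cumulative factor math

def perc_of(amount, frac_num, perc_divisor):
    # Scaled percentage: amount * fracNum / PERC_DIVISOR
    return amount * frac_num // perc_divisor

# A 25% fee share is stored against the 1000000 maximum percentage value,
# so it continues to use the lower precision divisor
fee_share = 250_000
print(perc_of(10 ** 18, fee_share, MATH_PERC_DIVISOR))  # 250000000000000000
```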

A few notes on the choice of PERC_DIVISOR = 10 ** 27:

  • DSMath (used by MakerDAO) and Synthetix both use 27 decimal places for high precision operations
  • While increasing the value of PERC_DIVISOR does decrease the maximum value that can be scaled (i.e. multiplied) by PERC_DIVISOR before an integer overflow, the maximum value is still a very large integer, around 115792089237316195423570985008687 * (10 ** 18), so in practice integer overflows should not be a concern
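The overflow headroom quoted above can be checked directly:

```python
PERC_DIVISOR = 10 ** 27
MAX_UINT256 = 2 ** 256 - 1

# Largest value that can be multiplied by PERC_DIVISOR without
# overflowing a uint256
max_scalable = MAX_UINT256 // PERC_DIVISOR
assert max_scalable >= 115792089237316195423570985008687 * 10 ** 18
print(max_scalable)
```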

Sets a LIP-71 round

There needs to be a designated round in which the changes outlined here go into effect. Since the next GitHub issue to be created in the LIP repo would be issue 71, we consider the changes outlined here to be a part of LIP-71.

Rescale cumulative factors stored prior to the LIP-71 round to use a higher precision value

All cumulative factors stored before the upgrade will be scaled using PERC_DIVISOR = 1000000. Starting at the LIP-71 round, stored cumulative factors will be scaled using PERC_DIVISOR = 10 ** 27. The problem is that the cumulative factors in different rounds are used in calculations and they need to be scaled by the same value. So, after the LIP-71 round, if the BondingManager looks up the cumulative factor value stored for a round prior to the LIP-71 round, the value will be rescaled with the following formula:

value * (10 ** 27) / 1000000
# Simplified to:
value * (10 ** 21)

Multiplying by 10 ** 27 ensures that the value is scaled by the higher precision PERC_DIVISOR and dividing by 1000000 ensures that the previously applied scaling by the lower precision PERC_DIVISOR is removed.
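As a sketch, the rescaling reduces to a single helper (hypothetical function name, not the contract's):

```python
OLD_PERC_DIVISOR = 10 ** 6   # pre-upgrade MathUtils scaling
NEW_PERC_DIVISOR = 10 ** 27  # PreciseMathUtils scaling

def rescale(old_value):
    # value * (10 ** 27) / 1000000, which simplifies to value * (10 ** 21)
    return old_value * NEW_PERC_DIVISOR // OLD_PERC_DIVISOR

# The pre-upgrade default factor rescales exactly to the new default
assert rescale(1_000_000) == 10 ** 27
assert rescale(1_000_000) == 1_000_000 * 10 ** 21
```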

Scale cumulative factors by a lower precision value before the LIP-71 round and by a higher precision value after the LIP-71 round

A problem with immediately scaling stored cumulative factors by the higher precision PERC_DIVISOR after the upgrade is that the BondingManager will not know whether cumulative factors stored in the current round should be rescaled (see previous section). If the cumulative factors were stored in the current round before the upgrade, they would be scaled using the lower precision PERC_DIVISOR and if they were stored in the current round after the upgrade they would be scaled using the higher precision PERC_DIVISOR. We would need to only rescale the former.

To solve this problem, we can set the LIP-71 round to be strictly after the round that the upgrade takes place and continue scaling cumulative factors stored prior to the LIP-71 round using the lower precision PERC_DIVISOR. Then, after the LIP-71 round, we would start scaling using the higher precision PERC_DIVISOR. The benefit of this approach is that after the LIP-71 round the BondingManager can then always know that cumulative factors for rounds prior to the LIP-71 round need to be rescaled.
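The round gating described above can be sketched as follows (the function and parameter names are hypothetical, not the BondingManager's actual API; 2053 is the LIP-71 round noted in the Deployment section):

```python
LIP_71_ROUND = 2053
OLD_PERC_DIVISOR = 10 ** 6
NEW_PERC_DIVISOR = 10 ** 27

def cumulative_factor_for(stored_value, stored_round, current_round):
    # Before the LIP-71 round, factors keep the low-precision scaling
    if current_round < LIP_71_ROUND:
        return stored_value
    # From the LIP-71 round on, any factor stored for a round prior to
    # the LIP-71 round is known to need rescaling
    if stored_round < LIP_71_ROUND:
        return stored_value * NEW_PERC_DIVISOR // OLD_PERC_DIVISOR
    return stored_value

# A low-precision factor stored in round 2052, read in round 2053, is rescaled
assert cumulative_factor_for(1_000_000, 2052, 2053) == 10 ** 27
```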

Move Manager modifier logic into internal functions to reduce BondingManager bytecode size

Without this change, the BondingManager bytecode size exceeds the hard limit of 24576 bytes. The Solidity compiler currently inlines all code in a modifier into each function that the modifier is applied to. So, if a modifier is applied to N functions, the code in the modifier will be duplicated N times. A workaround that avoids duplicating the modifier code is to move the code into an internal function and then invoke the internal function in the modifier. Code in an internal function is not inlined at its call site; instead, when the internal function is called, the calling code jumps to the start of the internal function in the contract bytecode.

The BondingManager inherits from the Manager base contract which defines modifiers that are used throughout the BondingManager (i.e. whenSystemPaused, whenSystemNotPaused). By moving the Manager modifier logic into internal functions we are able to shave off some bytecode to keep the BondingManager bytecode size under the hard limit.

# Outputs the bytecode size of the BondingManager
cat build/contracts/BondingManager.json | jq '.deployedBytecode' | wc -m | xargs -I {} echo "{} / 2 - 1" | bc
23311

At the moment, the BondingManager bytecode size is still below the hard limit of 24576 bytes.

Deployment

LIP_71_ROUND is set to 2053, which was the round after the round in which these changes were deployed.

  1. Deploy a new BondingManager target implementation contract
  2. Stage an update with the Governor that calls setLIPUpgradeRound(71, LIP_71_ROUND) on the RoundsManager and registers the new BondingManager target implementation with the Controller
  3. Execute the staged update to set the LIP-71 upgrade round to LIP_71_ROUND in the RoundsManager and to complete the registration of the new BondingManager target implementation

For More Information

If you have any questions or comments about this advisory:

It’s great that even the emergency vulnerability fixes technically do go through LIP process

The past few bug fixes were assigned LIP numbers for clarity, ease of description and so that the round at which a fix was deployed could be referenced within the contracts using the LIP number (for example, for LIP-71 there needed to be different contract logic executed depending on whether the round number is before or after the LIP-71 round).

I think these can be considered as emergency LIPs that are only created for security vulnerability responses and in these circumstances due to the time sensitive nature of the bug fix the LIP does not go through the typical LIP acceptance process. This is not a formal concept documented in the LIP repo right now, but I think it’d be a good idea to add it and to also map these security vulnerability disclosure forum posts to LIP documents in that repo. Welcome any feedback here in a separate forum post on the topic.

Wondering how community could support in these circumstances, or even have visibility onto the process as it happens, to act as some kind of observer?

This is a bit tricky given that additional public visibility necessarily means potential increased awareness of the bug before it is fixed. These security vulnerability disclosures make an effort to document all the steps that were taken retroactively so community members can have a transparent view of what changed in order to fix the bug. Open to other suggestions in a separate thread.