5F3759DF: An explanation of the world's most infamous magic number

March 17, 2021

Here's a famous function in the Quake III source code. Can you guess what it does?

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                        // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );                // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );    // 2nd iteration, this can be removed
    return y;
}

This function calculates the inverse of the square root of a number.

$$Q\_rsqrt(number) = \frac{1}{\sqrt{number}}$$

Preliminary Explanation

Why?

Quake III needs to perform a lot of calculations to simulate lighting. Each polygon face in the scene has a normal--a 3D vector pointing perpendicular to it--and these vectors are used to calculate how light reflects off of the face.

Each face could be represented by a vector of any length--the important information is the direction of the vector, not its length. However, the calculations become much simpler if all of the vectors have length 1. Thus, we need to normalize these vectors. We can do this by dividing the vector by its length (its norm, $\| \boldsymbol{v} \|$). Let $\boldsymbol{v}$ denote the vector before normalization and $\boldsymbol{\hat{v}}$ denote the normalized vector.

$$\boldsymbol{\hat{v}} = \frac{\boldsymbol{v}}{\| \boldsymbol{v} \|}$$

Using the Pythagorean theorem:

$$\| \boldsymbol{v} \| = \sqrt{v_x^2 + v_y^2 + v_z^2}$$
$$\boldsymbol{\hat{v}} = \boldsymbol{v} \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}$$

We can then break this down into each component (x, y, and z) of the 3-dimensional vector:

$$\hat{v_x} = v_x \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}$$
$$\hat{v_y} = v_y \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}$$
$$\hat{v_z} = v_z \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}$$

The addition and multiplication are easy enough to execute, but both the division and the square root would be very slow on a Quake III-era CPU. Thus, we need a faster method to return an approximate value of $\frac{1}{\sqrt{x}}$.
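
To make the payoff concrete, here's a minimal sketch of how a fast reciprocal square root gets used for normalization. This is my own illustration, not code from the Quake III source: vec3 is a made-up type, and it assumes the Q_rsqrt function shown above.

typedef struct { float x, y, z; } vec3;

// Normalize v: one call to a fast reciprocal square root replaces
// both the slow square root and the slow division.
vec3 normalize( vec3 v )
{
    float inv_len = Q_rsqrt( v.x * v.x + v.y * v.y + v.z * v.z );
    v.x *= inv_len;
    v.y *= inv_len;
    v.z *= inv_len;
    return v;
}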

IEEE 754: How Computers Store Fractions

It's pretty straightforward to store an integer in binary by just converting it into base 2:

$$5_{10} = 0000\,0101_2$$

But how do computers store fractions? One simple approach would be to put a decimal point at a fixed position in the number:

$$5_{10} = 0101.0000_2$$
$$5.25_{10} = 0101.0100_2$$

This seems to work well, but we've drastically reduced the size of the numbers we can store. We only have half as many bits to store our integer part, which limits us to a relatively small range of numbers, and we only have half as many bits for our fractional part, which doesn't give us a ton of precision.

Instead, we can borrow the idea of scientific notation, which represents numbers with a mantissa and an exponent, like $5.43 \cdot 10^3$. With this, we can store very large values and very small values with a consistently low relative error. This is called "floating point"--unlike the prior idea, the decimal point can effectively "float around" in the bit representation to give us precision at both very small and very large values. Thus, if $x$ is the number we want to store, we write it in terms of exponent $E$ and mantissa $M$:

$$x = M \cdot 2^E$$

At the time Quake III was released, most CPUs were 32-bit. Thus, the floating point representation used 32 bits to store a number. They were allocated as follows:

1 bit   8 bits     23 bits
sign    exponent   mantissa

The "sign" bit is used to represent whether the number was positive or negative. In this case, since we know the argument to the square root function should always be positive, we can assume it to always be zero. There are also some special cases that happen for special values like NaN, inf, or very small numbers, which we can also ignore for now.

Note that the exponent could be positive or negative. To accommodate this, it is stored with an offset of 127. For example, to store $5_{10} = 1.01_2 \cdot 2^2$, we would take the exponent $2_{10} = 10_2$ and add $127_{10} = 0111\,1111_2$ to it to get $1000\,0001_2$.

In scientific notation, the first digit of the mantissa must be between 1 and 9. In this "binary scientific notation", the first digit must be between 1 and 1. (You can see this in our example: $5_{10} = 1.01_2 \cdot 2^2$.) Therefore, we don't actually have to store it.
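
If you'd like to poke at these fields yourself, here's a small sketch of my own (not from the Quake III source) that pulls a float apart with shifts and masks; the memcpy is a well-defined way to read the raw bits:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main( void )
{
    float x = 5.0f;
    uint32_t bits;
    memcpy( &bits, &x, sizeof bits );   // read the raw bits of the float

    unsigned sign     = bits >> 31;
    unsigned exponent = ( bits >> 23 ) & 0xFF;
    unsigned mantissa = bits & 0x7FFFFF;

    // For 5.0 = 1.25 * 2^2: sign 0, exponent 129 (= 2 + 127),
    // mantissa 0x200000 (= 0.25 * 2^23).
    printf( "sign %u, exponent %u, mantissa 0x%X\n", sign, exponent, mantissa );
    return 0;
}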

Interpreting floats as ints

An integer is just represented as a base 2 number. So, what happens if we take a number, find the bits of its floating-point representation, and then interpret those bits as a base 2 integer number?

Recall that our float is stored as:

1 bit   8 bits                    23 bits
sign    exponent representation   mantissa representation

(Note that the least significant bit is on the right.)

Observe that the mantissa bits start in the ones place of the integer. However, remember that only the fractional part of the mantissa is stored (the leading 1 is implied), so those 23 bits are really worth $(M - 1) \cdot 2^{23}$.

The exponent bits, which hold $E + 127$, start in the $2^{23}$ place.

Essentially, we have the sum of the quantity $(M - 1) \cdot 2^{23}$ and the quantity $(E + 127) \cdot 2^{23}$.

Let's define the function $I(x)$ that represents this bizarre operation of taking a floating-point representation and interpreting it as an integer:

$$I(x) = 2^{23} \cdot (E + 127 + M - 1)$$
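
As a sanity check, here's a quick sketch of my own: for $x = 5.0$ we have $E = 2$ and $M = 1.25$, so the formula predicts $2^{23} \cdot (2 + 127 + 0.25) = 1084227584$, and reinterpreting the bits of 5.0f should give the same number.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main( void )
{
    float x = 5.0f;
    uint32_t i;
    memcpy( &i, &x, sizeof i );   // I(x): the float's bits as an integer

    // Both lines should print 1084227584.
    printf( "bits:    %u\n", i );
    printf( "formula: %.0f\n", 8388608.0 * ( 2 + 127 + 1.25 - 1 ) );
    return 0;
}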

An observation about logarithms

Let's go back to our floating-point representation of $x$:

$$x = M \cdot 2^E$$

Now, what happens if we take the logarithm? Let's take the log base 2, since we're working with binary. Evaluating $\log_2(x)$ directly would be pretty slow, but let's proceed symbolically to see if we get anywhere useful.

$$\log_2(x) = \log_2(M \cdot 2^E)$$
$$\log_2(x) = \log_2(M) + \log_2(2^E)$$
$$\log_2(x) = \log_2(M) + E$$

Calculating $\log_2(M)$ would be hard. Instead, we can make do with an approximation. Remember that since $M$ is the mantissa, it will be at least 1 but less than 2.

There's a convenient approximation we can use. Here's a graph in Desmos:

The red curve is $\log_2(x)$, the function we want to approximate. The purple curve is the line $x - 1$, which is already a pretty good approximation. However, the blue curve is even better: $x - 1 + \sigma$, where $\sigma$ is a small constant tuned for maximum accuracy; a good value is $\sigma = 0.0450466$.

Let's continue, using $\log_2(x) \approx x - 1 + \sigma$:

$$\log_2(x) = \log_2(M) + E$$
$$\log_2(x) \approx M - 1 + \sigma + E$$

But what does this have to do with $I(x)$?

Recall that

$$I(x) = 2^{23} \cdot (E + 127 + M - 1)$$

We proceed by adding and subtracting $\sigma$ inside the parentheses:

$$I(x) = 2^{23} \cdot (M - 1 + \sigma + E + 127 - \sigma)$$

Observe the $(M - 1 + \sigma + E)$ part. Earlier, we found that $\log_2(x) \approx M - 1 + \sigma + E$. Therefore, we can substitute this in:

$$I(x) \approx 2^{23} \cdot (\log_2(x) + 127 - \sigma)$$

We can solve for $\log_2(x)$ as follows:

$$\frac{I(x)}{2^{23}} \approx \log_2(x) + 127 - \sigma$$
$$\frac{I(x)}{2^{23}} - 127 + \sigma \approx \log_2(x)$$
$$\log_2(x) \approx \frac{I(x)}{2^{23}} - 127 + \sigma$$

And there it is: we have a much faster way to compute the logarithm.
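
Here's that fast logarithm as a sketch in C (my own illustration, using $\sigma = 0.0450466$), compared against the real thing:

#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main( void )
{
    const double sigma = 0.0450466;
    float x = 19.75f;

    uint32_t i;
    memcpy( &i, &x, sizeof i );   // I(x)

    // log2(x) is approximately I(x) / 2^23 - 127 + sigma
    double approx = i / 8388608.0 - 127.0 + sigma;
    printf( "approx: %f\n", approx );
    printf( "exact:  %f\n", log2( x ) );
    return 0;
}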

Rewrite the problem

Remember, we're trying to find the quantity $\frac{1}{\sqrt{x}}$.

We can use the properties of logarithms to find:

$$\log_2{\frac{1}{\sqrt{x}}} = \log_2{x^{-\frac{1}{2}}} = -\frac{1}{2} \log_2{x}$$

Now, let's try substituting in our fast logarithm:

$$\log_2{\frac{1}{\sqrt{x}}} = -\frac{1}{2} \log_2{x}$$
$$\frac{I\left(\frac{1}{\sqrt{x}}\right)}{2^{23}} - 127 + \sigma \approx -\frac{1}{2} \left[ \frac{I(x)}{2^{23}} - 127 + \sigma \right]$$

We can do some manipulation to solve for our result:

$$\frac{I\left(\frac{1}{\sqrt{x}}\right)}{2^{23}} - 127 + \sigma \approx -\frac{1}{2} \left[ \frac{I(x)}{2^{23}} - 127 + \sigma \right]$$
$$\frac{I\left(\frac{1}{\sqrt{x}}\right)}{2^{23}} - (127 - \sigma) \approx -\frac{1}{2} \cdot \frac{I(x)}{2^{23}} + \frac{1}{2} (127 - \sigma)$$
$$I\left(\frac{1}{\sqrt{x}}\right) - 2^{23} (127 - \sigma) \approx -\frac{1}{2} I(x) + \frac{1}{2} \left(2^{23} (127 - \sigma)\right)$$
$$I\left(\frac{1}{\sqrt{x}}\right) \approx -\frac{1}{2} I(x) + \frac{3}{2} \left(2^{23} (127 - \sigma)\right)$$
$$I\left(\frac{1}{\sqrt{x}}\right) \approx \frac{3}{2} \left(2^{23} (127 - \sigma)\right) - \frac{1}{2} I(x)$$
$$\frac{1}{\sqrt{x}} \approx I^{-1}\left(\frac{3}{2} \left(2^{23} (127 - \sigma)\right) - \frac{1}{2} I(x)\right)$$

The magic number

Let's look at this term:

$$\frac{3}{2} \left(2^{23} (127 - \sigma)\right)$$

We know everything here ahead of time. Why don't we go through and calculate it?

$$\sigma = 0.0450466$$
$$\frac{3}{2} \left(2^{23} (127 - 0.0450466)\right) = 1.5974630065963008 \cdot 10^9$$
$$1.5974630065963008 \cdot 10^9 \approx 1597463007$$
$$1597463007_{10} = 5F3759DF_{16}$$

So that's where the magic number comes from.
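
If you'd like to verify the arithmetic, here's a one-liner sketch (mine, not from the Quake source):

#include <math.h>
#include <stdio.h>

int main( void )
{
    // 3/2 * 2^23 * (127 - sigma), rounded to the nearest integer
    long magic = lround( 1.5 * 8388608.0 * ( 127.0 - 0.0450466 ) );
    printf( "0x%lX\n", (unsigned long) magic );   // prints 0x5F3759DF
    return 0;
}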

Anyway, we now have:

$$\frac{1}{\sqrt{x}} \approx I^{-1}\left(5F3759DF_{16} - \frac{1}{2} I(x)\right)$$

This is the gist of how the function works. Let's step through the code now.

The Code

Let's go through this code, line by line, to see how it matches up with our mathematical approximation.

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                        // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );                // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );    // 2nd iteration, this can be removed
    return y;
}

A quick note: the argument to the function is called number; I'll be calling it $x$ for simplicity's sake.

Evil Floating Point Bit Level Hacking

Let's look at the first part of the function.

float Q_rsqrt( float number )
{
    long i;
    float y;
    ...
    y = number;
    i = * ( long * ) &y;   // evil floating point bit level hacking
    ...
}

We start by declaring a long--on compilers of the era, a 32-bit integer--called i. Then, we declare a float, or floating-point number, called y, and store the value of the argument (number, or $x$) into it. Simple enough.

The next line, however, is where things get ugly. Starting from the right, let's go step-by-step.

y is, of course, our floating-point number.

&y refers to the reference to y--the location in computer memory at which y is stored. &y is a pointer to a floating-point number.

( long * ) is a cast--it converts a value from one type to another. Here, we're converting &y from "pointer to a floating-point number" to "pointer to a 32-bit integer". This doesn't modify the bits in y at all; it only changes the type of the pointer, telling the compiler that the value at this address should be read as an int, not a float.

i = * [...] will dereference the pointer. It sets i equal to the value at that pointer. Since the pointer is considered a pointer to a 32-bit integer, and i is declared as a 32-bit integer, this just sets i equal to the bits at that location in memory.

Effectively, this part takes the bits in the floating-point representation of the argument (number) and interprets them as a 32-bit integer instead.

Does that sound familiar? It's the $I(x)$ function we defined earlier! These lines could be written as

$$y \gets x$$
$$i \gets I(y)$$

Or, more concisely:

$$i \gets I(x)$$

What's with all this memory trickery?

You might ask, why can't we just do this?

i = (long) y;

After all, we just want y as an integer, right?

However, this expression will actually convert the value that y represents into an integer. For instance, if $y = 5.25_{10} = 1.0101_2 \cdot 2^2$, then this code will set $i = 5$, truncating the fractional part.

This isn't what we want. We don't want to do any conversion--we want to take the bit representation of y literally and interpret it as an integer. Thus, we cast the pointer instead, which doesn't actually modify the bits stored in memory.

You might think that the code looks ugly. That's because it is--reading memory through a pointer cast to an incompatible type breaks C's "strict aliasing" rules, making this undefined behavior: the C standard does not guarantee what will happen. We're basically tricking the compiler here. This is definitely considered bad practice, which is why it's "Evil Floating Point Bit Level Hacking". Evil because it relies on undefined behavior, Floating Point because we're working with a floating-point representation, Bit Level because we're directly using the bit representation as an integer, and Hacking because this is most definitely not the way casting is intended to be used.
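
(Incidentally, if you want the same bit reinterpretation without the undefined behavior, the standard-sanctioned approach in modern C is to copy the bytes with memcpy, which compilers optimize away completely. A sketch:)

#include <stdint.h>
#include <string.h>

// A well-defined version of the "evil" line: copy the bytes instead
// of dereferencing a cast pointer. This compiles to a plain register move.
uint32_t float_bits( float f )
{
    uint32_t i;
    memcpy( &i, &f, sizeof i );
    return i;
}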

Link to this section The "WTF" Line

float Q_rsqrt( float number )
{
    ...
    i = 0x5f3759df - ( i >> 1 );   // what the fuck?
    ...
}

If you aren't familiar with bitwise operations, the expression ( i >> 1 ) might seem strange to you.

Think about doing a division problem like $13560 \div 3$. It would be pretty tough, right? You'd probably need to write out the entire long division problem.

On the other hand, think about finding $13560 \div 10$. You can pretty quickly tell that the result is $1356$, right? All you had to do was shift all the digits one place to the right.

We can use the same trick in binary. To divide a number by 2, we just need to shift each bit one place to the right. That is what >> 1 does--it's just a much faster way to divide by two.
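
For example (an illustration of my own, not from the Quake code):

#include <stdio.h>

int main( void )
{
    int x = 13560;
    printf( "%d\n", x / 2  );   // 6780
    printf( "%d\n", x >> 1 );   // also 6780, computed with a single shift
    return 0;
}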

Remember, in the last step, we set $i \gets I(x)$. Thus, ( i >> 1 ) is $\frac{1}{2} I(x)$.

We then subtract this from our magic number:

$$i \gets 5F3759DF_{16} - \frac{1}{2} I(x)$$

Finishing up

float Q_rsqrt( float number )
{
    ...
    y = * ( float * ) &i;
    ...
}

This looks very similar to the i = * ( long * ) &y; line we looked at earlier, and that's because it is. However, instead of interpreting the floating-point representation y as an integer, we're interpreting the integer i as a floating-point representation. You can think of this as our $I^{-1}(x)$ function.

This step performs $y \gets I^{-1}(i)$. Since the last step set $i \gets 5F3759DF_{16} - \frac{1}{2} I(x)$, this effectively sets:

$$y \gets I^{-1}\left(5F3759DF_{16} - \frac{1}{2} I(x)\right)$$

There we go!

But wait, there's more: Newton's method

float Q_rsqrt( float number )
{
    ...
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    ...
    y = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
    ...
}

What does this do?

This line performs one iteration of "Newton's method", a technique for refining an approximation of the root of a function.

Let's define $f(t) = \frac{1}{t^2} - x$. Notice that when $t = \frac{1}{\sqrt{x}}$, we have $f(t) = 0$. Therefore, we can use a root-finding algorithm to find this root of $f(t)$, and we'll get back a better approximation for $t = \frac{1}{\sqrt{x}}$.

Here's a graph that shows how Newton's method works:

Note: since we're working with both arguments to $Q\_rsqrt(x)$ and arguments to $f(t)$, I've decided to stick with using $t$ for the latter. Generally, when working with Newton's method, this value would be called $x$.

We have our function in red, and an initial guess in dotted black, and we're trying to find the point at which the function crosses the t-axis. We can draw a line tangent to the function at our initial guess (in green) and then find the intersection of that line with the t-axis to get an even better guess. We can keep doing this until we're happy with the precision.

So, we have our initial guess given by our $I^{-1}\left(5F3759DF_{16} - \frac{1}{2} I(x)\right)$ expression. Let's call this guess $t_0$. How do we draw a tangent line?

Remember, the point-slope form for a line is given by $y = m(t - t_0) + y_0$. Thus, we can just plug in our initial point $(t_0, f(t_0))$ to get $y = m(t - t_0) + f(t_0)$.

To get the slope $m$, we need to take the derivative of the function $f$ at our guess. Let's do this later--just call it $f'(t_0)$ for now.

$$y = f'(t_0) \cdot (t - t_0) + f(t_0)$$

We're trying to find the point where this tangent line crosses the t-axis, that is, where $y = 0$. Substitute $0$ for $y$:

$$0 = f'(t_0) \cdot (t - t_0) + f(t_0)$$
$$f(t_0) = - f'(t_0) \cdot (t - t_0)$$
$$-\frac{f(t_0)}{f'(t_0)} = t - t_0$$
$$t = t_0 - \frac{f(t_0)}{f'(t_0)}$$

And we've arrived at the formula for Newton's method.

Let's substitute in our $f(t)$ and $f'(t)$. First, we need to find the derivative:

$$f'(t) = \frac{d}{dt} \Big[ \frac{1}{t^2} - x \Big] = \frac{d}{dt}\, t^{-2} = -2t^{-3}$$

Now, proceed:

$$t = t_0 - \frac{f(t_0)}{f'(t_0)}$$
$$t = t_0 - \frac{\frac{1}{t_0^2} - x}{-2t_0^{-3}}$$
$$t = t_0 - \frac{\left(\frac{1}{t_0^2} - x\right) t_0^3}{-2}$$
$$t = t_0 + \frac{t_0 - t_0^3 x}{2}$$
$$t = \frac{2t_0 + t_0 - t_0^3 x}{2}$$
$$t = \frac{3t_0 - t_0^3 x}{2}$$
$$t = \frac{t_0 (3 - t_0^2 x)}{2}$$
$$t = t_0 \cdot \frac{3 - t_0^2 x}{2}$$
$$t = t_0 \left( \frac{3}{2} - \frac{t_0^2 x}{2} \right)$$
$$t = t_0 \left( \frac{3}{2} - \frac{x}{2} t_0^2 \right)$$
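
As a quick sanity check, here's a worked example with numbers chosen purely for illustration: take $x = 2$ and an initial guess $t_0 = 0.7$.

$$t_1 = 0.7 \left( \frac{3}{2} - \frac{2}{2} \cdot 0.7^2 \right) = 0.7 \, (1.5 - 0.49) = 0.707$$

One iteration already lands very close to the exact value $\frac{1}{\sqrt{2}} \approx 0.7071$.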

Let's look at that line of code again:

float Q_rsqrt( float number )
{
    ...
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    ...
    y = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
    ...
}

Earlier in the code, there is a line that performs $x_2 \gets x \cdot 0.5$. This is just so that $\frac{x}{2}$ doesn't have to be recalculated on each iteration (even though the second iteration has since been commented out). Note that we multiply by 0.5 instead of dividing by 2 because multiplication is faster than division. Also, we can't use the bit-shifting trick here, since $x$ is a floating-point number, not an integer.

The Newton's method lines perform:

$$y \gets y \left( \frac{3}{2} - ( x_2 \cdot y \cdot y ) \right)$$

which is equivalent to

$$y \gets y \left( \frac{3}{2} - \frac{x}{2} y^2 \right)$$

which is exactly the expression for Newton's method we found earlier.

This line of code can be repeated to get better and better approximations. However, it appears the authors of Quake III decided that only one iteration was necessary, since the second one is commented out.
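
To see how good the approximation is, here's a sketch of my own: a portable rewrite of the function (int32_t and memcpy in place of the original's long and pointer casts) with the iteration count as a parameter.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

float rsqrt_approx( float number, int iterations )
{
    float x2 = number * 0.5f;
    float y  = number;
    int32_t i;

    memcpy( &i, &y, sizeof i );          // i = I(y)
    i = 0x5f3759df - ( i >> 1 );         // the magic line
    memcpy( &y, &i, sizeof y );          // y = I^-1(i)

    for ( int n = 0; n < iterations; n++ )
        y = y * ( 1.5f - x2 * y * y );   // Newton's method
    return y;
}

int main( void )
{
    float x = 3.0f;
    printf( "exact:        %.7f\n", 1.0f / sqrtf( x ) );
    printf( "1 iteration:  %.7f\n", rsqrt_approx( x, 1 ) );
    printf( "2 iterations: %.7f\n", rsqrt_approx( x, 2 ) );
    return 0;
}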

The End

float Q_rsqrt( float number )
{
    ...
    return y;
}

I'll end with a quote from a relevant xkcd:

Some engineer out there has solved P=NP and it's locked up in an electric eggbeater calibration routine. For every 0x5f375a86 we learn about, there are thousands we never see.

(It looks like the constant Randall mentioned was based on $\sigma = 0.0430357$, not $\sigma = 0.0450466$.)