Here’s a famous function in the Quake III source code. Can you guess what it does?

This function calculates the inverse of the square root of a number.

\[Q\_rsqrt(number) = \frac{1}{\sqrt{number}}\]Quake III needs to perform a lot of calculations to simulate lighting. Each face of a polygon in the scene is represented by a 3D vector pointing perpendicular to it. Then, these vectors are used to calculate the reflections of light off of the face.

Each face could be represented by a vector of any length–the important information is the direction of the vector, not the length of the vector. However, the calculations become much more simple if all of the vectors have length 1. Thus, we need to *normalize* these vectors. We can do this by dividing the vector by its length (it’s *norm*, \(\| \boldsymbol{v} \|\)). Let \(\boldsymbol{v}\) denote the vector before normalization and \(\boldsymbol{\hat{v}}\) denote the normalized vector.

Using the Pythagorean theorem:

\[\| \boldsymbol{v} \| = \sqrt{v_x^2 + v_y^2 + v_z^2}\] \[\boldsymbol{\hat{v}} = \boldsymbol{v} \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}\]We can then break this down into each component (x, y, and z) of the 3-dimensional vector:

\[\hat{v_x} = v_x \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}\] \[\hat{v_y} = v_y \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}\] \[\hat{v_z} = v_z \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}\]The addition and multiplication are easy enough to execute, but both the division and the square root would be very slow on a Quake III-era CPU. Thus, we need a faster method to return an approximate value of \(\frac{1}{\sqrt{x}}\).

It’s pretty straightforward to store an integer in binary by just converting it into base 2:

\[5_{10} = 0000 0101_2\]But how do computers store fractions? One simple approach would be to put a decimal point at a fixed position in the number:

\[5_{10} = 0101 . 0000_2\] \[5.25_{10} = 0101 . 0100_2\]This seems to work well, but we’ve drastically reduced the size of the numbers we can store. We only have half as many bits to store our integer part, which limits us to a relatively small range of numbers, and we only have half as many bits for our fractional part, which doesn’t give us a ton of precision.

Instead, we can borrow the idea of scientific notation, which represents numbers with a mantissa and exponent like \(5.43 \cdot 10^3\). With this, we can store very large values and very small values with a consistently low relative error. This is called “floating point”–unlike the prior idea, the decimal point can effectively “float around” in the bit representation to give us precision at both very small and very large values. Thus, if \(x\) is the number we want to store, we write it in terms of exponent \(E\) and mantissa \(M\):

\[x = M \cdot 2^E\]At the time Quake III was released, most CPUs were 32-bit. Thus, the floating point representation used 32 bits to store a number. They were allocated as follows:

1 bit | 8 bits | 23 bits |
---|---|---|

sign | exponent | mantissa |

The “sign” bit is used to represent whether the number was positive or negative. In this case, since we know the argument to the square root function should always be positive, we can assume it to always be zero. There are also some special cases that happen for special values like `NaN`

, `inf`

, or very small numbers, which we can also ignore for now.

Note that the exponent could be positive or negative. To accomodate this, it is stored with an offset of 127. For example, to store \(5_{10} = 1.01 \cdot 2^2\), we would take the exponent \(2_{10} = 10_2\) and add \(127_{10} = 0111 1111_2\) to it to get \(1000 0001_2\).

In scientific notation, the first digit of the mantissa must be between 1 and 9. In this “binary scientific notation”, the first digit must be between 1 and 1. (You can see this in our example: \(5_{10} = 1.01 \cdot 2^2\).) Therefore, we don’t actually have to store it.

An integer is just represented as a base 2 number. So, what happens if we take a number, find the bits of its floating-point representation, and then interpret those bits as a base 2 integer number?

Recall that our float is stored as:

1 bit | 8 bits | 23 bits |
---|---|---|

sign | exponent representation | mantissa representation |

(Note that the least significant bit is on the right.)

Observe that the mantissa bits start in the ones place. However, remember that the mantissa is a fraction, so we’ve really stored \((M - 1) \cdot 2^{23}\).

The exponent bits, \(E + 127\), start in the \(2^{23}\) place.

Essentially, we have the sum of the quantity \((M - 1) \cdot 2^{23}\) and the quantity \((E + 127) \cdot 2^{23}\).

Let’s define the function \(I(x)\) that represents this bizarre operation of taking a floating-point representation and interpreting it as an integer:

\[I(x) = 2^{23} \cdot (E + 127 + M - 10)\]Let’s go back to our floating-point representation of \(x\):

\[x = M \cdot 2^E\]Now, what happens if we take the logarithm? Let’s take the log base 2, since we’re working with binary. Evaluating \(log_2(x)\) directly would be pretty slow, but let’s proceed symbolically to see if we get anywhere useful.

\[log_2(x) = log_2(M \cdot 2^E)\] \[log_2(x) = log_2(M) + log_2(2^E)\] \[log_2(x) = log_2(M) + E\]Calculating \(log_2(M)\) would be hard. Instead, we can make do with an approximation. Remember that since \(M\) is the mantissa, it will be greater than 1 but less than 2.

There’s a convinent approximation we can use. Here’s a graph in Desmos:

The red curve is \(log_2(x)\), the function we want to approximate. The purple curve is the line \(x - 1\), which is already a pretty good approximation. However, the blue curve is even better: \(x - 1 + \sigma\), where \(\sigma\) is tuned for the maximum accuracy around \(\sigma = 0.0450466\).

Let’s continue, using \(log_2(x) \approx x - 1 + \sigma\):

\[log_2(x) = log_2(M) + E\] \[log_2(x) \approx M - 1 + \sigma + E\]Recall that

\[I(x) = 2^{23} \cdot (E + 127 + M - 1)\]We proceed:

\[I(x) = 2^{23} \cdot (M - 1 + \sigma + E + 127 - \sigma)\]Observe the \((M - 1 + \sigma + E)\) part. Earlier, we found that \(log_2(x) \approx M - 1 + \sigma + E\). Therefore, we can substitute this in:

\[I(x) \approx 2^{23} \cdot (log_2(x) + 127 - \sigma)\]We can solve for \(log_2(x)\) as follows:

\[\frac{I(x)}{2^{23}} \approx log_2(x) + 127 - \sigma\] \[\frac{I(x)}{2^{23}} - 127 + \sigma \approx log_2(x)\] \[log_2(x) \approx \frac{I(x)}{2^{23}} - 127 + \sigma\]And there it is: we have a much faster way to compute the logarithm.

Remember, we’re trying to find the quantity \(\frac{1}{\sqrt{x}}\).

We can use the properties of logarithms to find:

\[log_2{\frac{1}{\sqrt{x}}} = log_2{x^{-\frac{1}{2}}} = -\frac{1}{2} log_2{x}\]Now, let’s try substituting in our fast logarithm:

\[log_2{\frac{1}{\sqrt{x}}} = -\frac{1}{2} log_2{x}\] \[\frac{I(\frac{1}{\sqrt{x}})}{2^{23}} - 127 + \sigma \approx -\frac{1}{2} (\frac{I(x)}{2^{23}} - 127 + \sigma)\]We can do some manipulation to solve for our result:

\[\frac{I(\frac{1}{\sqrt{x}})}{2^{23}} - 127 + \sigma \approx -\frac{1}{2} (\frac{I(x)}{2^{23}} - 127 + \sigma)\] \[\frac{I(\frac{1}{\sqrt{x}})}{2^{23}} - (127 - \sigma) \approx -\frac{1}{2} \cdot \frac{I(x)}{2^{23}} + \frac{1}{2} (127 - \sigma)\] \[I(\frac{1}{\sqrt{x}}) - 2^{23} (127 - \sigma) \approx -\frac{1}{2} I(x) + \frac{1}{2} (2^{23} (127 - \sigma))\] \[I(\frac{1}{\sqrt{x}}) \approx -\frac{1}{2} I(x) + \frac{3}{2} (2^{23} (127 - \sigma))\] \[I(\frac{1}{\sqrt{x}}) \approx \frac{3}{2} (2^{23} (127 - \sigma)) - \frac{1}{2} I(x)\] \[\frac{1}{\sqrt{x}} \approx I^{-1}(\frac{3}{2} (2^{23} (127 - \sigma)) - \frac{1}{2} I(x))\]Let’s look at this term:

\[\frac{3}{2} (2^{23} (127 - \sigma))\]We know everything here ahead of time. Why don’t we go through and calculate it?

\[\sigma = 0.0450466\] \[\frac{3}{2} (2^{23} (127 - 0.0450466)) = 1.5974630065963008 \cdot 10^9\] \[1.5974630065963008 \cdot 10^9 \approx 1597463007\] \[1597463007_{10} = 5F3759DF_{16}\]So *there’s* where that’s from.

Anyway, we now have:

\[\frac{1}{\sqrt{x}} \approx I^{-1}(5F3759DF_{16} - \frac{1}{2} I(x))\]This is the gist of how the function works. Let’s step through the code now.

Let’s go through this code, line by line, to see how it matches up with our mathematical approximation.

A quick note: the argument to the function is called `number`

, I’ll be calling it \(x\) for simplicity’s sake.

Let’s look at the first part of the function.

We start by declaring a `long`

, which is a 32-bit integer, called `i`

. Then, we declare a `float`

, or a floating-point representation number, `y`

. We store the value of the argument (`number`

, or \(x\)) into `y`

. Simple enough.

The next line, however, is where things get ugly. Starting from the right, let’s go step-by-step.

`y`

is, of course, our floating-point number.

`&y`

refers to the *reference* to `y`

–the location in computer memory at which `y`

is stored. `&y`

is a *pointer* to a floating-point number.

`( long * )`

is a *cast*–it converts a value from one type to another. Here, we’re converting `&y`

from “pointer to a floating-point number” to “pointer to a 32-bit integer”. This doesn’t modify the bits in `y`

at all, it modifies the *type* of the pointer. It tells C++ that the *value* at this pointer isn’t a float, it’s an int.

`i = * [...]`

will *dereference* the pointer. It sets `i`

equal to the value at that pointer. Since the pointer is considered a pointer to a 32-bit integer, and `i`

is declared as a 32-bit integer, this just sets `i`

equal to the bits at that location in memory.

Effectively, this part takes the bits in the floating-point representation of the argument (`number`

) and interprets them as a 32-bit integer instead.

Does that sound familiar? It’s our \(I(x)\) function we defined earlier! These lines could be written as

\[y \gets x\] \[i \gets I(y) = I(x)\]You might ask, why can’t we just do this?

After all, we just want `y`

as an integer, right?

However, this expression will actually convert the *value* that `y`

represents into an integer. For instance, if \(y = 5.25_{10} = 1.0101 \cdot 2^2\), then this code will set \(i = 5\). It will *convert* `y`

to an integer.

This isn’t what we want. We don’t want to do any conversion – we are literally taking the bit representation of `y`

and interpreting it as an integer instead. Thus, we convert the *pointer* instead, which doesn’t actually modify the bits stored in memory.

You might think that the code looks ugly. That’s because it is–casting a pointer from one type to another is considered “undefined behavior,” and the C++ standard does not guarantee what will happen. We’re basically tricking C++ here. This is definitely considered bad practice, which is why it’s *“Evil Floating Point Bit Level Hacking”*. *Evil* because it’s relying on undefined behavior, *Floating Point* because we’re working with a floating-point representation, *Bit Level* because we’re directly using the bit representation as an integer, and *Hacking* because this is most definitely not the way casting is intended to be used.

If you aren’t familiar with bitwise operations, the symbol `( i >> 1 )`

might seem strange to you.

Think about doing a division problem like \(13560 \div 3\). It would be pretty tough, right? You’d probably need to write out the entire long division problem.

On the other hand, think about finding \(13560 \div 10\). You can pretty quickly tell that the result is \(1356\), right? All you had to do was shift all the digits one place to the right.

We can use the same trick in binary. To divide a number by 2, we just need to shift each bit to the right. That is what `>> 1`

does–it’s just a much faster way to divide by two.

Remember, in the last step, we set \(i \gets I(x)\). Thus, `( i >> 1 )`

is \(\frac{1}{2} I(x)\).

We then subtract this from our magic number:

\[i \gets 5F3759DF_{16} - \frac{1}{2} I(x)\]This looks very similar to the `i = * ( long * ) &y;`

line we looked at earlier, and that’s because it is. However, instead of interpreting the floating-point representation `y`

as an integer, we’re interpreting the integer `i`

as a floating-point representation. You can think of this as our \(I^{-1}(x)\) function.

This step performs \(y \gets I^{-1}(i)\). Since the last step set \(i \gets 5F3759DF_{16} - \frac{1}{2} I(x)\), this effectively sets:

\[y \gets I^{-1}(5F3759DF_{16} - \frac{1}{2} I(x))\]There we go!

What does this do?

This line performs “Newton’s method”, which is a method of refining an approximation for the root of a function.

Let’s define \(f(t) = \frac{1}{t^2} - x\). Notice that when \(t = \frac{1}{\sqrt{x}}\), \(f(t) = 0\). Therefore, we can use a root-finding algorithm to try to find this root of \(f(t)\), and we’ll get back a better approximation for \(t = \frac{1}{\sqrt{x}}\).

Here’s a graph that shows how Newton’s method works:

Note: since we’re working with both arguments to \(Q\_rsqrt(x)\) and arguments to \(f(t)\), I’ve decided to stick with using \(t\) for the latter. Generally, when working with Newton’s method, this value would be called \(x\).

We have our function in red, and an initial guess in dotted black, and we’re trying to find the point at which the function crosses the t-axis. We can draw a line tangent to the function at our initial guess (in green) and then find the intersection of that line with the t-axis to get an even better guess. We can keep doing this until we’re happy with the precision.

So, we have our initial guess given by our \(I^{-1}(5F3759DF_{16} - \frac{1}{2} I(x)\) expression. Let’s call this guess \(t_0\). How do we draw a tangent line?

Remember, the point-slope form for a line is given by \(y = m(t - t_0) + y_0\). Thus, we can just plug in our initial point \((t_0, f(t_0))\) to get \(y = m(t - t_0) + f(t_0)\).

To get the slope \(m\), we need to take the derivative of the function \(f(t)\). Let’s do this later–just call it \(f'(t)\) for now.

\[y = f'(t) \cdot (t - t_0) + f(t_0)\]We’re trying to find the point where this tangent line crosses the t-axis, that is, where \(y = 0\). Substitute \(0\) for \(y\):

\[0 = f'(t) \cdot (t - t_0) + f(t_0)\] \[f(t_0) = - f'(t) \cdot (t - t_0)\] \[-\frac{f(t_0)}{f'(t_0)} = t - t_0\] \[t = t_0 - \frac{f(t_0)}{f'(t_0)}\]And we’ve arrived at the formula for Newton’s method.

Let’s substitute in our \(f(t)\) and \(f'(t)\). First, we need to find the derivative:

\[f'(t) = \frac{d}{dt} [\frac{1}{t^2} - x] = \frac{d}{dt} t^{-2} = -2t^{-3}\]Now, proceed:

\[t = t_0 - \frac{f(t_0)}{f'(t_0)}\] \[t = t_0 - \frac{\frac{1}{t_0^2} - x}{-2t_0^{-3}}\] \[t = t_0 - \frac{(\frac{1}{t_0^2} - x)t_0^3}{-2}\] \[t = t_0 + \frac{t_0 - t_0^3x}{2}\] \[t = \frac{2t_0 + t_0 - t_0^3x}{2}\] \[t = \frac{3t_0 - t_0^3x}{2}\] \[t = \frac{t_0(3 - t_0^2x)}{2}\] \[t = t_0 \cdot \frac{3 - t_0^2x}{2}\] \[t = t_0 ( \frac{3}{2} - \frac{t_0^2x}{2} )\] \[t = t_0 ( \frac{3}{2} - \frac{x}{2} t_0^2 )\]Let’s look at that line of code again:

Earlier in the code, there is a line that performs \(x_2 \gets x \cdot 0.5\). This is just so that \(\frac{x}{2}\) doesn’t have to be recalculated on each iteration (even though the second iteration was since removed). Note that we multiply by 0.5 instead of dividing by 2 because multiplication is faster than division. Also, we can’t do the bit-shifting trick here since \(x\) is a floating-point number, not an integer.

The Newton’s iteration lines do:

\[y \gets y (\frac{3}{2} - ( x_2 \cdot y \cdot y ))\]which is equivalent to

\[y \gets y (\frac{3}{2} - \frac{x}{2} y^2 )\]which is exactly the expression for Newton’s method we found earlier.

This line of code can be repeated to get better and better approximations. However, it appears the authors of Quake III decided that only one iteration was necessary, since the second one was removed.

I’ll end with a quote from a relevant xkcd:

Some engineer out there has solved P=NP and it’s locked up in an electric eggbeater calibration routine. For every 0x5f375a86 we learn about, there are thousands we never see.

(It looks like the constant Randall mentioned was based on \(\sigma = 0.0430357\), not \(\sigma = 0.0450466\).)