
Understanding entropy

Updated: 04:22 PM (4 min read)

If you log in to a Linux machine and run this command:

$ cat /proc/sys/kernel/random/entropy_avail

you will get a number that indicates how much “entropy” is available to the kernel. But what exactly is this “entropy”? What unit is it measured in? What is it used for? You may have heard that low entropy is a bad thing. How low is “low”, and what exactly goes wrong when you get there? What is a healthy range, and how is the number determined?

[Figure: entropy representation]

Entropy is similar to “randomness”. A Linux system gathers “real” random numbers by observing various events: network activity, hard drive timings, a hardware random number generator (if available), key presses, and so on. It feeds those into the kernel entropy pool, which backs /dev/random. Applications that use cryptographic functions read from /dev/random as their entropy source, or in other words, their source of randomness.

If /dev/random runs out of available entropy, it is unable to serve out more randomness, and an application waiting for randomness may stall until more random bits are available. On Red Hat Enterprise Linux systems, you will see that the RPM package rng-tools is installed and that rngd, a random number generator daemon, is active. This daemon feeds semi-random numbers from /dev/urandom into /dev/random when /dev/random runs out of “real” entropy. Ubuntu-based systems have no rngd; if more entropy is needed, you can install haveged, which achieves the same. Note that haveged is not available for Red Hat Enterprise Linux based systems, because these systems already have rngd.

Some applications (such as those using encryption) need random numbers, and therefore a Random Number Generator (RNG). You can generate random numbers with an algorithm, but although these seem random in one sense, they are totally predictable in another. For instance, if I give you the digits 582097494459230781640628620, they look pretty random. But once you realize they are actually digits of Pi, you know the next one is going to be 8.
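The same predictability applies to any algorithmic generator: given the seed, the whole stream is determined. A minimal sketch in Python (the random module is used here purely for illustration):

```python
import random

# Two PRNGs seeded with the same value produce the identical "random" stream.
a = random.Random(42)
b = random.Random(42)

seq_a = [a.randint(0, 9) for _ in range(10)]
seq_b = [b.randint(0, 9) for _ in range(10)]

print(seq_a == seq_b)  # prints True: the streams match digit for digit
```

Anyone who knows (or guesses) the seed can reproduce every “random” number the program will ever emit.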

For some applications this is okay, but for others (especially security-related ones) people want genuinely unpredictable randomness, which cannot be generated by an algorithm (i.e. a program), since a program is by definition predictable. This is a problem: your computer essentially is a program, so how can it possibly get genuine random numbers? The answer is by measuring genuinely random events from the outside world, for example the gaps between your keystrokes, and using these to inject genuine randomness into the otherwise predictable random number generator. The “entropy pool” can be thought of as the store of this randomness, built up by the keystrokes (or whatever else is being measured) and drained by the generation of random numbers.

The value stored in /proc/sys/kernel/random/entropy_avail is the number of bits currently available to be read from /dev/random. It takes time for the computer to gather entropy from its environment. If you have 4096 bits of entropy available and you run $ cat /dev/random, you can expect to read 512 bytes (4096 bits) before the file blocks while it waits for more entropy, and entropy_avail shrinks towards zero. At first you will get 512 bytes of random garbage, then the output stops, and little by little you will see more random data trickle through.
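The interaction can be sketched in a few lines of Python. Note the caveats: these paths are Linux-specific, and on kernels 5.6 and newer the pool accounting changed, so entropy_avail typically just reports 256 and small reads no longer block or visibly drain it.

```python
# Linux-specific sketch: read the kernel's entropy estimate, then pull some bits.
def entropy_avail() -> int:
    with open("/proc/sys/kernel/random/entropy_avail") as f:
        return int(f.read())

print("entropy available:", entropy_avail())

# Read 64 bytes (512 bits) from /dev/random. On older kernels this may block
# once the pool is drained; on modern kernels it returns immediately.
with open("/dev/random", "rb") as f:
    chunk = f.read(64)

print("read", len(chunk), "bytes; entropy now:", entropy_avail())
```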

This is not how /dev/random is meant to be used, though. Normally a developer reads a small amount of data, say 128 bits, and uses it to seed some kind of PRNG algorithm. It is advisable not to read any more entropy from /dev/random than you need, since it takes so long to build up and is considered valuable. If you drain it by carelessly catting the file as above, you will cause other applications that need to read from /dev/random to block.
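A sketch of that pattern in Python: read 128 bits from /dev/random once, then let a PRNG do the rest. (Python's random module is not cryptographically secure; real cryptographic code would use os.urandom or the secrets module instead — this only illustrates the seed-once idea.)

```python
import random

# Read a single small seed: 16 bytes = 128 bits of kernel entropy.
with open("/dev/random", "rb") as f:
    seed = f.read(16)

# Seed an ordinary PRNG with it; subsequent draws never touch /dev/random,
# so the kernel's entropy pool is left alone.
prng = random.Random(int.from_bytes(seed, "big"))
values = [prng.random() for _ in range(5)]
print(values)
```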

Finally, just for the sake of having some guidelines: a good amount of available entropy is usually between 2500 and 4096 bits, whereas entropy is considered low when it drops below 1000 bits.