 On Information Entropy August 16, 2010 - 3 Comments

In this blog post you will learn about entropy within the domain of information theory. You will learn what entropy is and how to compute it. You will be shown some simple C code snippets to bring theory into application. Also, you will be shown why it is an important measurement in the field of computer security. Finally, we will cover some practical applications of entropy calculation and analysis.

Introduction

Entropy is a measurement of uncertainty. It is a way to assign a score of uncertainty to a stochastic variable. Entropy is a key component of information theory, a branch of mathematics designed to quantify information. It first came of age in 1948 with the publication of Claude Shannon’s paper “A Mathematical Theory of Communication.”

Before we can adequately define entropy, we need to first define information. Information, for our purposes, is what you get when the uncertainty about something is diminished or erased. Take a fair coin. Before we flip the coin, we are uncertain about what will happen when we flip it. After we flip the coin, our uncertainty about that coin flip is effectively 0 since we know the outcome (heads or tails). We’ve gained information. But how do we measure the uncertainty about the coin flip before it happens? How do we quantify that uncertainty? The answer is entropy.

Consider an information source X that can produce, with equal probability, 6 symbols (also known as outcomes): A, B, C, D, E, and F. Given that we know the probability of each outcome (1/6) and the alphabet size (6), information theory and entropy enable us to confidently calculate the uncertainty associated with that information source (in this example, as we’ll see, it’s about 2.6 bits). It is important to make the distinction that entropy is not information. Entropy, which is typically measured in bits, is actually a measurement of how much information is NOT available when we don’t know the outcome of an information source.

Entropy gives us a way to quantify the average number of bits needed to convey a message when considering all possible outcomes. For example, the entropy of a coin flip with 2 equally likely outcomes is less than that of the roll of die with 6 equally likely outcomes. This will be explained in further detail.

Nomenclature Clarification

According to Internet lore, at the time of discovery, Shannon didn’t know what to call his new unit of measure and so he asked the famous mathematician John von Neumman. Von Neumann told him he should call it the “entropy” because nobody really knows what entropy is (indeed it does appear to be a loaded term) and thus Shannon would have an advantage when debating his burgeoning field with pundits. Hitherto this has led to much confusion since the term entropy also plays a central role in many areas of science, including thermodynamics, heat, and energy. To be technically correct, this article is about Shannon Entropy, shortened here as “entropy”.

Pith

Now that we have a slightly better than comic book understanding of what entropy is, let’s look at how to calculate it. Shannon’s classic formula for computing entropy is shown below. The entropy H of a random variable X with possible outcomes of {X1Xn} is the product of the probability of outcome i times the base 2 logarithm of one over the probability of outcome i. The Greek letter Sigma (∑) indicates summation notation, and as such, the formula is repeated for every possible outcome of i and summed. The overall result is the entropy measured in bits. The base 2 logarithm (also known as the binary logarithm) is most often used to compute entropy because of its close connection with the binary numbering system, and this has become the standard unit of measure when discussing information entropy.

A Common Information Source

Consider our information source from above with an alphabet size of 6 equally probable symbols. This source is commonly found in many different games in many different cultures, typically referred to as a die. We can easily compute the entropy of any fair, untampered die. It has 6 possible outcomes, each one as probable as the next. 1/6 is expressed decimally as the repeating number .1666… The entropy formula is applied and calculated as per the following: The roll of a die has approximately 2.6 bits of entropy. Let’s cogitate on that. Recall that entropy gives us a way to quantify the average number of bits needed to convey a message. Therefore, we know that, on average, we would need around 2.6 bits to represent all possible messages from this information source. Let’s look at that graphically: We can see that 2 bits is not enough to represent all of the possible messages and 3 bits is just a little too much. As you can see from the graphic, 2.6 bits is approximately in the middle of these two, which is exactly what we expect. Entropy tells us that we can encode all the information a die has to give us in just under 3 bits.

Coin Flips and Beyond

A coin flip has 1 bit of entropy. You can encode all possible outcomes using a single bit of information. This makes sense, even without math. It’s either heads or tails, 1 or 0. Considering the example of the roll of the die, which has about 2.6 bits of entropy, and the flip of a coin, which has 1 bit of entropy, we can see that as the number of equally possible outcomes grows, so does the entropy. This is intuitive, the larger number of possible outcomes means that, on average, as our uncertainty grows, we will need more information to be able to describe it.

A C Program to Calculate File Entropy

Once we understand the formula, it’s quite easy to build a short C program to calculate entropy. Our example program will calculate the entropy across an entire file. In this section, some code snippets from a standalone entropy calculation tool “doe” (determine overall entropy) are examined.

In the first piece of code, a buffer is filled from a previously opened file (code not shown). We can see that this will repeat as many times as necessary until the entire file is read.

/* read a brick of data */
while ((i = read(fd, buf, sizeof (buf))))
{
if (i == -1)
{
fprintf(stderr, "Can't read %s (%s)\n", file, strerror(errno));
close(fd);
break;
}

In the next piece of code, a frequency array, an unsigned integer array with a length of 256, is populated. The program steps through the file buffer and records the number of times it sees each byte (an outcome).

for (k = 0; k < i; k++)
{
/**
*  For each brick of data we siphon from the file, record
*  the number of instances of each observed value into
*  the frequency array.
*/
fa[buf[k]]++;
c++;
}
}
close(fd);

In the final piece, we compute the entropy across the frequency array. Recall that we need to turn the frequency distribution of each outcome into a probability distribution. This is done by dividing the frequency with which each symbol occurs with the total size of the sample set (the number of bytes in our file). After we have a symbol’s probability, we can compute the entropy for that outcome and add it to the accumulator. The final result is the overall entropy for the entire file, measured in bits.

for (k = 0; k < ALPHABETSIZE; k++)
{
/** ignore empty slots */
if (fa[k])
{
/** convert frequency array entry -> probability distribution
prob = (double)fa[k] / (double)c;
/** the famous Shannon Entropy formula */
entropy += prob * log2(1/prob);
}
}

Using doe, the following files’ entropy is computed:

1. Worm.Padobot.M: The malware known as Padobot.M. This is a Windows-based PE32 executable file that is not packed or compressed.
2. an-encrypted-file.enc: A random text file that is encrypted using AES.
3. e45bf45464d0f5e041f8876e77472e3b.MD5: A trojan downloader malware; it has no common name so its MD5 hash is used. This is a Windows-based PE32 executable file. It is packed using ASPack.
4. webster-dictionary.txt: This is a plaintext file filled with 311,141 English words from Webster’s dictionary.
5. wordpad.exe: The wordpad executable from a standard Windows Vista install. This is a Windows-based PE32 executable file that is not packed or compressed.
[snarkbox:~/Projects/Entropy/doe] mike% ./doe examples/*
examples/Worm.Padobot.M                       : 5.97 bits/byte
examples/an-encrypted-file.enc                : 7.88 bits/byte
examples/e45bf45464d0f5e041f8876e77472e3b.MD5 : 7.87 bits/byte
examples/webster-dictionary.txt               : 4.42 bits/byte
examples/wordpad.exe                          : 6.55 bits/byte

We can now empirically verify what we’ve learned above. The files with the most uniformity and redundancy will have lower entropy scores. English text and uncompressed executables tend to have a lot of predictable and repeatable sequences that bring down their uncertainty and the average number of bits needed to represent them, and therefore result in low entropy scores.

As expected, the encrypted and packed files have the highest entropy scores. This is because of the high level of unpredictability or randomness throughout the file. An entropy score that is asymptotic to 8 bits per byte means that, on average, the possible outcome of each symbol is equally as likely as any other. A theoretically perfectly random one-time pad would have the highest entropy.

Practical Applications

So far we’ve learned what entropy is and how to calculate it. In addition to it being the lynchpin of Information Theory, entropy does indeed have practical applications and is quite important in the field of computer security. Let’s spend some time exploring a few areas where entropy has direct value.

Quality of Randomness

As we’ve learned, entropy is a measure of uncertainty. It can also be looked at as a measure of randomness. Cryptography requires randomness and the more a given cryptosystem has to work with, the more secure it is said to be. Schneier asserts the entropy of a cryptosystem is a measure of the size of the keyspace. This is intuitively cogent when you consider the generalization that a cryptosystem gets more secure as the number of possible keys used to encrypt it grows larger. But how we can also use entropy calculations is to determine the quality of values such as key material, salts, IVs, and nonces. It’s important that these values, along with the means used to generate them, be as unpredictable as possible. Therefore, we can use what we’ve learned here to determine the “quality of randomness” for cryptographic material.

Detecting Packed Executables

Packers are an obfuscation and evasion tool used by malware authors to add polymorphism, armoring, metamorphism, EPO, and a host of other techniques aimed at evading antivirus scanners. They compress and encrypt malware such that the original executable is then completely unrecognizable. Entropy scanning can be an effective tool to help detect such files. Wu, Chiueh and Zhao describe an entropy-based scanning technique in their paper “Efficient and Automatic Instrumentation for Packed Binaries,” which details a tool called “Uncover” that performs static entropy calculations combined with some runtime analysis. Lyda and Hamrock also discuss another entropy-based solution in their paper “Using Entropy Analysis to Find Encrypted and Packed Malware“.

Finding Cryptographic Material

You can also use entropy scanning to look for cryptographic keys, messages, or seeds in otherwise normal data streams. One such tool, written in Python by Ero Carrera, looks for areas of high entropy within files to spot cryptographic material. Obviously this will only work if the file scanned is uncompressed and unencrypted.

Network Anomaly Detection

Another novel use of entropy scanning lies in detecting when it’s not there, or more specifically, lower than expected. Consider an encrypted channel such as an SSH tunnel or an HTTPS network connection. After connection setup and key exchange, as we’ve seen, the entropy of such an encrypted channel should hover around 8 bits per byte. However, when misuse occurs, such as the OpenSSL attack, OpenSSH attack, or unencrypted covert channel usage, the entropy will dip below 8. The Net Entropy tool, written by Julien Olivain, is an open-source network scanner that performs real-time entropy scanning of network connections designed to alert against such attacks. According to his findings, the entropy scores will drop to around 6 bits per byte during misuse.

Also useful, the Net Entropy tool can be repurposed to detect high entropy scores across what should be low entropy network channels. This can be useful to detect ad hoc encrypted connections, which may be indicative of suspicious goings-on.

Denouement

I hope you enjoyed this short treatise on entropy and learned why it is an important tool in the information security practitioner’s kit.