How much entropy in that password?

On his Security Now podcast, Steve Gibson recently disagreed (transcript; search for “xkcd”) with the way Randall Munroe calculated the number of bits of entropy of passwords in this excellent XKCD comic:

XKCD: Password Strength

The disagreement boils down to Randall giving a lower bound on the entropy, and Steve giving an upper bound (and not realizing—or neglecting to mention—the difference). Randall is right, and Steve is wrong. Let me explain:

Lower bound

In both passwords featuring in the comic, Randall assumes a particular password template—in effect, an automatic procedure where you input some number of random bits and get a password. An example, similar to (but simpler than) the first of the comic's password templates: Pick a six-letter dictionary word (in lowercase), capitalize one of the letters, and stick a number at the end. The number of possible passwords generated by this template is the number of words in the dictionary times six times ten; the number of bits of entropy is the base-2 logarithm of this number.

This is a lower bound of the entropy in the passwords generated by this template, because we assume that the adversary knows everything—in particular, knows which template the password was made from—except the random numbers we used to pick the particular dictionary word, the letter to capitalize, and the last character. Any particular adversary may know less, but no attacker will know more. The lower bound on the entropy describes how many guesses even a maximally well-informed adversary must make.

Of course, there are caveats: In particular, where the template asks for randomness (such as in “pick a dictionary word”), you are expected to use high-quality randomness such as produced by coin flips, dice, or good computer programs. Just trying to think of a random word won’t cut it; you’ll be more likely to think of some words than others, and thus get less entropy than you thought.

Upper bound

An upper bound on the entropy of a given password can be obtained by finding a simple template compatible with the password (i.e., such that the password could have been generated from that template) and then counting how many bits of randomness the template uses to produce the password. For example, given the password “zoMbie8”, one candidate template is “pick seven characters that are either lower- or uppercase letters or numbers”; this gives an upper bound of (26+26+10)^7 = 3,521,614,606,208 possible passwords, or about 42 bits.

Another candidate template is “pick a six-letter dictionary word (in lowercase), capitalize any one letter, and add a digit at the end”. The dictionary on my computer has 9300 six-letter words, times six possible positions for the capitalized letter, times ten possible digits at the end, gives 558,000 possible passwords, or about 19 bits.

Now, we can all agree that 19 bits is quite a bit less than 42—meaning that the former is a tighter bound. But how low can we go? Given a specific enough template, the entropy becomes arbitrarily small; consider e.g. the template “take the word ‘zombie8’ and choose at random whether to capitalize the third letter”, which has just two possible passwords, for a single bit of entropy. How can we meaningfully speak of an “upper bound”, if we can make it as small as we want? (Specifically, consider the case where the password was in fact generated by the second template, and the lower bound is 19 bits. If our upper bound can be lower than the lower bound, we have a problem.)

The catch is the provision that the template be “simple”, as I sneakily specified in the first sentence describing the upper bound. The precise meaning of “simple” is “describable with few bits”, because strictly speaking, the upper bound is the sum of the number of bits needed to describe the template and the number of random bits used by the template. (This what the parenthesis in the first panel of the comic is talking about.) As long as the latter is much larger than the former we don’t make much of an error by omitting it, but as we make the template more and more specific, the error grows.

A useful rule of thumb for evaluating the simpleness of a template is to consider how many other templates just like it there are. The first template, with seven random characters, has very few variants (you could choose to include other characters than letters and digits or change the length, but that’s about it). The second template has more, but still not many (you could e.g. change the number of capitalized characters, and change the position of the digit). The third template obviously has a great many very similar variants—just choose another word or another digit!

So, which one of them is the password strength?

Which number do you go by when choosing a password? A lower bound, which describes how strong your password is guaranteed to be, or an upper bound, which describes how strong it isn’t?

Which number is more useful printed on elevators? The number of people they’re guaranteed to hold, or the number of people guaranteed to be enough to make the cable snap?

Password strength meters

So, the lower bound on the entropy is the useful number when creating a pasword. But which number do you get when testing your password in a password strength meter (such as this one)? An upper bound! There’s a simple reason for this: when calculating the lower bound, you need to know what template was used; just seeing one particular password produced by the template isn’t enough.

The password strength meter tries a number of templates. If one of them is close enough to what was actually used, its entropy estimate can be quite good; but if not, it can be way, way off (and it’s always an overestimate, never an underestimate).

Password strength meters are useful for identifying weak passwords, but they can’t guarantee that a given password isn’t weak. That can only be done by generating the password from a template that uses enough random bits.

And that’s why Randall is right and Steve is wrong—Randall gets his numbers from the amount of random bits poured into the password template to generate the password (lower bounds), while Steve gets his numbers by feeding Randall’s passwords to a password strength meter (upper bounds). It’s not that Steve isn’t doing his math right, it’s that he isn’t doing the right math.

How hard is it to crack?

The whole point of a password is that it shouldn’t be guessable, so in a very real sense, the strength of a password is defined by how hard it is to crack.

Password cracking programs work essentially the same way as password strength meters, but backwards: using a succession of templates, they try every combination of random bits that can be fed to that template and see if the generated password is the right one. They start by using slightly complex templates (such as ones based on a random dictionary word), and if those fail, fall back to simpler templates (based on trying every possible password of some bounded length). If the template actually used to generate the password is close to one of the templates the cracker tries, the number of tries needed won’t be all that much more than what’s guaranteed by the lower bound on the password’s entropy. But if the actual template isn’t close to any of the ones the cracker tries, it needs many more tries.

Consider for example the password “zoMbie8”; if the cracker tries templates based on a single dictionary word with a few simple tweaks, it should be able to find it in only millions of tries, but if it has to resort to trying raw combinations of letters and digits, the tries will number in the millions of millions.

This means that there are two ways to make a secure password: use a template the password crackers don’t know about (or don’t bother to try, because so few people use it for their passwords), or use any old template and feed it with enough random bits. The former strategy relies on outwitting smart people who spend much of their time coming up with better ways to crack passwords; the latter just takes more coin flips. It’s security by obscurity vs. real security.

(Excercise for the reader: What’s the problem with Steve’s Password Haystack scheme?)

Postscript: The actual entropy

I’ve talked a lot about lower and upper bounds on a password’s entropy, so you might be wondering the obvious: What’s the actual entropy, and why aren’t I talking about it?

The entropy of a string of characters (which is just what a password is) is closely related to its Kolmogorov complexity—in a sentence, the entropy of your password is equal to the length of the shortest computer program that generates it. However, this definition is useless to us, since it can be shown that Kolmogorov complexity isn’t computable.

A password template, being essentially just a string, has its entropy computed the same way, which is why I was just waving my hands when discussing it earlier. Approximating it by zero as we do gives lower bounds that are lower than strictly necessary, and upper bounds that are lower than they should be and thus run the risk of not being correct (but nothing much in the analysis depends on them being correct).