In my last post I talked about how anybody with enough money (a small 6-figure sum) could create a rogue certification authority (CA). This would allow them to generate seemingly genuine certificates for any web site. That would allow them to set up a fake citibank.com, and at the SSL/TLS level everything would look correct.
For a limited time only, the bad guys have a deep discount, thanks to Alex Sotirov and friends. They leverage the fact that MD5 is known to be very weak.
MD5 is a cryptographic hash algorithm, which means it takes any document and creates a fixed size string as a result (called a fingerprint or checksum), where the string should be as good as random to the outside world. If someone can take an MD5 and create an arbitrary document that gives the same output, then a lot of things in the world would be broken, because MD5 is used to authenticate document legitimacy. For instance, lots of AV vendors use MD5 fingerprints to indicate what's good and what's bad. If people could create bad stuff that has the same MD5 fingerprint as Microsoft Word, the whitelists of many AV vendors would be useless, and it would be difficult to detect what's bad via MD5 alone (which is very common).
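To make the fingerprint idea concrete, here is a minimal sketch using Python's standard hashlib module. The whitelist and its contents are made up for illustration; this is not how any particular AV vendor does it.

```python
import hashlib

def md5_fingerprint(data: bytes) -> str:
    """Return the MD5 fingerprint of data as 32 hex characters."""
    return hashlib.md5(data).hexdigest()

# Hypothetical whitelist of known-good fingerprints (illustration only).
KNOWN_GOOD = {md5_fingerprint(b"contents of a trusted program")}

def is_whitelisted(data: bytes) -> bool:
    """Trust a file purely because its fingerprint is on the list."""
    return md5_fingerprint(data) in KNOWN_GOOD

# The fingerprint has the same size no matter how big the input is.
print(len(md5_fingerprint(b"a")), len(md5_fingerprint(b"a" * 1_000_000)))

# Changing a single byte gives an unrelated-looking fingerprint.
print(md5_fingerprint(b"contents of a trusted program"))
print(md5_fingerprint(b"contents of a trusted program!"))
```

A check like is_whitelisted is only as good as the hash behind it: anyone who could build a malicious file with a whitelisted fingerprint would sail straight through.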
That kind of attack doesn't seem like it's going to be practical in the next few years, though we should expect it will be possible sooner rather than later (developers: stop using MD5. Switch to something in the SHA family).
But, there's another problem with MD5. One person creating a document can create two documents that give the same MD5 output (a collision). That is subtly different from coming up with a document that has the same MD5 as a fixed document. That's because the person trying to find a collision can modify both documents, usually modifying things that aren't too important (like textual comments). So, in the above example, a bad guy could produce two programs with the same MD5, one bad, one good. But neither would be Microsoft Word.
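Here is a rough sketch of why being free to pick both documents makes life so much easier. It does not break real MD5. It cuts the hash down to 24 bits (my assumption, purely so the search finishes in a moment) and then looks for any two inputs that agree. The function names are made up for the example.

```python
import hashlib
from itertools import count

def toy_hash(data: bytes) -> int:
    """First 24 bits of MD5. A deliberately tiny stand-in, not real MD5."""
    return int.from_bytes(hashlib.md5(data).digest()[:3], "big")

def find_collision():
    """The attacker writes every document, so ANY two that match will do."""
    seen = {}
    for attempts in count(1):
        doc = b"document variant %d" % attempts
        h = toy_hash(doc)
        if h in seen:
            return seen[h], doc, attempts
        seen[h] = doc

doc_a, doc_b, attempts = find_collision()
print(doc_a, doc_b)
print("attempts needed:", attempts)
```

With a 24-bit hash, a collision like this typically shows up after a few thousand attempts, while matching one fixed, pre-chosen document would take on the order of 2^24 (about 16 million) attempts. The same gap exists for full-size hashes, just with far bigger numbers, and the real MD5 attacks are much cleverer than this brute-force search.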
MD5 was designed with the hopes that this kind of collision would not be possible. But now that it is, what can a bad guy do with it? Well, Sotirov's attack provides us a great example. Here's approximately what happens:
The bad guy generates two certificates that have the same MD5 output (note that the bad guy needs to guess a few values that the CA puts in there). One is a regular web site certificate, and one is a certificate that allows him to act as a certification authority (so he can spoof Citibank). Let's call these the "good cert" and the "bad cert". Neither of these certificates will be trusted by browsers without a signature from a valid certification authority. The bad guy sends the good cert to a real CA, and gets the signature back. He then takes the signature, and uses it on the "bad cert". The CA's signature is really computed over the MD5 of the certificate, not over the certificate itself. Since both certs have the same MD5, the signature is just as valid for the bad cert, and the real CA has been tricked into endorsing it.
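The signature-transplant step can be sketched in a few lines. Everything here is a toy and an assumption on my part: the "certificates" are plain byte strings, the hash is MD5 cut down to 24 bits so a collision can be found quickly, and the CA key is textbook RSA with primes far too small for real use. The actual attack works against full MD5 with a much more sophisticated collision technique.

```python
import hashlib
from itertools import count

# Toy RSA key for a pretend CA (both primes are Mersenne primes, far too small).
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, needs Python 3.8+

def weak_hash(cert: bytes) -> int:
    """Stand-in for MD5: only 24 bits, so collisions are cheap to find."""
    return int.from_bytes(hashlib.md5(cert).digest()[:3], "big")

def ca_sign(cert: bytes) -> int:
    """The CA signs the hash of the certificate, not the certificate itself."""
    return pow(weak_hash(cert), D, N)

def verify(cert: bytes, signature: int) -> bool:
    """What a browser does: recompute the hash, compare against the signature."""
    return pow(signature, E, N) == weak_hash(cert)

def make_colliding_certs():
    """Vary a filler field in both certs until a good one and a bad one collide."""
    good_seen, bad_seen = {}, {}
    for i in count():
        good = b"subject=example.test;ca=false;filler=%d" % i
        bad = b"subject=rogue;ca=true;filler=%d" % i
        good_seen[weak_hash(good)] = good
        bad_seen[weak_hash(bad)] = bad
        if weak_hash(good) in bad_seen:
            return good, bad_seen[weak_hash(good)]
        if weak_hash(bad) in good_seen:
            return good_seen[weak_hash(bad)], bad

good_cert, bad_cert = make_colliding_certs()
signature = ca_sign(good_cert)        # the CA only ever sees the good cert
print(verify(good_cert, signature))   # True
print(verify(bad_cert, signature))    # True: same hash, so same signature
```

The CA never saw the bad cert, yet its signature checks out on it, because the only thing the signature ever covered was the hash.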
Now, with the bad certificate signed, the bad guy can generate new certificates for Citibank, and so on. He could generate certificates for any web site you go to, so he can eavesdrop on or modify communications in transit.
That sounds bad, doesn't it? The initial reaction I'm seeing from well educated people in the security community is the sense that this breaks the internet, because the cost of breaking PKI becomes so low. I don't see it that way, as long as CAs respond properly and quickly.
One quick way to deal with the problem would be for all CAs to stop signing certificate signing requests with MD5. They should only accept requests that use SHA1 or something at least as strong as SHA1 (SHA1 has some of the same fundamental weaknesses as MD5, but it's still strong enough that it is not likely to fall to this attack any time in the next few years). Most CAs aren't using MD5 anymore, with RapidSSL being the one big exception.
That's about it. If there's a real demand for new MD5-based certificates (and I can't think of a reason why there would be), the legitimate CA can generate the certificate. They would basically take the data and generate the certificate, add some random data into a comment field, and then send back a signed certificate. In the attack we're discussing, the bad guy creates the certificate, not the CA. And, because of the random data, the bad guy cannot predict the certificate that the legitimate CA is going to sign. Simply randomizing the identifiers that the CA puts in the certificate would also address this problem (if the random identifier is big enough).
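As a rough illustration of the randomized-identifier idea, compare a counter-style identifier with one drawn from the operating system's random number generator. The function names and the 64-bit default are my own choices for the sketch, not anything a particular CA is known to use.

```python
import secrets

def sequential_serial(last_issued: int) -> int:
    """Predictable: anyone who saw the last identifier can guess the next one."""
    return last_issued + 1

def random_serial(bits: int = 64) -> int:
    """Unpredictable: drawn from the OS cryptographic random number generator."""
    return secrets.randbits(bits)

print(sequential_serial(41))     # 42, exactly what an attacker would guess
print(hex(random_serial()))      # different on every run
```

The collision has to be computed before the CA signs anything, so the bad guy has to know every byte the CA will sign. With 64 random bits in there, guessing right is a 1-in-2^64 shot per attempt, which puts the attack out of reach.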
Now, let me address the comments I'm hearing. Some people believe that all web site (or code signing) certificates that have been signed by MD5 are potentially vulnerable. That is not true. You need to come up with the collision before the certificate gets signed. If you try to do it afterward, it is computationally MUCH harder. It's like trying to create a bad program that has the same MD5 as Microsoft Word.
Therefore, the only certificates that are problems are ones where someone has already launched this attack. We know it's happened once, but that list is probably pretty small. And, there's a good shot that the bad guys haven't done it at all.
If legitimate CAs keep signing certificates with MD5, and they keep allowing the customer to generate the certificates, then there will be a big problem. But right now, the number of attacks is probably so low as to be in the low single digits. There are ways we can deal with this.
First, as we learn about these certificates, the CAs that were tricked into signing them should revoke them. Then, when browsers check for revocation (via OCSP, the Online Certificate Status Protocol, or via CRLs, certificate revocation lists), they will see that a link in the chain is bad.
However, some people have revocation checking turned off. Here, the browser vendors should help, too. When these certificates surface, the browser vendors should blacklist them in the browser. There's still a window of vulnerability there, but it won't be too bad.
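Here is a minimal sketch of how the two defenses fit together. It assumes certificates are handled as raw bytes and identified by a SHA-256 fingerprint, and the function and parameter names are invented for the example. Real browsers do a great deal more when validating a chain.

```python
import hashlib
from typing import Sequence, Set

def fingerprint(cert_bytes: bytes) -> str:
    """Identify a certificate by a strong hash of its bytes."""
    return hashlib.sha256(cert_bytes).hexdigest()

def chain_is_acceptable(chain: Sequence[bytes],
                        revoked: Set[str],
                        blacklisted: Set[str],
                        check_revocation: bool = True) -> bool:
    """Reject the chain if any link is blacklisted or (optionally) revoked."""
    for cert in chain:
        fp = fingerprint(cert)
        if fp in blacklisted:                  # shipped with the browser
            return False
        if check_revocation and fp in revoked:  # learned via OCSP or a CRL
            return False
    return True

rogue_ca = b"bytes of a rogue CA certificate"
site = b"bytes of a web site certificate signed by the rogue CA"
revoked = {fingerprint(rogue_ca)}

# Revocation catches it, unless the user has revocation checking turned off.
print(chain_is_acceptable([site, rogue_ca], revoked, set()))
print(chain_is_acceptable([site, rogue_ca], revoked, set(), check_revocation=False))

# A built-in blacklist still catches it even with revocation checking off.
print(chain_is_acceptable([site, rogue_ca], revoked, {fingerprint(rogue_ca)},
                          check_revocation=False))
```

The blacklist is the safety net for people who have revocation checking switched off, which is why it helps for the browser vendors to ship one.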
Plus, if some small CAs don't improve their operational practices right away (I'm sure the big guys will, RapidSSL is the big question mark), this is also something that can be dealt with at the browser level. Browser vendors can keep a list of CAs that are known to have good practices or bad practices. They can also say, "only accept certificates signed by first tier or second tier CAs, unless a third tier CA is on the whitelist".
The trick will then be finding evidence of these rogue CA certificates. The more they're used, the more likely they are to be found. So, either they will all get used heavily and the window of vulnerability will close very quickly (with urgent requests for users to update their browsers) or there will be a few of them that are barely used. Either way, the average person isn't going to be at much risk for long.
That's not to take away from the achievement of Sotirov and co. Their work is very impressive. They used about 60K worth of hardware, and it's conceivable that those resources could be rented far more cheaply.
The thing that doesn't impress me is how poorly the world responds to cryptographic problems. MD5 has appeared weak for almost 5 years, yet everybody is still using it. This exact attack was discussed in the crypto community as being "within reach", but nothing was done about it. There are no big pushes to abandon MD5 wholesale. Especially when it comes to cryptography, we should all be as conservative as possible. Switching to the SHA family for new applications costs nothing. For old applications, it often doesn't cost much. Come on world, get off your butts!
But, as for the Internet being broken, I don't see it. This is a huge attack, but it can be dealt with fairly easily. As Douglas Adams said, "Don't Panic!" :P
If you want to complain or discuss, feel free to message me on twitter, @viega.