09/26/2010Incoming Encryption ImprovementsThis is a rather technical blog post, and the release notes contain a simpler summary in case you want to skip it. The short story is that S3 Backup 1.0.2 has a much improved encryption. Legacy streams supportS3 Backup supported an option to encrypt the data before uploading since very early betas. A short while after encryption was introduced into the app, the way encryption metadata is stored was changed and is documented on the stream codecs page, the older legacy format is still supported, so you can still download data that were uploaded in the early days, but all uploads are using a newer format. With 1.0.2 release we are are introducing another change that will apply to all new uploads, but the previously stored data will still be accessible transparently. Block cipherUp until now there was a choice between AES and Blowfish ciphers, but starting with the next release only the AES cipher will be available as a choice. When downloading data, Blowfish will still be used when necessary. If you feel strongly about keeping the Blowfish as an option let us know in the comments or by contacting support. Here are some excerpts from the Wikipedia article on AES:
We use AES with 256-bit keys (this is true for all versions of S3 Backup), but the security of encrypted data depends not only on the cipher algorithm but also on its mode of operation and the strength of the key itself. 1.0.2 will bring a significant improvement for both. Up to now we were using the ECB (electronic codebook) mode of operation and trivial padding scheme which is unfortunately not the strongest there is. From now on we are switching to CFB (cipher feedback) which is much stronger and requires no padding. Key strengtheningBefore I explain more changes I need to mention what the threat is. So assuming someone wants your data, what he will need first is to get the encrypted data itself. That would be possible by stealing your AWS credentials, by hacking into Amazons datacenter or by getting your data released via a subpoena. You might be interested in the “Overview of Security Processes” whitepaper by Amazon and their security center. The next step is to get the data decrypted, which is only possible by using the same key that was used for encryption. This key is 256-bit long and there are so many possible keys that it’s simply unfeasible to try them all. Another thing is that the AES decryption process can be applied to any stream of bytes with any key, but the resulting data are garbage for any key except the right one, and that means that when the legitimate user tries to decrypt the data, the key needs to be verified in some way first. To do that, the key signature is calculated and stored along with the encrypted data. When decrypting, only the key with the correct signature will be accepted. The signature is calculated using a cryptographic hash function and is irreversible — knowing the signature does not allow one to just calculate the inverse and get the original key. The attacker could use that signature to verify the attempted keys just like the app would on behalf of the legitimate user. However he needs to do it for every key he tries and there’s still a question of which keys to try — trying them all is impossible. There are a few things to consider here, one is a dictionary attack which uses password-guessing to limit the space of keys to search, another is precomputed hash maps which provide a space-time tradeoff and allow the attacker to crack a lot of keys faster, if they aren’t strong enough. Key derivation functionsBut if the key can be any 256-bit block of data, how can it be strong or weak? That is because the keys are not just random data but have to be derived from a user-provided password (or AWS secret key, if no password was given). So what happens is that we have a KDF (key derivation function) that takes a password and gives us a key useable for the AES. A naive KDF would be vulnereble to precomputed hash maps mentioned above. The way to remedy that is to use a KDF with a random “salt” and store that salt alongside the key signature. This is exactly what S3 Backup does and always did, we used a big salt value, and we used SHA-256 on the salted password as the KDF and SHA-1 for calculating the key signature. The dictionary attack was still a threat though and this is an area we’re improving upon in this release. Dictionary attacks involve trying different passwords, applying the KDF to them, calculating the signature and comparing it to the expected one. This takes a long time either way, but if the KDF is chosen in such a way that it intentionally takes a long time to compute, finding the password becomes completely infeasible, because the search becomes simply too slow. Key derivation improvementsWe’ve evaluated two such functions:
Both have configurable speed (number of iterations) and can be slowed down to any intended degree. We have implemented support for both but decided to go with bcrypt-10. An optimized implementation (such as one we’re using) takes about 1/10th of a second to run just once on a reasonably modern computer, so even a very dedicated and well-resourced attacker would have trouble making progress at that pace. As the computers get faster in the future, we will be able to ramp up the complexity by switching to bcrypt-12, bcrypt-14 etc, each successive increment being four times as demanding. This is a huge improvement over SHA-1 which can be computed hundreds thousands of times per second on the same computer. However, such a slow KDF would slow down the file transfers adding those 100ms to every upload and download, if computed for every file. To avoid this we’ve changed how the salt is generated, and instead of generating a new one for every upload we generate just one per session and use the same encryption key for all uploads in that session. The reverse lookup is optimized as well, so when downloading the files the penalty for finding a password to decrypt them is incurred only once per password-salt combination. A new initialization vector is generated for every upload. We have also changed the key digest hash from SHA-1 to SHA-256. S3 Backup 1.0.2 and later contain all of these updates. The stream format is documented as well. |
|