Review of New S3 Feature: Versioning

Today Amazon announced support for versioning in S3. It’s exactly what one would expect from an AWS service: the feature set is very basic, but it does a good job of abstracting away some of the lowest-level issues.

Here’s a short review of what it is and what it’s not:

The best thing about it is that versioning can be enabled on an existing bucket, and software unaware of S3 versioning will continue to work, producing a new version on every PUT and DELETE operation. The application itself won’t be able to tell that anything is different.
Another is that it introduces MFA Delete. The bucket owner can lock the bucket so that destructively deleting data requires a physical token. That’s certainly handy for deployments where a lot of computers have access to the AWS credentials for the bucket: if those leak, MFA Delete at least makes sure the existing data cannot be permanently destroyed.
Deletes on a versioned bucket are implemented with delete markers, so one can undo a delete simply by deleting its marker.
There are very few new calls. Support is mostly tacked onto the existing APIs, and that’s a good thing.
Overall, the functionality of versioning itself is very similar to what one would get by rolling one’s own naive versioning scheme. The biggest improvement over a naive implementation, mostly performance-wise, is the ability to list only the latest versions of the objects and omit the ones marked as deleted.
S3 versioning does nothing in terms of data deduplication.
Versions are created on a per-object basis; there’s no concept of a snapshot.
Versioning can be suspended, so that versioning-unaware applications don’t create new versions on most operations.
The PUT operation is version-agnostic: there’s no option to put a new version of an object that overwrites the latest one. This means that to keep storage usage under control, one has to make periodic sweeps that delete some of the versions. It’s not very friendly to “at most one new version per day” kinds of implementations either.
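To make the points above concrete, here is a minimal in-memory sketch of the semantics as described in this review. It is a toy model and not the S3 API: the class ToyVersionedBucket and all of its method names are hypothetical, and a real bucket would be driven through the versioning-aware S3 calls instead. It shows that a plain PUT or DELETE only stacks a new version or a delete marker, that removing the marker undoes the delete, and that a sweep is needed to cap the number of versions kept.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: str
    data: Optional[bytes]  # None means this entry is a delete marker


class ToyVersionedBucket:
    """Hypothetical in-memory stand-in for a versioning-enabled bucket."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}  # per key, oldest first
        self._ids = itertools.count(1)

    def _append(self, key: str, data: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append(Version(version_id, data))
        return version_id

    def put(self, key: str, data: bytes) -> str:
        # A plain PUT never overwrites; it always stacks a new version.
        return self._append(key, data)

    def delete(self, key: str) -> str:
        # A plain DELETE only stacks a delete marker on top of the history.
        return self._append(key, None)

    def delete_version(self, key: str, version_id: str) -> None:
        # Destructive removal of one specific version or delete marker.
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v.version_id != version_id
        ]

    def get(self, key: str) -> Optional[bytes]:
        # What a versioning-unaware client sees: only the latest entry.
        history = self._versions.get(key, [])
        return history[-1].data if history else None

    def list_latest(self) -> List[str]:
        # Keys whose latest entry is real data, omitting deleted ones.
        return sorted(
            k for k, h in self._versions.items() if h and h[-1].data is not None
        )

    def history(self, key: str) -> List[str]:
        return [v.version_id for v in self._versions.get(key, [])]

    def prune(self, key: str, keep: int) -> None:
        # Periodic sweep: keep only the newest `keep` entries for a key.
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self._versions[key] = self._versions.get(key, [])[-keep:]


bucket = ToyVersionedBucket()
bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final")
marker = bucket.delete("report.txt")         # object now looks deleted
looks_deleted = bucket.get("report.txt") is None
bucket.delete_version("report.txt", marker)  # removing the marker undoes it
restored = bucket.get("report.txt")
bucket.prune("report.txt", keep=1)           # sweep away the older version
```

In this model the only way to cap history is the explicit prune call, which mirrors the periodic sweep mentioned in the last point.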

What this means for our products

We have been working on our new S3 Backup storage logic for a while, and even though it includes versioning, we don’t intend to change it to make use of this new functionality. The reason is that our take on the matter supports snapshots, data deduplication and block-level diffs, so we would gain nothing by using Amazon’s own versioning.

Having said that, until our new storage logic is actually deployed in our products, we might enable limited support for using S3 versioning on a bucket and restoring old versions of files. There’s no time to get it done before the 1.0 release, but if you really want this feature, let us know and we’ll use your feedback to schedule the implementation accordingly.