9 days ago: S3 Backup Release Candidate

The first release candidate of S3 Backup 1.0 is now available.

The license management component is still to be added before the final 1.0 release, but feature-wise the final release will be identical to this RC. The purpose of the release candidate is to make sure that the final release doesn’t contain any known bugs, and to do that, it has to be under feature freeze for some time. That is why we are releasing it before licensing is enabled.

So please try it out and let us know if you encounter any problems. If no major issues are discovered, we will release RC2 later this week (which will include the license check), and the final 1.0 release next week.

Please also consider preordering the app before the final release. Starting with 1.0 new users will get a full-featured but time-limited license and everyone who preordered will get a full license automatically. We’ll have a separate blog entry detailing how our licensing system works before the release, but if you want to learn about it right now, you can contact customer support or ask in the comments section below.

Changes in this release:

  • Enable F5 (refresh), Backspace (go to parent folder) and Delete keyboard shortcuts
  • Add “validate filedb” menu item (to be explained in a blog post on filedb)
  • Further improve filedb autorepair with an additional, efficient check of whether a file really needs to be uploaded even when filedb says it does. This is useful for various corruption scenarios and for fresh installs backing up into existing remote folders.
  • Integrate instant feedback with new customer support system
  • Update debug log window: limit log size, make maximizable, remove “save log” button (not needed due to persistent logging)
  • Deprecate a big chunk of old GUI library
  • Migrate from Subversion to Mercurial VCS
  • Review and improve folder caching logic
  • New settings handling library (soon to come: password-protected settings compatible with automated backup, this one is requested a lot).
  • New password checking / caching code
  • Improve exe assemblage, leading to about 20% reduction in consumed memory
  • Fix an issue caused by backup rules created in very early betas containing denormalized paths
  • Fix an issue with backup not running on schedule for some customers
  • Fix an issue with filedb maintenance (removing folder but not subfolders)
  • Fix bucket manager bucket size reporting
  • Add more automated testing

Some of the features were implemented but held back for a later release:

  • Concurrent plan execution — much faster backups for jobs containing mostly small files.
  • Concurrent planning and execution — start uploading in parallel with backup planning. This will bring faster backups in general and better handling of huge backup jobs (ones that are estimated to take days or even weeks).
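
As a rough illustration of the second item, here is a minimal Python sketch of planning and uploading running in parallel. It is not the app's actual code: the function names, the queue size and the `(name, changed)` input format are all assumptions made for the example, and the "upload" is a placeholder.

```python
import queue
import threading

def plan(files, tasks, workers):
    """Planner: hand over each upload task as soon as it is decided."""
    for name, changed in files:
        if changed:
            tasks.put(name)
    for _ in range(workers):
        tasks.put(None)  # one stop signal per worker

def upload_worker(tasks, done, lock):
    """Uploader: consume tasks while the planner is still producing them."""
    while True:
        name = tasks.get()
        if name is None:
            break
        # Placeholder: a real uploader would transfer the file here.
        with lock:
            done.append(name)

def run_backup(files, workers=4):
    # Bounded queue (hypothetical size), so planning cannot run far ahead.
    tasks = queue.Queue(maxsize=100)
    done, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=upload_worker, args=(tasks, done, lock))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    plan(files, tasks, workers)
    for thread in threads:
        thread.join()
    return sorted(done)
```

Because the uploaders start before planning finishes, a job that takes days to upload does not have to wait for the complete plan first.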

Download S3 Backup 1.0 RC1 and consider preordering.


23 days ago: We won’t be getting into web services backup, but you can

First of all, let me say that I’m not looking for investors in my company, so what follows is not a solicitation of funding; take it at face value.

I’ve just read the news that Backupify got 900K in funding. Backupify is an online service that backs up data from online accounts, mostly for various social services (Facebook, Twitter, Delicious), but also Gmail, Google Docs and a couple more. I’m happy for them, as Rob May is one of the few people around who have a healthy amount of skepticism, understand survivor bias, etc. So I really wish his ventures success. I’d like to see a thoughtful approach rewarded, as I consider myself to be approaching things thoughtfully as well.

What this investment tells me is that their field is perceived as ripe for the picking. The problem is that S3 Backup 1.0 is at the door and soon I’ll have my hands full promoting and further developing the app; then there’s a server edition, a Mac port and a lineup of related products waiting for their turn. So I can’t just add online service / social backup to the pile of “things I gotta do”. On the other hand, compared to the complexity of desktop software, all the cross-platform issues, GUI programming, filesystem quirks, don’t get me started... compared to all that, online accounts backup is a very straightforward task. I’m not saying it’s easy, but all the complex things about it I’ve already figured out in my previous and current projects.

Also, it’s no secret that there’s more to business than technology, so even if I can do the tech part with my eyes closed, the business side of it is something I don’t have the time or resources to get into. So here’s an idea: this is not in direct competition with what we do around here, so I don’t mind helping someone get into that field. We would handle the initial development, you would make a business out of it. And it would cost you way less than 900K, too.

Short version: if you want to get into online services backup business fast and with a competitive, scalable solution, drop us a line.

- Sergey


4 weeks ago: Review of New S3 Feature: Versioning

Today Amazon announced support for versioning in S3. It’s exactly what one would expect from an AWS service — the feature set is very basic, but good at abstracting some of the lowest-level issues.

Here’s a short review of what it is and what it’s not:

  • The best thing about it is that one can enable versioning on an existing bucket and software unaware of S3 versioning would continue to work producing new versions on every PUT and DELETE operation. But that app itself wouldn’t be able to tell that anything is different.
  • Another is that it introduces MFA deletes. What that means is that the bucket owner can lock it in such a manner that to destructively delete some data you need to have a physical token. That’s certainly handy for deployments where a lot of computers have access to AWS credentials for the bucket — if those leak, MFA delete at least makes sure the data is preserved.
  • Versioned deletes are implemented with delete markers. This means one can simply delete that marker to undo the delete.
  • There are very few new calls; support is mostly tacked onto existing APIs, and that’s a good thing.
  • Overall the functionality and features of versioning itself are very similar to what one would accomplish by rolling out one’s own naive versioning scheme. In that regard the ability to list only the latest versions of the objects and omit the marked-as-deleted ones is the one that improves over the naive implementation the most (mostly performance-wise).
  • S3 versioning does nothing in terms of data deduplication.
  • Versions are created on a per-object basis; there’s no concept of a snapshot.
  • Versioning can be suspended, so that versioning-unaware applications don’t create new versions on most operations.
  • The PUT operation is version-agnostic: there’s no option to put a new version of the object that overwrites the latest one. This means that to keep storage space usage under control one has to make periodic sweeps deleting some of the versions. It’s not very friendly to “at most one new version per day” kinds of implementations either.
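
To make the last point concrete, here is a minimal Python sketch of the selection step such a periodic sweep would need for an “at most one version per day” policy. It is only an illustration: the `(key, version_id, last_modified)` tuple format and the function name are assumptions for the example, and the actual listing and deleting of versions through the S3 API is not shown.

```python
from datetime import datetime

def versions_to_delete(versions):
    """Pick versions to delete so each key keeps only its newest version per day.

    `versions` is a list of (key, version_id, last_modified) tuples
    (a hypothetical format chosen for this sketch).
    Returns a list of (key, version_id) pairs, in input order.
    """
    newest = {}  # (key, calendar day) -> (last_modified, version_id)
    for key, version_id, modified in versions:
        slot = (key, modified.date())
        if slot not in newest or modified > newest[slot][0]:
            newest[slot] = (modified, version_id)

    keep = {(key, version_id) for (key, _), (_, version_id) in newest.items()}
    return [(key, vid) for key, vid, _ in versions if (key, vid) not in keep]
```

For example, if an object was written at 09:00 and 17:00 on the same day and once more the next morning, only the 09:00 version is selected for deletion.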

What this means for our products

We have been working on our new S3 Backup storage logic for a while and even though it includes versioning, we don’t intend to change it to make use of this new functionality. The reason is that our take on the matter supports snapshots, data deduplication and block-level diff, so we would gain nothing by using Amazon’s own versioning.

Having said that, until our new storage logic is actually deployed in our products, we might enable limited support for using S3 versioning on a bucket and restoring the old versions of the files. There’s no time to get it done before 1.0 release, but if you really want to have this feature, let us know, we’ll use your feedback to schedule the implementation accordingly.


6 weeks ago: S3 Backup beta 18: New file-db

S3 Backup beta 18 rev.1518

This version introduces a rewrite of one of the most important parts of the application — backup logic. This was made possible by redesigning a file-db component which I’ll explain in a later post.

The new backup logic works by synchronizing two “virtual” filesystems created from the local one and the S3-based one by filtering and reordering them based on backup rules. This affords some long-requested features. For example, if you change the exclusion masks in some of your backup rules and the new mask excludes some files that were previously backed up, this change will not make the backup job delete the remote copy on the next run. This isn’t a trivial matter: the app compares the local and remote storage, and given that file mask it would not expect the filtered file on the remote storage. The new system makes the correct decision in this and many other corner cases, like when different backup jobs or different rules in the same backup job point to the same remote path.
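
The exclusion-mask case can be sketched in a few lines of Python. This is a simplified illustration rather than the app's real logic: it assumes both sides are plain dictionaries mapping relative paths to content hashes, and the names `plan_sync` and `exclude_masks` are invented for the example.

```python
from fnmatch import fnmatch

def plan_sync(local, remote, exclude_masks):
    """Compare filtered views of the local and remote file sets.

    `local` and `remote` map relative paths to content hashes
    (an assumed representation). Returns (uploads, deletes) as sorted lists.
    """
    def excluded(path):
        return any(fnmatch(path, mask) for mask in exclude_masks)

    # Filter BOTH sides with the same rules, so an excluded file is simply
    # invisible to the comparison instead of looking "deleted locally".
    local_view = {p: h for p, h in local.items() if not excluded(p)}
    remote_view = {p: h for p, h in remote.items() if not excluded(p)}

    uploads = sorted(p for p, h in local_view.items() if remote_view.get(p) != h)
    deletes = sorted(p for p in remote_view if p not in local_view)
    return uploads, deletes
```

If only the local side were filtered, a newly excluded file that already exists remotely would show up in `deletes`; filtering both views keeps the remote copy untouched.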

I’ll be later writing on the systems involved (file-db, rule indexing, virtual filesystems, filesystem middleware and more) so that users who are interested in the inner workings of the app have a better grasp of what doing a backup involves.

Another very noticeable change in this version is a faster backup job planning phase. We are working on even more speedups, but it’s already much faster than before.

The app now also stores all the debug logs in your user settings folder. We will introduce online reports, notifications and reminders in a later version, but this should work for now.

The file-db format had to be changed, so the first time you run the new version it will be migrated automatically. The migration takes only a moment and is transactional, so it cannot corrupt the file-db. The log of the conversion is written to the logs folder in case you want to inspect it. You need to know that once the file-db is migrated, older versions will not recognize it and during backups would believe all of the files need to be reuploaded. Basically, once you run beta 18 on a certain computer, do not try to use an earlier version on it. If you have an older version installed on a different computer, it will not be affected in any way, because the file-db is local. The migration process is solid, but if you want to keep a copy of the older file-db, it’s stored in %appdata%/s3 backup/filedb.db

The new file-db structure affords much faster operation, which is used to ensure its consistency on the fly. This matters if you alter the remote storage via a different app or from another computer. For example, if you have deleted some of the backed-up files from S3 using a different app, the file-db would still think they are there and the next time a backup is run, it would not reupload them. But in beta 18, if you browse to the folder where some of the files were removed, the app will immediately notice that they are missing and correct the file-db accordingly. These actions are logged and can be seen in the debug log (from the main menu: Tools → Backup Log).
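
The on-the-fly correction described above could look roughly like the following Python sketch. It is an assumption-laden illustration: the file-db is modeled as a plain dictionary keyed by remote path, and `reconcile_folder` is a hypothetical name, not a function of the app.

```python
def reconcile_folder(filedb, folder, listed_names):
    """Drop file-db entries for `folder` that the remote listing no longer has.

    `filedb` maps remote paths to stored metadata (assumed representation);
    `listed_names` are the file names returned when listing the folder.
    Returns the removed paths so they can be written to the debug log.
    """
    prefix = folder.rstrip("/") + "/"
    listed = {prefix + name for name in listed_names}
    stale = sorted(
        path for path in filedb
        if path.startswith(prefix)
        and "/" not in path[len(prefix):]  # direct children only
        and path not in listed
    )
    for path in stale:
        del filedb[path]
    return stale
```

Only the direct children of the listed folder are checked, which matches the idea that the correction happens for whatever folder has just been listed; entries in subfolders are left alone until those are listed too.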

The same happens when listing remote folders during normal backup operation, but not all folders are listed at that stage. If there’s a need for it, we’ll add a way to reindex remote storage manually, and to do it automatically on a schedule, just in case.

The deletion propagation was disabled midway through the life of beta 17 and is now enabled again.

Some more notes on this release:

  • Every time the app runs it will persistently write a log of all its operations to its appdata folder (to open it in explorer type %appdata%/s3 backup in the address bar).
  • The crash logs are also directed to that folder. Normally they would be written to the same folder as the app itself, but the Program Files folder is restricted, so this gives the log a better chance of being written successfully.
  • When a new bucket is created but the DNS records have had no time to update yet, requests to S3 are met with a redirect to a location that gets the request to the correct datacenter. In this revision, this endpoint is cached, so the redirect is only encountered once. This leads to much faster backups, directory listing and operation in general for new buckets.
  • The GUI code for backup progress reporting was tweaked so it doesn’t try to keep up with every change which allows the backup, especially the planning stage, to work faster.
  • The way directory structure is stored in S3 relies on every folder to contain at least one file or subfolder. This means that empty folders must have a special invisible file in them (S3 object with the key exactly matching the directory path). If that file is not present, the folder would disappear as soon as the last file in it is deleted. To make sure this doesn’t happen, the app will automatically detect if that file is missing and create it if necessary. This is completely transparent to the user and has no performance impact but is reported in the debug logs.
  • The requirements for bucket names changed when Amazon started offering a choice of datacenter location. Some old buckets are only accessible using the old access method, while buckets located outside the US-East datacenter must be accessed in a different way. To keep both kinds accessible, both methods need to be supported and picked automatically depending on the bucket name. S3 Backup has been capable of this since beta 16, but beta 18 is smarter about the choice.
  • The bucket location menu in bucket manager now has country flags matching the location of the datacenter.
  • UNC support was improved a little, but please note that it will get properly exposed in the UI in a later release.
  • Fixed migration from very old versions (backups.xml in a certain old format was preventing the updated app from starting)
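
The empty-folder check from the notes above can be illustrated with a short Python sketch. It is not the app's code: it assumes that a folder's placeholder object has a key equal to the folder path with a trailing slash (for example `photos/2009/`), and the function name is invented for the example.

```python
def missing_folder_markers(keys):
    """Return folder paths implied by `keys` that lack their own marker object.

    Assumption for this sketch: a folder marker is an object whose key is
    the folder path ending in "/".
    """
    existing = set(keys)
    folders = set()
    for key in keys:
        parts = key.split("/")[:-1]  # path components above the object
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return sorted(folders - existing)
```

The returned paths are the folders that would vanish once their last file is deleted, so those are the ones for which a marker object would need to be created.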

If you come across any issues, please let us know and we’ll do our best to fix them ASAP. We are planning to make beta 19 a release candidate and have the 1.0 version out in February, but you can order it while it’s still in beta and get a nice discount.

Download S3 Backup beta 18


9 weeks ago: S3 Backup beta 17

S3 Backup beta 17 rev.1422

Some changes in this release:

  • Reduce the passes over data to a minimum. The data needs to be hashed for future change detection, compressed, encrypted and resulting stream hashed again to ensure its integrity in transfer. All of this is done as efficiently as possible with minimal I/O by doing multiple operations during the same pass.
  • Support for international bucket names: accented, Cyrillic and most other characters. This is accomplished by IDNA encoding.
  • New encoding / decoding layer
  • Better upload reporting taking into account things like size changes due to compression and retries in estimating the time remaining.
  • Improved connection pooling.
  • Support for transforming proxies (specifically Transfer-Encoding: chunked)
  • Support for non-alphanumeric characters in backup job names.
  • For a couple of revisions, an issue was making buckets with capital letters in their names inaccessible. It was fixed soon after it was discovered.
  • Backup code was sometimes causing remote directories to be repeatedly recreated. This was fixed.
  • New menu code (important step in porting the app to new GUI backend code, but not a visible change).
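
The single-pass idea from the first item in the list can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's pipeline: SHA-256 and zlib are stand-ins chosen for the example, the encryption step is left out because the standard library has no cipher, and `process_stream` is a hypothetical name.

```python
import hashlib
import io
import zlib

def process_stream(source, sink, chunk_size=64 * 1024):
    """Read `source` once: hash the plain data, compress it, hash the output.

    Returns (plain_digest, transfer_digest) as hex strings.
    An encryption step would sit between compression and the second hash.
    """
    plain_hash = hashlib.sha256()      # for future change detection
    transfer_hash = hashlib.sha256()   # for integrity of the transferred stream
    compressor = zlib.compressobj()

    def emit(data):
        if data:
            transfer_hash.update(data)
            sink.write(data)

    for chunk in iter(lambda: source.read(chunk_size), b""):
        plain_hash.update(chunk)
        emit(compressor.compress(chunk))
    emit(compressor.flush())
    return plain_hash.hexdigest(), transfer_hash.hexdigest()

# Example: process a small in-memory payload.
sink = io.BytesIO()
plain_digest, transfer_digest = process_stream(io.BytesIO(b"example data" * 100), sink)
```

Each chunk is read from the source exactly once and passes through every stage before the next chunk is read, which is what keeps the I/O minimal.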

Beta 17 was done quite some time ago, but we decided against releasing it during the holidays, to make sure that if any issues cropped up we would be able to put in whatever time was necessary to fix them immediately.

So even though we are releasing beta 17 only now, beta 18 is already well underway. The changes coming there relate to the core backup algorithm and should make it faster still, especially for backup sets with a huge number of small files.

Currently known issues:

  • If you cancel an operation during the planning phase, it will pop up a debug log with a traceback. It’s just a small UI glitch we’ll correct in one of the next releases.

If you come across any other issues, please let us know and we’ll do our best to fix it ASAP.

This release is part of our run-up to the 1.0 release, which gives you an opportunity to order it at a discount.

