Monday, September 10, 2007

Deduplication

Many storage vendors have jumped on the deduplication bandwagon, promising users incredible reductions in storage consumption, faster data recovery and so on. In many environments, dedupe can both shrink backup times and extend the amount of data retained by copying only newly created blocks in a backup session. Extending deduplication within and across backup sessions (e.g. saving only one copy of word.exe across servers and one copy of jre.dll within each backup session) means backups require far less time and far fewer resources than before.
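To make the idea concrete, here is a minimal sketch of block-level dedupe in Python. It is a toy, not any vendor's implementation: the DedupeStore class, the fixed 4 KB block size and the use of SHA-256 digests as block identifiers are all assumptions made for illustration. Real products often use variable-size chunks and persistent indexes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


class DedupeStore:
    """Hypothetical toy store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}   # SHA-256 hex digest -> block bytes
        self.recipes = {}  # (session, filename) -> ordered list of digests

    def backup(self, session, filename, data):
        """Back up one file and return the number of bytes newly stored."""
        new_bytes = 0
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks never seen before consume storage.
                self.blocks[digest] = block
                new_bytes += len(block)
            recipe.append(digest)
        self.recipes[(session, filename)] = recipe
        return new_bytes

    def restore(self, session, filename):
        """Rebuild a file by concatenating its blocks in recorded order."""
        return b"".join(self.blocks[d] for d in self.recipes[(session, filename)])


if __name__ == "__main__":
    store = DedupeStore()
    # Four distinct blocks standing in for the contents of word.exe.
    exe = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
    print(store.backup("monday", "serverA/word.exe", exe))
    print(store.backup("monday", "serverB/word.exe", exe))
    # On Tuesday one block of the file has changed.
    changed = exe[:BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + exe[2 * BLOCK_SIZE:]
    print(store.backup("tuesday", "serverA/word.exe", changed))
```

In this sketch the second server's copy of the file stores nothing new, and Tuesday's backup stores only the one changed block, which is where the time and space savings come from.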

The promise of deduplication and the reality, however, often differ greatly. Dedupe can be performed on the host or at the storage target, and either in-line during the backup or post-backup. Different combinations are necessary to meet the requirements of different applications. Dedupe performed post-backup may work well for small, flat-file backups, especially across multiple servers. Large databases might need in-line dedupe (which has a tremendous impact on the backup process) to achieve any level of cost savings or even to return a viable backup file. Vendors offer a Chinese menu of products and features, with an equally confusing array of licenses and software add-ons to make it all work.

So what do we, the customers, actually need from all of this? Basically, we're all after shorter backups with greater granularity in smaller amounts of storage. Add in faster recovery times and user self-service restores and you've got the recipe for an impressive backup solution. Now all we need is a vendor to offer this as a product - not the morass of loosely-coupled software products offered today.
