Article | May 2, 2012

Five Tips For Faster, Simpler, Better Data Deduplication

By William Evans, CEO, Arkeia

With nearly every network backup and storage management vendor offering a data deduplication product or option, midsized IT environments can find it difficult to identify appropriately featured dedupe that delivers the time, space, and cost savings they require.

Why dedupe? Dedupe can achieve a 95 percent reduction in data volume for users who routinely back up the same data, such as performing nightly backups each business day or backing up dozens of VMware virtual machines across multiple physical hosts, according to experts at Arkeia Software, a leading provider of fast, easy-to-use, and affordable network backup solutions. Reduction ratios vary widely with the type of data and files on the network, but the right dedupe strategy can deliver significant savings in time, capacity, and cost.

Below are five tips for choosing the right technology to deliver fast, simple, and more affordable dedupe.

1. Start at the source
“Source-side” deduplication lays the foundation for more efficient dedupe. With source-side deduplication, the client determines whether the backup already holds a copy of each block of data before sending it. Virtualized environments typically contain many copies of the same operating systems and applications, and source-side deduplication prevents vast amounts of redundant data from moving to disk, tape, or cloud storage. Even when backups are confined to local area networks, source-side dedupe enables far faster backups because it reduces bandwidth requirements, the traditional bottleneck in LAN backup performance.
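
As a rough illustration of the idea, the sketch below shows a client fingerprinting each block and asking the backup server whether it already holds that fingerprint before transferring anything. The block size and the server calls (has_block, store_block, reference_block) are invented for this example and are not Arkeia's actual interface.

    import hashlib

    BLOCK_SIZE = 64 * 1024  # assumed fixed block size, for illustration only

    def backup_file(path, server):
        """Send only blocks the backup server has not seen before."""
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                # has_block/store_block/reference_block are placeholder calls,
                # not a real Arkeia API.
                if server.has_block(digest):
                    server.reference_block(digest)     # duplicate: send only the fingerprint
                else:
                    server.store_block(digest, block)  # new data: transfer the block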

2. Get in-line
Deduplication can take place in-line, while the backup is in progress, or as post-processing, in which backups are staged in a cache and deduplicated before being sent to their storage destination. For most backup applications, shops will prefer in-line deduplication. Source-side dedupe requires in-line processing, because the backup window can’t close until the backup sets are deduplicated and moved off the machine being protected.
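
The difference is easy to see in a small Python sketch. In the in-line version below, duplicates are dropped while the stream is still flowing, so nothing redundant is ever written; in the post-process version, the full backup must land in a staging area before it can be reduced. The helper and parameter names are hypothetical.

    import hashlib

    def fingerprint(blocks):
        """Yield (digest, block) pairs for a stream of backup blocks."""
        for block in blocks:
            yield hashlib.sha256(block).hexdigest(), block

    def inline_backup(blocks, store):
        # In-line: duplicates are dropped as the stream flows, so the backup
        # window can close as soon as the last unique block is written.
        for digest, block in fingerprint(blocks):
            store.setdefault(digest, block)

    def post_process_backup(blocks, staging, store):
        # Post-process: the full, un-deduplicated stream is written to a
        # staging area first and only reduced afterward.
        staging.extend(blocks)
        for digest, block in fingerprint(staging):
            store.setdefault(digest, block)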

3. Go around the block
Dedupe technologies broadly fall into two categories: block-based and file-based. File-based deduplication is rarely sufficient for business networks, which frequently hold many versions of the same file with only small differences between them. Fixed-block deduplication is an improvement, but it misses duplicate data whose alignment has shifted between similar files. Variable-block deduplication catches these shifted duplicates, but at the cost of greater management overhead and processing demands, which slow backups. “Progressive” deduplication provides the best of both worlds: it is fast like fixed-block dedupe but offers improved data reduction like variable-block dedupe. Progressive deduplication tunes block sizes according to the type of data, such as executable files, text files, or database records, to deliver the best overall performance.
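
To make the distinction concrete, the sketch below contrasts fixed-offset chunking with a generic content-defined (variable-block) chunker built on a simple rolling hash. It is a minimal illustration of the general technique, not Arkeia's proprietary progressive algorithm, and the window, mask, and size parameters are arbitrary.

    def fixed_chunks(data, size=4096):
        """Cut chunks at fixed offsets; inserting one byte shifts every later boundary."""
        return [data[i:i + size] for i in range(0, len(data), size)]

    def variable_chunks(data, window=48, mask=0x0FFF, min_size=1024, max_size=8192):
        """Cut a chunk wherever a rolling hash of the last `window` bytes matches a
        bit pattern, so boundaries follow content and realign after insertions."""
        base, mod = 257, 1 << 32
        top = pow(base, window, mod)            # weight of the byte leaving the window
        chunks, start, h = [], 0, 0
        for i, byte in enumerate(data):
            h = (h * base + byte) % mod
            if i - start >= window:
                h = (h - data[i - window] * top) % mod   # drop the oldest byte
            length = i - start + 1
            if (length >= min_size and (h & mask) == 0) or length >= max_size:
                chunks.append(data[start:i + 1])
                start, h = i + 1, 0
        if start < len(data):
            chunks.append(data[start:])
        return chunks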

4. Replication acceleration
Replication of on-premise backups to private or public clouds, or across a WAN, is a prime case where dedupe delivers substantial advantages in speed and efficiency. Deduping backup data before replication not only reduces the time needed to move data across a network, it also reduces the cost of bandwidth and of cloud storage capacity. Organizations that once relied on tape for off-site data protection will find that the time and cost savings of dedupe now make WAN transfers feasible.
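
A minimal sketch of the idea, assuming an invented replication interface (missing, put, and commit are not a real API): before shipping a backup set to the remote site, ask which chunk fingerprints it lacks and transfer only those.

    def replicate(backup_index, local_store, remote):
        """Replicate a deduplicated backup set by moving only chunks the
        remote repository does not already hold."""
        needed = remote.missing(list(backup_index))   # fingerprints the target lacks
        for digest in needed:
            remote.put(digest, local_store[digest])   # transfer only unique chunks
        remote.commit(backup_index)                   # record the backup's full chunk list remotely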

5. Go all in
Today there is generally no reason to buy a separate dedupe product on top of a network backup solution. An all-in-one approach is not only more affordable, it also simplifies integration and configuration. Deduplicating storage appliances can shrink storage volumes, but they don’t deliver the backup speed improvements that source-side deduplication offers. Network optimization tools can reduce WAN bandwidth, but not as effectively as deduplicated backups, and without the benefit of reduced storage requirements at the destination. Whether the backup server is deployed as a physical appliance, a virtual appliance, or as part of a traditional data protection software package, a unified data protection and deduplication system reduces cost and accelerates backups. Deduplication also accommodates data spread across different servers, locations, virtual machines, and hypervisors, supports mixed platforms in one network, and helps assure trouble-free restores.