Rubrik: Deduplication or Tetris?

The year was 1990. Sitting cross-legged on the floor of my room inside a makeshift blanket fort, I was totally captivated as I held an 8-bit game console (Nintendo Game Boy) in my hands, mesmerized by the sounds of the Russian folk song Korobeiniki in the game of Tetris. The intensity and speed increased as I delved deeper into the bits and bytes of the small, rectangular green console. Fast-forward to 2017. I am still mesmerized, this time by discussions of deduplication, which seem to consume and puzzle the highest echelons of the technical elite in storage and data protection.

Like an unorganized 24,000-piece jigsaw puzzle, deduplication conversations can be arduous, with plenty of unique terminology (vendor secret sauce) such as inline, post-process, fixed-size blocks, variable length, source-based, target-based, ratios, algorithms, and compression.

Rubrik Cloud Data Management (CDM) has the intelligence to understand your data as it is being ingested in parallel and streamed into our scale-out platform via RESTful API calls. Whether archiving data to the cloud or replicating data between two clusters, Rubrik avoids sending duplicate data over the network. The API-first architecture provides you with the ability to automate your Tetris-like backup and recovery jobs, cloud workflows, and daily management tasks. Rubrik’s founding engineers designed the CDM fabric with the future in mind by imagining a simpler world of data protection where each IT stakeholder can achieve clairvoyance into the overall strategy of data management. The innovation of Rubrik goes well beyond the horizon of just being another deduplication storage system.

The Many Shapes of Deduplication

Imagine each potentially deduplicated dataset as numerous, different Tetris-like shapes that will be churned to fit into the backup and storage system based on type, change rate, retention period, and SLA policy. Your overall deduplication ratio or data reduction percentage will depend on a combination of these factors. Rubrik provides cost savings out of the gate beyond deduplication and compression by reducing the precious time you spend on daily management through automation and simplicity. Rubrik leverages LZ4 for compression, which favors speed while still providing a solid compression ratio.

Do you need to be concerned with data loss? Not at all, as a Rubrik 4-node cluster uses Reed-Solomon erasure coding (4,2) for fault tolerance, resiliency, and added storage efficiencies.

This allows Rubrik to be resilient to a single node failure or dual drive failures while maintaining strict consistency across three nodes. Within a 3+ chassis configuration, Rubrik can tolerate the failure of an entire chassis, which is the equivalent of losing 4 nodes. The cluster will still operate without any downtime, maintaining your SLAs.
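To put the "added storage efficiencies" of erasure coding into numbers, here is a minimal arithmetic sketch. It assumes the (4,2) notation means 4 data fragments plus 2 parity fragments, and it compares that layout with plain three-way replication. The function names are hypothetical, and nothing here describes how Rubrik actually places fragments on nodes or drives.

```python
def erasure_coding_profile(data_fragments: int, parity_fragments: int) -> dict:
    """Summarize a (k, m) layout: k data fragments plus m parity fragments.

    Assumption: any m fragments can be lost and the data is still
    recoverable, which is the standard Reed-Solomon property.
    """
    total = data_fragments + parity_fragments
    return {
        "raw_bytes_per_logical_byte": total / data_fragments,
        "usable_fraction": data_fragments / total,
        "fragment_losses_tolerated": parity_fragments,
    }


def replication_profile(copies: int) -> dict:
    """Same summary for simple N-way replication, for comparison."""
    return {
        "raw_bytes_per_logical_byte": float(copies),
        "usable_fraction": 1 / copies,
        "fragment_losses_tolerated": copies - 1,
    }


rs_4_2 = erasure_coding_profile(4, 2)
three_way = replication_profile(3)

print("Reed-Solomon (4,2):", rs_4_2)
print("3-way replication: ", three_way)
```

Under these assumptions, both layouts survive the loss of two fragments or copies, but the (4,2) layout writes 1.5 bytes of raw capacity per logical byte while three-way replication writes 3.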

Based on the data types chart below, certain email applications, structured databases, and encrypted, image, and audio/video data achieve lower deduplication values. Please keep this in mind when analyzing your dedup ratios and comparing different vendors. Vendors may calculate dedup ratios or data reduction values using different formulas or methods.
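Why does the data type matter so much? The toy sketch below uses generic fixed-size block deduplication with SHA-256 fingerprints, which is a textbook technique and not a description of Rubrik's implementation. The 4 KiB block size and the `dedup_ratio` helper are assumptions made for illustration. Random bytes stand in for encrypted or already-compressed content.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Return total blocks divided by unique blocks for fixed-size chunks."""
    fingerprints = set()
    total_blocks = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Identical blocks produce identical fingerprints and are stored once.
        fingerprints.add(hashlib.sha256(block).digest())
        total_blocks += 1
    return total_blocks / len(fingerprints) if fingerprints else 1.0


# Highly repetitive data: the same block repeated 100 times.
repetitive = (b"A" * BLOCK_SIZE) * 100

# Random bytes approximate encrypted or compressed data: no repeated blocks.
random_like = os.urandom(BLOCK_SIZE * 100)

print(f"repetitive data:  {dedup_ratio(repetitive):.1f}:1")
print(f"random-like data: {dedup_ratio(random_like):.1f}:1")
```

The repetitive input collapses to a single stored block, while the random-like input offers nothing to deduplicate. Real workloads fall somewhere in between.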

The Rubrik Way – Data Reduction

Rubrik has a unique way of performing deduplication within our cloud-scale file system (Atlas) and data management layer (Cerebro). Rubrik uses data reduction, which is the contraction of the space required to store data by using data deduplication and data compression technologies. To explain it another way, data reduction is the amount of data that you are not storing in our system. If your data reduction is 75%, then you are only storing 25%. A true incremental-forever approach is also utilized, eliminating the need for recurrent full backups and further reducing your Total Cost of Ownership (TCO).

In version 4.1, Rubrik provides the ability to logically divide Rubrik clusters into multiple management units with independent management of logical objects such as SLA Domains, Archival Targets, Protected Objects, Users, and more. This extends the granularity with which you manage these objects and control what data you are deduping against. Rubrik typically provides data reduction rates between 75% and 83% (4:1 to 6:1 ratio). Occasionally customers achieve data reduction as high as 95% (a 20:1 dedup ratio), which is primarily due to the workloads, data types, and retention that they are protecting.
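Ratios and percentages describe the same thing in two ways, and it is easy to mix them up when comparing vendors. Here is a small sketch of the conversion, with hypothetical helper names, using the figures quoted above as inputs.

```python
def ratio_to_reduction_pct(ratio: float) -> float:
    """Convert an N:1 ratio into a data reduction percentage."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


def reduction_pct_to_ratio(pct: float) -> float:
    """Convert a data reduction percentage back into an N:1 ratio."""
    if not 0 <= pct < 100:
        raise ValueError("percentage must be in the range [0, 100)")
    return 100 / (100 - pct)


for ratio in (4, 6, 20):
    print(f"{ratio}:1 -> {ratio_to_reduction_pct(ratio):.1f}% reduction")
```

Note how the scale flattens: going from 4:1 to 20:1 is a fivefold jump in ratio, but only moves data reduction from 75% to 95%.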

Rubrik provides a powerful and customizable visual analytics and reporting tool, called Envision, that offers details on system capacity of the local Rubrik cluster including dedup ratio, logical dedup ratio, data reduction, and logical data reduction.

Why do we report our deduplication numbers this way?

Rubrik avoids inflated deduplication numbers by reporting several separate capacity attributes. These are explained below:

Ingested Dedup Ratio, or simply Dedup Ratio, is the ratio of the amount of data ingested through the network to the amount of data physically stored.

Logical Dedup Ratio is the ratio of the amount of logical data represented in the system (say, a full backup per snapshot) to the amount of data physically stored.

Ingested Data Reduction, or simply Data Reduction, is the percentage reduction in total data size of the backup. It is based on the amount of data ingested versus the amount stored.

Logical Data Reduction is the percentage reduction in the size of the backup calculated on the basis of full backups instead of incremental differences. It is based on the total source data protected by each backup.
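The four definitions above can be sketched as simple arithmetic over three byte counts. The `CapacitySample` class, its field names, and the sample figures are all hypothetical. This is one reading of the definitions, not Envision's actual calculation.

```python
from dataclasses import dataclass


@dataclass
class CapacitySample:
    """Three byte counts for a protected data set (hypothetical model)."""
    logical_bytes: int   # size if every snapshot were kept as a full backup
    ingested_bytes: int  # data that actually crossed the network
    stored_bytes: int    # data physically stored after dedup and compression

    def dedup_ratio(self) -> float:
        return self.ingested_bytes / self.stored_bytes

    def logical_dedup_ratio(self) -> float:
        return self.logical_bytes / self.stored_bytes

    def data_reduction_pct(self) -> float:
        return (1 - self.stored_bytes / self.ingested_bytes) * 100

    def logical_data_reduction_pct(self) -> float:
        return (1 - self.stored_bytes / self.logical_bytes) * 100


GB = 10**9
# Made-up figures: 10,000 GB logical, 2,000 GB ingested, 500 GB stored.
sample = CapacitySample(
    logical_bytes=10_000 * GB,
    ingested_bytes=2_000 * GB,
    stored_bytes=500 * GB,
)

print(f"Dedup ratio:            {sample.dedup_ratio():.1f}:1")
print(f"Logical dedup ratio:    {sample.logical_dedup_ratio():.1f}:1")
print(f"Data reduction:         {sample.data_reduction_pct():.1f}%")
print(f"Logical data reduction: {sample.logical_data_reduction_pct():.1f}%")
```

The same stored capacity yields a much larger logical number than ingested number, which is why it matters to know which basis a vendor is quoting.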

Deduplication Savings or Fast Recovery?

The most important feature in a backup and storage appliance is the recovery performance. Rubrik was architected for fast recovery performance, especially when reading data that is deduplicated. Performance is maximized and preserved for our Instant Recovery and Live Mount operations within the Rubrik CDM platform, while network traffic is minimized. You can even power on virtual machines or SQL backups and provision a clone to your chosen point-in-time. Rubrik allows you to instantly mount copies of your SQL databases directly running on Rubrik to any SQL instance or host that you choose.

Backup administrators in charge of managing secondary storage or copy data management systems tend to grapple with several data protection and disaster recovery questions. Where is my critical enterprise data and information being stored across all of our data centers? Is it on-premises, in the cloud, on tape in an offsite vault, or all of the above? How are we going to manage the massive amounts of data that are being created on a daily basis and meet our Recovery Time Objectives (RTOs)? Your backup and storage appliance may be able to ingest a ton of data, but does it intelligently identify and manage each object data type that is being ingested?

It’s time to reconsider your overall data management strategy, not just backup and recovery. Gaining consensus on business requirements and how they link to your data retention strategies may not be as simple as hashing it all out in a single meeting. In a scale-out architecture world where sensors can be embedded in almost anything and data is captured and analyzed to make more informed decisions, deduplication is once again table stakes, since the trove of information will only continue to grow.

To learn more about Rubrik’s data reduction, please download the Rubrik Cloud Data Management: Tech Overview & How It Works White Paper.