Data deduplication is not the only way to improve the efficiency of primary storage. Compression is another, and an important consideration is when it is performed. Compression can at times produce greater storage savings than deduplication, particularly in data sets with variable amounts of empty space (such as databases) and in large repositories of unstructured data that contain few duplicates but many compressible file types.
Compression most commonly occurs either inline or as a postprocess. ZFS handles both deduplication and compression inline: duplicate chunks are identified and removed, and data is compressed, before anything is written to the back-end disk drives of the array. The benefit is that data is compressed immediately, unlike the postprocess technique, which requires additional disk space to hold all the data until a scheduled process reduces it later.
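The difference in disk space between the two approaches can be illustrated with a minimal sketch. It uses Python's zlib as a stand-in compressor (an assumption for illustration, not the algorithm the array uses), and the function names and sample blocks are hypothetical.

```python
import zlib

def inline_peak(blocks):
    """Peak bytes on disk when every block is compressed before it is written."""
    used = 0
    for block in blocks:
        used += len(zlib.compress(block))
    return used  # usage only grows, so the final value is also the peak

def postprocess_usage(blocks):
    """Return (peak, final) bytes on disk when raw blocks land first."""
    # Everything is written uncompressed and stays that way until the scheduled job runs.
    peak = sum(len(block) for block in blocks)
    # After the scheduled pass, only the compressed copies remain.
    final = sum(len(zlib.compress(block)) for block in blocks)
    return peak, final

# Hypothetical sample data: repetitive text, log lines, and a zero-filled block.
blocks = [b"A" * 4096, b"log line\n" * 500, bytes(4096)]

peak_inline = inline_peak(blocks)
peak_post, final_post = postprocess_usage(blocks)
print("inline peak:", peak_inline)
print("postprocess peak:", peak_post, "final:", final_post)
```

Both approaches end at the same size in this sketch; only the postprocess path has to find room for the full uncompressed data in the meantime.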
Advantages of Inline Compression
- Improved I/O efficiency – reduced write time
- Reduced overall storage footprint
- Increased internal bandwidth of the array
- Compression ratios of 1.2 – 3.8
- 12.5% compression threshold to prevent data inflation
In a ZFS storage solution, shares can optionally compress data before writing it to the storage pool, allowing for greater storage utilization at the expense of increased CPU utilization. Four levels of compression are offered, allowing users to choose from the fastest compression (which works well on simple inputs and needs fewer CPU resources) to the best compression (the highest compression ratio, but it consumes more CPU resources). If compression doesn't yield a minimum amount of space savings, compression is abandoned and the data is written uncompressed, in order to avoid time-consuming decompression when the data is read back. If implemented with deduplication, the data is first compressed and then deduplicated.
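The write path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ZFS implementation: zlib stands in for the real compressors, the four level names and their zlib settings are hypothetical, the in-memory dictionary stands in for the storage pool, and the 12.5% figure is the threshold quoted earlier in this section.

```python
import hashlib
import zlib

MIN_SAVINGS = 0.125  # the 12.5% threshold quoted in this section

# Hypothetical names for four levels, mapped to zlib levels for illustration only.
LEVELS = {"fastest": 1, "fast": 3, "better": 6, "best": 9}

def compress_block(block, level="fastest"):
    """Return (payload, is_compressed); keep the raw block if savings are too small."""
    packed = zlib.compress(block, LEVELS[level])
    if len(packed) <= len(block) * (1 - MIN_SAVINGS):
        return packed, True
    return block, False  # compression abandoned: store the block as-is

def write_block(block, store, level="fastest"):
    """Compress first, then deduplicate on a hash of the bytes that would be stored."""
    payload, is_compressed = compress_block(block, level)
    key = (is_compressed, hashlib.sha256(payload).hexdigest())
    if key not in store:  # only the first copy consumes space
        store[key] = payload
    return key

def read_block(key, store):
    """Decompress only if the block was actually stored compressed."""
    is_compressed, _ = key
    payload = store[key]
    return zlib.decompress(payload) if is_compressed else payload

store = {}
text = b"SELECT * FROM orders;\n" * 200  # repetitive, compresses well
# 4096 bytes of hash output: high entropy, so it will not reach the threshold.
noise = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))

k1 = write_block(text, store)
k2 = write_block(text, store)   # duplicate: no new space used
k3 = write_block(noise, store)  # stored uncompressed
print(len(store), k1[0], k3[0])
```

Because the incompressible block is stored as-is, reading it back skips decompression entirely, which is the point of the threshold.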
Typical Data Compression Ranges
Data type                      Compression ratio
Office file share documents    1.2 – 1.6
Messaging databases            1.2 – 1.8
Structured databases           1.5 – 2.1
Virtual machine images         1.5 – 3.0
HTML files                     1.6 – 2.0
Uncompressed images            1.9 – 3.5
Executables                    2.0 – 2.4
Uncompressed videos            2.2 – 3.8
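To relate these ratios to space savings, the short sketch below assumes that a compression ratio means original size divided by compressed size (the source does not define it explicitly). The function names are hypothetical.

```python
def space_savings(ratio):
    """Fraction of space saved, assuming ratio = original size / compressed size."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1 - 1 / ratio

def min_ratio(threshold):
    """Smallest ratio that still meets a minimum-savings threshold."""
    return 1 / (1 - threshold)

# Low and high ends of a few ranges from the table above.
ranges = [
    ("Office file share documents", 1.2, 1.6),
    ("Structured databases", 1.5, 2.1),
    ("Uncompressed videos", 2.2, 3.8),
]
for name, low, high in ranges:
    print(f"{name}: {space_savings(low):.1%} to {space_savings(high):.1%} saved")

print(f"Ratio needed to meet a 12.5% threshold: {min_ratio(0.125):.3f}")
```

Under that assumption, a ratio of 1.2 corresponds to roughly 17% less space and 3.8 to roughly 74% less, while the 12.5% threshold works out to a ratio of about 1.14, just below the low end of the table.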
As with deduplication, data compression can improve the performance of data protection, because data can be replicated, backed up, and restored in its compressed form. Given all of this, there is very little reason not to use compression for all your data: the downside is minimal, and the upside is significant with most applications.