Capacity saving function: data deduplication and compression
The capacity saving function is available for use on internal flash drives only, including data stored on encrypted flash drives. When the capacity saving function is in use, the controller of the storage system performs data deduplication and compression to reduce the size of data to be stored. Compression and deduplication are performed asynchronously on pages that store data. Because less data is physically stored, the free area of the pool can increase, which reduces the cost of purchasing drives over time. A DP-VOL with capacity saving enabled is called a data reduction (DRD) volume.
Data received by the storage controller is first stored in a temporary area in the pool. When one hour has passed since the data was last updated, it is classified as inactive data, and the capacity saving processing is performed (post process). Data that has been through capacity saving processing is stored in the data storage area. When the data is updated again, the previous copy stored in the data storage area is no longer required. The used capacity of the pool increases until garbage collection, which reclaims old data that is no longer required, is performed. The pool capacity that is eventually required is the sum of the physical data capacity after capacity saving and the amount of meta information.
The temporary area and the data storage area are not assigned fixed capacities. They share a pool and use the pool as needed.
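The data flow described above can be pictured with a small toy model. This is only an illustrative sketch, not the storage system's implementation: the class name `ToyPool`, its methods, and the use of plain address counts instead of real capacities are all assumptions made for the example. The one-hour inactivity threshold is the only value taken from the text.

```python
# Toy model of the post-process data flow (illustrative only).
INACTIVE_AFTER_SECONDS = 60 * 60  # one hour without updates, as stated in the text


class ToyPool:
    """Hypothetical pool that tracks where each written address currently lives."""

    def __init__(self):
        self.temporary = {}        # address -> time of last update (seconds)
        self.data_storage = set()  # addresses whose data has been capacity-saved
        self.garbage = 0           # obsolete saved copies awaiting garbage collection

    def write(self, address, now):
        """Host write: new data always lands in the temporary area."""
        if address in self.data_storage:
            # The previously saved copy is no longer required and becomes garbage.
            self.data_storage.remove(address)
            self.garbage += 1
        self.temporary[address] = now

    def post_process(self, now):
        """Move data that has been inactive for one hour to the data storage area."""
        ready = [a for a, t in self.temporary.items()
                 if now - t >= INACTIVE_AFTER_SECONDS]
        for address in ready:
            del self.temporary[address]
            self.data_storage.add(address)
        return len(ready)

    def garbage_collect(self):
        """Reclaim all obsolete copies and return how many were reclaimed."""
        reclaimed = self.garbage
        self.garbage = 0
        return reclaimed
```

In this sketch, a write at time 0 stays in the temporary area until `post_process` is called with a time of 3600 seconds or later. Rewriting the same address afterward leaves one garbage entry until `garbage_collect` runs, which mirrors why used pool capacity grows between garbage collection passes.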
- Deduplication
The data deduplication function deletes duplicate copies of data written to different addresses in the same pool and maintains only a single copy of the data at one address. The deduplication function is enabled on a Dynamic Provisioning pool and then on the desired DP-VOLs in the pool. When deduplication is enabled, data that has multiple copies between DP-VOLs assigned to that pool is removed.
When you enable deduplication on a pool, the deduplication system data volume (DSD volume) for that pool is created. The deduplication system data volume is used exclusively by the storage system to manage the deduplication function. A search table in the deduplication system data volume is used to locate redundant data in the pool.
- Compression
The data compression function utilizes the LZ4 compression algorithm to compress the data. The compression function can be enabled per DP-VOL.
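The idea of keeping a single copy of duplicate data and locating it through a search table can be sketched as follows. This is a conceptual illustration only: the text does not describe how the search table in the DSD volume is organized, so the SHA-256 digest lookup, the class `ToyDedupStore`, and its fields are assumptions chosen to keep the example short.

```python
import hashlib


class ToyDedupStore:
    """Hypothetical pool-wide store that keeps one physical copy per unique block."""

    def __init__(self):
        self.search_table = {}  # block digest -> index into self.physical
        self.physical = []      # unique blocks actually stored
        self.mapping = {}       # (volume, address) -> index into self.physical

    def write(self, volume, address, block):
        """Store a block, reusing an existing physical copy when one matches."""
        digest = hashlib.sha256(block).digest()
        index = self.search_table.get(digest)
        if index is None:
            # First time this content is seen in the pool: keep a physical copy.
            index = len(self.physical)
            self.physical.append(block)
            self.search_table[digest] = index
        self.mapping[(volume, address)] = index

    def read(self, volume, address):
        """Return the block content mapped to the given volume and address."""
        return self.physical[self.mapping[(volume, address)]]
```

Writing the same block to two different volumes in this sketch leaves only one entry in `physical`, while both logical addresses still read back the original content. A production design would also need to handle digest collisions and reference counting, which are omitted here.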
The capacity overheads associated with the capacity saving function include the following:
- Capacity consumed by metadata
The capacity consumed by metadata for the capacity saving function (deduplication and compression) is approximately 3% of the consumed DP-VOL capacity that has been processed by capacity saving. For example, if the consumed capacity of a DP-VOL is 150 TB and the capacity saving function has processed 100 TB of the 150 TB consumed capacity and reduced it to 30 TB, the capacity consumed by metadata for the capacity saving function is approximately 3 TB (3% of 100 TB). The total consumed capacity of this DP-VOL at this instant is 83 TB (30 TB of reduced data + 50 TB not yet processed + 3 TB of metadata).
- Capacity consumed by garbage (invalid) data
The capacity consumed by garbage data is approximately 7% of the total consumed capacity of all DP-VOLs with capacity saving enabled. The capacity is dynamically consumed based on garbage data created by the capacity saving process and cleaned by the background garbage collection process. The garbage collection process is a background process with a lower priority than host I/O, so the capacity consumed by garbage data depends on both the garbage created and the host I/O rate.
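The overhead arithmetic above can be reproduced with a short calculation. The following is a rough estimation sketch, not a sizing tool: the function names are hypothetical, and the 3% and 7% figures are the approximate values quoted in the text, so actual consumption will vary with workload and garbage collection progress.

```python
# Approximate overhead percentages quoted in the text.
METADATA_PERCENT = 3  # of the consumed capacity already processed by capacity saving
GARBAGE_PERCENT = 7   # of the total consumed capacity of capacity-saving DP-VOLs


def metadata_capacity_tb(processed_tb):
    """Estimate metadata capacity for the amount of data already processed."""
    return processed_tb * METADATA_PERCENT / 100


def consumed_capacity_tb(consumed_before_tb, processed_tb, reduced_tb):
    """Estimate consumed capacity: reduced data + unprocessed data + metadata."""
    unprocessed_tb = consumed_before_tb - processed_tb
    return reduced_tb + unprocessed_tb + metadata_capacity_tb(processed_tb)


def garbage_capacity_tb(total_consumed_tb):
    """Estimate garbage capacity from the total consumed capacity passed in."""
    return total_consumed_tb * GARBAGE_PERCENT / 100


# Worked example from the text: 150 TB consumed, 100 TB processed, reduced to 30 TB.
print(metadata_capacity_tb(100))           # 3.0
print(consumed_capacity_tb(150, 100, 30))  # 83.0
```

The garbage estimate takes the total consumed capacity as an input, so it should be supplied with the combined figure for all DP-VOLs that have capacity saving enabled.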
