Faster Lossy Compression with SZ
📜 Abstract
Error-bounded lossy compression is becoming critical to the success of today’s scientific projects due to the ever-increasing volume of data generated by high-fidelity numerical simulations and instrumental observations. In this paper, we propose a novel error-bounded lossy compressor based on the SZ lossy compression framework, which includes two new techniques – prediction-based data decoupling and improved Huffman encoding – to allow high compression ratios, high decompression throughput, and low data distortion. In the evaluation, we compare the new design with state-of-the-art error-bounded lossy compressors. The compression ratios and speeds obtained by our fast compression algorithm are consistently much higher than those of other lossy compressors under the same data distortion requirement, as demonstrated by our extensive experiments with 13 different real-world scientific datasets. Our SZ compressor proves to be efficient, versatile, and capable of significantly improving simulation efficiency and storage performance.
✨ Summary
The paper “Faster Lossy Compression with SZ,” by Dingwen Tao, Sian Jin, Chao Chen, Sheng Di, and Franck Cappello, introduces an error-bounded lossy compressor for scientific data in high-performance computing environments. Published in November 2015, it extends the SZ lossy compression framework with two techniques: prediction-based data decoupling and improved Huffman encoding. According to the abstract, these enhancements yield higher compression ratios, higher decompression throughput, and low data distortion, as shown by experiments on 13 real-world scientific datasets.
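To make the term "error-bounded" concrete, here is a minimal Python sketch of the general idea behind prediction-based lossy compression: predict each value, quantize the prediction error into bins sized by the error bound, and keep only the integer bin codes. This is an illustration under stated assumptions, not the paper's method. The previous-value predictor and the function names `compress` and `decompress` are hypothetical choices made for this example, and the paper's prediction-based data decoupling and improved Huffman encoding are not reproduced because the summary does not describe them.

```python
def compress(values, error_bound):
    """Turn floats into integer codes; each reconstructed value stays
    within error_bound of the original (up to floating-point rounding)."""
    codes = []
    prev = 0.0  # last *reconstructed* value, used as the prediction (assumed predictor)
    step = 2 * error_bound  # bin width: rounding to the nearest bin errs by at most step / 2
    for v in values:
        code = round((v - prev) / step)  # quantized prediction residual
        codes.append(code)
        prev = prev + code * step  # mirror exactly what the decompressor will compute
    return codes


def decompress(codes, error_bound):
    """Rebuild approximate values from the integer codes."""
    out = []
    prev = 0.0
    step = 2 * error_bound
    for code in codes:
        prev = prev + code * step
        out.append(prev)
    return out


if __name__ == "__main__":
    data = [1.00, 1.02, 1.05, 0.98, 2.50, 2.51]
    bound = 0.01
    codes = compress(data, bound)
    restored = decompress(codes, bound)
    print(codes)
    print(max(abs(a - b) for a, b in zip(data, restored)))
```

For smooth data the codes cluster around small integers, which is what makes a subsequent entropy-coding stage such as Huffman encoding effective. Predicting from the reconstructed value rather than the original keeps compressor and decompressor in lockstep, so quantization errors do not accumulate.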
Through web searches, I found that this paper has influenced later research on large-scale scientific computing and has served as a foundation for further work on error-bounded lossy compression algorithms. For example, it has been cited in research on large-scale data analytics in high-performance computing environments and on compression for scientific simulations (IEEE Xplore). Its techniques have also been evaluated and referenced in studies of data reduction strategies for exascale computing systems. The searches turned up no specific industry applications.