Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

Abstract

Data explosion in scientific simulations creates a grand challenge for storage and I/O performance in high-performance computing (HPC). Prediction-based lossy compression can significantly reduce data size while meeting the error bound set by the user. Existing prediction-based lossy compressors, however, suffer from low prediction accuracy and high quantization error. This paper presents novel prediction methods that combine the Lorenzo predictor with a multidimensional linear-regression predictor to improve the compression ratio while keeping the error controlled. Our method preserves the high resolution and precision of the data significantly better than existing prediction-based compressors. We evaluate it on nine scientific data sets and demonstrate overall improvements in prediction accuracy and compression ratio. The algorithm is designed for parallel execution and is ready to be leveraged on modern parallel computing systems such as Intel MIC and GPUs. Our evaluation indicates that the compression ratio achieved by our algorithm doubles that of state-of-the-art prediction-based compression algorithms. The proposed solution is integrated into SZ, a state-of-the-art scientific data compression framework, and is released as open-source software.
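The abstract rests on two ideas: predict each value from its already-processed neighbors (the Lorenzo predictor), then quantize the prediction residual so the reconstruction error never exceeds the user's bound. The following is a minimal 2D sketch of that loop, assuming a plain linear-scaling quantizer with bin width 2·eb, out-of-range neighbors treated as 0, no cap on the number of quantization bins, no handling of unpredictable values, and no entropy-coding stage. The names `lorenzo_2d`, `compress`, and `decompress` are hypothetical and are not SZ's API. The key design point is that the compressor predicts from *reconstructed* values, so the decompressor can repeat every prediction exactly.

```python
import math


def lorenzo_2d(recon, i, j):
    """2D Lorenzo prediction from already-reconstructed neighbors.

    Neighbors outside the array are treated as 0 (a simplifying assumption).
    """
    def at(a, b):
        return recon[a][b] if a >= 0 and b >= 0 else 0.0

    return at(i - 1, j) + at(i, j - 1) - at(i - 1, j - 1)


def compress(data, eb):
    """Return one integer quantization code per value (bin width 2 * eb)."""
    rows, cols = len(data), len(data[0])
    recon = [[0.0] * cols for _ in range(rows)]
    codes = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            pred = lorenzo_2d(recon, i, j)
            # Nearest bin, so |data - recon| <= eb up to floating-point rounding.
            q = round((data[i][j] - pred) / (2 * eb))
            codes[i][j] = q
            # Store the value the decompressor will see, so later
            # predictions on both sides use identical inputs.
            recon[i][j] = pred + 2 * eb * q
    return codes


def decompress(codes, eb):
    """Rebuild values by repeating each prediction and adding back the quantized residual."""
    rows, cols = len(codes), len(codes[0])
    recon = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            recon[i][j] = lorenzo_2d(recon, i, j) + 2 * eb * codes[i][j]
    return recon


if __name__ == "__main__":
    field = [[math.sin(0.1 * i) + math.cos(0.1 * j) for j in range(16)] for i in range(16)]
    eb = 1e-3
    recon = decompress(compress(field, eb), eb)
    worst = max(abs(a - b) for ra, rb in zip(field, recon) for a, b in zip(ra, rb))
    print(f"max abs error {worst:.2e} (bound {eb})")
```

On smooth data most residuals land in a few bins near zero; that skewed code distribution is what a later entropy-coding stage (omitted here) turns into a smaller output.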

Description

The paper “Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization” introduces advanced techniques for lossy compression of scientific data. By combining the Lorenzo predictor with a multidimensional linear-regression predictor, the authors substantially improve prediction accuracy and compression ratios while keeping quantization error within a user-specified bound. Published in 2017, the paper has influenced subsequent research and tools for data reduction in high-performance computing (HPC).
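The summary credits the gains to combining two predictors. One plausible way to combine them is per-block selection: fit a plane to each block by least squares, then keep whichever of the regression and Lorenzo predictors has the smaller mean absolute error. This is a sketch under stated assumptions, not the paper's exact criterion: the selection rule, the 2D block layout, and the names `fit_plane`, `mean_abs_errors`, and `choose_predictor` are hypothetical. A real compressor would also quantize and store the regression coefficients and would run Lorenzo on reconstructed values rather than originals.

```python
def fit_plane(block):
    """Least-squares fit of block[i][j] ~ b0*i + b1*j + c over a full n x m block.

    On a complete rectangular grid the centered coordinates (i - ic), (j - jc)
    and the constant term are mutually orthogonal, so each coefficient has a
    closed form and no matrix solve is needed.
    """
    n, m = len(block), len(block[0])
    ic, jc = (n - 1) / 2, (m - 1) / 2
    mean = sum(sum(row) for row in block) / (n * m)
    sii = m * sum((i - ic) ** 2 for i in range(n))
    sjj = n * sum((j - jc) ** 2 for j in range(m))
    b0 = sum((i - ic) * block[i][j] for i in range(n) for j in range(m)) / sii if sii else 0.0
    b1 = sum((j - jc) * block[i][j] for i in range(n) for j in range(m)) / sjj if sjj else 0.0
    c = mean - b0 * ic - b1 * jc
    return b0, b1, c


def mean_abs_errors(block):
    """Mean absolute error of (Lorenzo, regression) on the block's interior points.

    Assumption: errors are estimated on original values for simplicity.
    """
    n, m = len(block), len(block[0])
    if n < 2 or m < 2:
        raise ValueError("block must be at least 2 x 2")
    b0, b1, c = fit_plane(block)
    lor = reg = 0.0
    for i in range(1, n):
        for j in range(1, m):
            lorenzo = block[i - 1][j] + block[i][j - 1] - block[i - 1][j - 1]
            regression = b0 * i + b1 * j + c
            lor += abs(block[i][j] - lorenzo)
            reg += abs(block[i][j] - regression)
    count = (n - 1) * (m - 1)
    return lor / count, reg / count


def choose_predictor(block):
    """Hypothetical selection rule: keep whichever predictor errs less on this block."""
    lor, reg = mean_abs_errors(block)
    return "regression" if reg < lor else "lorenzo"


if __name__ == "__main__":
    # Plane plus checkerboard noise: Lorenzo amplifies the noise, regression averages it out.
    noisy = [[2 * i - 3 * j + 5 + 0.1 * (-1) ** (i + j) for j in range(8)] for i in range(8)]
    # Bilinear surface: Lorenzo is off by exactly 1 per point, while a plane fits poorly.
    bilinear = [[i * j for j in range(8)] for i in range(8)]
    print(choose_predictor(noisy), choose_predictor(bilinear))
```

The closed-form slopes rely on the block being a complete rectangular grid; irregular or masked blocks would need a general least-squares solve.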

Later work on lossy compression of scientific data cites this research, and studies of efficiency in scientific computing acknowledge its improvements in prediction accuracy. The open-source SZ framework, into which the method was integrated, continues to build on these developments. Overall, the paper has helped refine compression algorithms for fields that must store and process large data sets efficiently.