Compression Analysis for BlueField-2/3 Data Processing Units: Lossy and Lossless Perspectives

IEEE Micro, 2024

Yuke Li, Arjun Kashyap, Yanfei Guo, Xiaoyi Lu

Abstract

A data processing unit (DPU), a programmable smart network interface card containing system-on-chip (SoC) cores, is now a valuable addition to the host CPU. It is used in high-performance computing (HPC) and data center clusters for its advanced features, notably a hardware-based data compression engine (C-engine). With the convergence of big data, HPC, and machine learning, growing data volumes burden communication and storage, making efficient compression vital. This positions DPUs as tools to accelerate compression workloads and enhance data-intensive applications. This article characterizes lossy (e.g., SZ3) and lossless (e.g., DEFLATE, lz4, and zlib) compression algorithms using seven real-world datasets on Nvidia BlueField-2 and BlueField-3 DPUs. We explore the opportunities for offloading these compression workloads from the host. Our findings demonstrate that the C-engine within the DPU can achieve up to a 26.8x speedup compared to its SoC core. We also provide insights on harnessing BlueField for compression, presenting seven key takeaways to guide future compression research with DPUs.
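The quantities the abstract refers to (compression ratio, throughput, and speedup) can be illustrated with a minimal sketch. It uses only Python's standard `zlib` module (a DEFLATE implementation) on a CPU; it is not the article's benchmark harness and does not touch the BlueField C-engine. The helper names `compression_ratio`, `speedup`, and `measure_deflate`, as well as the synthetic payload, are assumptions made for illustration.

```python
import time
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(original) / len(compressed)


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


def measure_deflate(data: bytes, level: int = 6):
    """Compress with zlib (DEFLATE); return (compressed bytes, elapsed seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return compressed, elapsed


if __name__ == "__main__":
    # Hypothetical, highly repetitive payload; not one of the article's datasets.
    payload = b"temperature=21.5;pressure=1013;" * 50_000
    compressed, elapsed = measure_deflate(payload)
    assert zlib.decompress(compressed) == payload  # lossless round trip
    print(f"ratio: {compression_ratio(payload, compressed):.1f}x")
    print(f"throughput: {len(payload) / elapsed / 1e6:.1f} MB/s")
```

Under this definition, a reported speedup such as 26.8x would correspond to `speedup(t_soc, t_cengine)`, where the two timings come from running the same workload on the SoC core and on the C-engine respectively.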

Journal Article

Journal: IEEE Micro
Volume: 44
Number: 2
ISSN: 1937-4143
Pages: 8-19
DOI: 10.1109/MM.2023.3343636
Publisher: IEEE Computer Society
Address: Los Alamitos, CA, USA
Series: IEEE Micro'24
Month: March
