Understanding the Idiosyncrasies of Emerging BlueField DPUs

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

Arjun Kashyap, Yuke Li, Darren Ng, Xiaoyi Lu

Abstract

Data Processing Units (DPUs) are becoming available in datacenter environments to offload and accelerate workloads from the host. However, users need a comprehensive analysis to determine how to use DPUs effectively for their workloads, given the various configurations and generations available. To fill this gap, we conduct a fair and rigorous characterization, running 15 benchmarking tests to demonstrate the evolution of representative SoC-based DPUs, specifically NVIDIA’s BlueField-1, BlueField-2, and BlueField-3. Our work surfaces several idiosyncrasies across three key characterization dimensions: network, DMA engine, and memory. For network, we exhaustively test the two major DPU modes: on-path (with its five submodes) and off-path. We develop DPUDMABench, a microbenchmark suite that systematically analyzes the data exchange primitives supported by the DPU’s DMA engine. We also conduct two application case studies examining the impact of the DPU mode on the performance of TCP/IP and RDMA-based key-value stores (MICA and HERD). Based on our multi-generational DPU characterization, we identify and summarize 14 major idiosyncrasies and provide guidelines for optimal system and future hardware design.

Conference Proceedings

ISBN: 9798400715372
Publisher: Association for Computing Machinery
Address: New York, NY, USA
DOI: 10.1145/3721145.3725780
Booktitle: Proceedings of the 39th ACM International Conference on Supercomputing
Pages: 807–821
Numpages: 15
Location: Salt Lake City, USA
Series: ICS '25
