Hao Qi, Weicong Chen, Chenghong Wang, Xiaoyi Lu
Secure, efficient, and scalable AllReduce-based data aggregation is essential for Artificial Intelligence (AI) and scientific applications on modern High-Performance Computing (HPC) and cloud infrastructures. As AllReduce is increasingly used across these distributed infrastructures, privacy has become a critical concern. State-of-the-art (SOTA) Homomorphic Encryption (HE)-based AllReduce solutions introduce high overhead, require secure key exchanges, and remain vulnerable to collusion. We propose DPAR, the first differentially private, collusion-resistant AllReduce framework optimized for large-scale HPC and AI workloads. DPAR introduces three key innovations: (1) integrating Differential Privacy (DP) to eliminate collusion risks without key exchanges, (2) keeping noise growth scalable to preserve accuracy, and (3) optimizing performance with a noise pooling mechanism. DPAR is a drop-in replacement for Message Passing Interface (MPI) AllReduce and provides strong privacy at minimal performance cost. Evaluated on the Delta and Frontier supercomputers with up to 8192 cores, DPAR outperforms the SOTA HE solution by up to 34.7% on modern AI workloads.
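To make the general idea of a differentially private sum reduction concrete, the following is a minimal sketch and not DPAR's implementation. It simulates ranks in plain Python instead of calling MPI. The function names (`make_noise_pool`, `privatize`, `dp_allreduce`), the choice of Gaussian noise, the parameter `sigma`, and the way the precomputed pool is consumed are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random


def make_noise_pool(size, sigma, seed):
    """Pre-generate `size` Gaussian noise samples (hypothetical pooling step).

    A fixed seed is used only to keep this sketch reproducible; a real
    deployment would need a cryptographically secure noise source.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(size)]


def privatize(values, pool):
    """Add one pooled noise sample to each element of a rank's local vector.

    Samples are never reused within a call: the pool must hold at least
    one sample per element.
    """
    if len(values) > len(pool):
        raise ValueError("noise pool is smaller than the local vector")
    return [v + noise for v, noise in zip(values, pool)]


def simulated_allreduce_sum(per_rank_vectors):
    """Element-wise sum across ranks, standing in for an MPI sum reduction."""
    return [sum(column) for column in zip(*per_rank_vectors)]


def dp_allreduce(per_rank_vectors, sigma, pool_size=1024):
    """Each simulated rank perturbs its own data, then the sum is taken.

    Only noisy vectors enter the reduction, so no rank exposes its raw
    values to the others and no keys are exchanged.
    """
    noisy = []
    for rank, vector in enumerate(per_rank_vectors):
        pool = make_noise_pool(pool_size, sigma, seed=rank)
        noisy.append(privatize(vector, pool))
    return simulated_allreduce_sum(noisy)


if __name__ == "__main__":
    local_data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print("exact sum:", simulated_allreduce_sum(local_data))
    print("noisy sum:", dp_allreduce(local_data, sigma=0.1))
```

In this sketch every rank adds independent noise with standard deviation `sigma`, so the noise in the aggregated result has standard deviation `sigma * sqrt(n)` for `n` ranks. That growth with the number of ranks is the accuracy cost that any such scheme has to control at scale.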