PADSYS Lab | Publications

Accelerating Big Data Processing on Modern HPC Clusters

Conquering Big Data with High Performance Computing, 2016

Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, Dipti Shankar, Dhabaleswar K. (DK) Panda

Abstract

Modern HPC systems and the associated middleware (such as MPI and parallel file systems) have been exploiting the advances in HPC technologies (multi-/many-core architecture, RDMA-enabled networking, and SSD) for many years. However, Big Data processing and management middleware have not fully taken advantage of such technologies. These disparities are taking HPC and Big Data processing onto divergent trajectories. This chapter provides an overview of popular Big Data processing middleware, high-performance interconnects, and storage architectures, and discusses the challenges in accelerating Big Data processing middleware by leveraging emerging technologies on modern HPC clusters. It presents case studies of advanced designs, based on RDMA and heterogeneous storage architecture, that were proposed to address these challenges for multiple components of Hadoop (HDFS and MapReduce) and Spark. The advanced designs presented in the case studies are publicly available as part of the High-Performance Big Data (HiBD) project, an overview of which is also provided in this chapter. All these works aim to bring HPC and Big Data processing onto a convergent trajectory.

Book Chapter

Booktitle: Conquering Big Data with High Performance Computing
Publisher: Springer International Publishing
Address: Cham
Pages: 81-107
ISBN: 978-3-319-33742-5
DOI: 10.1007/978-3-319-33742-5_5
Series: Chapter '16
Editor: Ritu Arora
