A paper has been accepted at HPDC 2019: UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems. This year, HPDC accepted only 22 papers out of 106 submissions; 11 of the accepted papers went through shepherding, while this paper was accepted directly. The first author, Haiyang Shi, is one of my Ph.D. students. Congratulations to Haiyang and the other co-authors!
Paper Info
[HPDC'19] UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems
Haiyang Shi, Xiaoyi Lu, Dipti Shankar, and Dhabaleswar K. Panda.
In Proceedings of the 28th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2019. (Acceptance Rate: 20.7%, 22/106)
Abstract
Distributed storage systems typically need data to be stored redundantly to
guarantee data durability and reliability. While the conventional approach
towards this objective is to store multiple replicas, today’s unprecedented
data growth rates encourage modern distributed storage systems to employ
Erasure Coding (EC) techniques, which can achieve better storage efficiency.
Various hardware-based EC schemes have been proposed in the community to
leverage the advanced compute capabilities of modern data center and cloud
environments. Currently, there is no unified and easy way for distributed
storage systems to fully exploit multiple devices such as CPUs, GPUs, and
network devices (i.e., multi-rail support) to perform EC operations in
parallel, which leads to under-utilization of the available compute
power. In this paper, we first introduce an analytical model to analyze the
design scope of efficient EC schemes in distributed storage systems. Guided by
the performance model, we propose UMR-EC, a Unified and Multi-Rail Erasure
Coding library that can fully exploit heterogeneous EC coders. Our proposed
interface is complemented by asynchronous semantics with an optimized
metadata-free scheme and EC rate-aware task scheduling that can enable a
highly-efficient I/O pipeline. To show the benefits and effectiveness of
UMR-EC, we re-design HDFS 3.x write/read pipelines based on the guidelines
observed in the proposed performance model. Our performance evaluations show
that our proposed designs can outperform the write performance of replication
schemes and the default HDFS EC coder by 3.7x - 6.1x and 2.4x - 3.3x,
respectively, and can improve the performance of read with failure recoveries
up to 5.1x compared with the default HDFS EC coder. Compared with the fastest
available CPU coder (i.e., ISA-L), our proposed designs have an improvement of
up to 66.0% and 19.4% for write and read with failure recoveries, respectively.
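For readers unfamiliar with erasure coding, the sketch below illustrates the basic encode/recover pattern in C using a single-parity XOR code: k data blocks are protected by one parity block, so any single lost block can be rebuilt from the survivors. This is only a toy illustration, not the UMR-EC API; UMR-EC targets general EC codes and schedules the coding work across CPU, GPU, and network-based coders.

```c
/*
 * Toy illustration of erasure coding (NOT the UMR-EC API):
 * K data blocks protected by one XOR parity block, so any
 * single lost block can be reconstructed from the survivors.
 */
#include <stdio.h>
#include <string.h>

#define K          4   /* number of data blocks      */
#define BLOCK_SIZE 8   /* bytes per block (toy size) */

/* parity[i] = XOR of byte i of every data block */
static void encode_parity(unsigned char data[K][BLOCK_SIZE],
                          unsigned char parity[BLOCK_SIZE])
{
    memset(parity, 0, BLOCK_SIZE);
    for (int b = 0; b < K; b++)
        for (int i = 0; i < BLOCK_SIZE; i++)
            parity[i] ^= data[b][i];
}

/* Rebuild one lost data block by XOR-ing parity with the survivors. */
static void recover_block(unsigned char data[K][BLOCK_SIZE],
                          const unsigned char parity[BLOCK_SIZE],
                          int lost, unsigned char out[BLOCK_SIZE])
{
    memcpy(out, parity, BLOCK_SIZE);
    for (int b = 0; b < K; b++)
        if (b != lost)
            for (int i = 0; i < BLOCK_SIZE; i++)
                out[i] ^= data[b][i];
}

int main(void)
{
    unsigned char data[K][BLOCK_SIZE] = {
        "block00", "block11", "block22", "block33"
    };
    unsigned char parity[BLOCK_SIZE], rebuilt[BLOCK_SIZE];

    encode_parity(data, parity);
    recover_block(data, parity, 2, rebuilt);   /* pretend block 2 is lost */
    printf("recovered: %s\n", (const char *)rebuilt);  /* prints "block22" */
    return 0;
}
```

Production systems use general Reed-Solomon-style codes (e.g., via ISA-L) that tolerate multiple failures; the point of UMR-EC is to expose such coders behind one unified, asynchronous interface and drive them in parallel over multiple devices.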
A paper has been accepted at IPDPS 2019: C-GDR: High-Performance Container-aware GPUDirect MPI Communication Schemes on RDMA Networks. Congratulations to all the authors!
Paper Info
[IPDPS'19] C-GDR: High-Performance Container-aware GPUDirect MPI Communication Schemes on RDMA Networks
Jie Zhang, Xiaoyi Lu, Ching-Hsiang Chu, and Dhabaleswar K. Panda.
In Proceedings of the 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.
Abstract
In recent years, GPU-based platforms have achieved significant success for
parallel applications. In addition to highly optimized computation kernels on
GPUs, the cost of data movement on GPU clusters plays a critical role in
delivering high performance for end applications. Many recent studies have
proposed optimizations for GPU- or CUDA-aware communication runtimes, and
these designs have been widely adopted in emerging GPU-based applications.
These studies mainly focus on improving communication performance in native
environments, i.e., physical machines; however, GPU-based communication
schemes in cloud environments are not yet well studied. This
paper first investigates the performance characteristics of state-of-the-art
GPU-based communication schemes on both native and container-based
environments, which shows a significant need for high-performance
container-aware communication schemes in GPU-enabled runtimes to deliver
near-native performance for end applications on clouds. Next, we propose the
C-GDR approach to design high-performance Container-aware GPUDirect
communication schemes on RDMA networks. C-GDR allows communication runtimes to
successfully detect process locality, GPU residency, NUMA and architecture
information, and communication patterns to enable intelligent and dynamic
selection of the best communication and data movement schemes on GPU-enabled
clouds. We have integrated C-GDR with the MVAPICH2 library. Our evaluations
show that MVAPICH2 with C-GDR has clear performance benefits on container-based
cloud environments, compared to default MVAPICH2-GDR and Open MPI. For
instance, our proposed C-GDR can outperform default MVAPICH2-GDR schemes by up
to 66% on micro-benchmarks and up to 26% on HPC applications over a
container-based environment.
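As background for the abstract, the sketch below shows the CUDA-aware MPI usage model that C-GDR accelerates, not C-GDR's internal design: with a GPUDirect-capable MPI library such as MVAPICH2-GDR, device pointers returned by cudaMalloc() can be passed directly to MPI calls, and the runtime decides how to move the data (e.g., GPUDirect RDMA or pipelined staging through host memory). The message size and ranks used here are arbitrary.

```c
/*
 * Minimal CUDA-aware MPI sketch (usage model only, not C-GDR itself):
 * device buffers are passed directly to MPI_Send/MPI_Recv and the
 * GPUDirect-capable runtime handles the data movement.
 * Build with the library's MPI wrapper, e.g.: mpicc gdr_pingpong.c -lcudart
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define NBYTES (1 << 20)   /* 1 MiB message */

int main(int argc, char **argv)
{
    int rank;
    void *gpu_buf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);                  /* one GPU per rank for simplicity */
    cudaMalloc(&gpu_buf, NBYTES);      /* device memory, not host memory  */

    if (rank == 0) {
        cudaMemset(gpu_buf, 1, NBYTES);
        /* Device pointer handed directly to MPI; no explicit
         * cudaMemcpy to a host staging buffer is needed. */
        MPI_Send(gpu_buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(gpu_buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d bytes into GPU memory\n", NBYTES);
    }

    cudaFree(gpu_buf);
    MPI_Finalize();
    return 0;
}
```

In a container-based deployment the application code above runs unchanged; C-GDR's contribution is detecting process locality, GPU residency, and NUMA/architecture information inside containers so the runtime can still select the fastest communication path.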
Xiaoyi will serve as a General Co-Chair for Bench 2019.
Please submit your papers to Bench 2019. Call for Papers
Xiaoyi will serve as a TPC Co-Chair for The 5th IEEE International Workshop on High-Performance Big Data and Cloud Computing (HPBDC 2019).
Please submit your papers to HPBDC 2019. Call for Papers
Xiaoyi will serve as a TPC member for the following conferences in 2019!
A collaborative grant for research on large-scale hybrid memory systems has been funded by NSF. I am the PI on the OSU side.
Congratulations to our great team!
Thanks a lot to NSF for the support!

Grant Information: SPX: Collaborative Research: Memory Fabric: Data Management for Large-scale Hybrid Memory Systems
I am starting my new faculty position this month! I look forward to working with self-motivated students who are interested in doing systems research.
More importantly, we are happy to announce that the PADSYS Lab starts its great journey at The Ohio State University (OSU)!
(Courtesy: Celebration by Nick Youngson CC BY-SA 3.0 Alpha Stock Images)