A tutorial has been accepted at IISWC 2020: Benchmarking and Accelerating Big Data Systems with RDMA, PMEM, and NVMe-SSD.
Congratulations to Haiyang and Shashank!
[IISWC'20] Benchmarking and Accelerating Big Data Systems with RDMA, PMEM, and NVMe-SSD
Xiaoyi Lu, Haiyang Shi, and Shashank Gugnani.
2020 IEEE International Symposium on Workload Characterization (IISWC), 2020.
The convergence of HPC, Big Data, and Deep Learning is becoming the next game-changing opportunity. Modern HPC systems and Cloud Computing platforms have been fueled by advances in multi-/many-core architectures, Remote Direct Memory Access (RDMA) enabled high-speed networks, persistent memory (PMEM), and NVMe-SSDs. However, many Big Data systems and libraries (such as Hadoop, Spark, Flink, and Memcached) have not fully embraced these technologies. Recent studies have shown that the default designs of these components cannot efficiently leverage the advanced features of modern clusters with RDMA, PMEM, and NVMe-SSD. In this tutorial, we will provide an in-depth overview of the architectures, programming models, features, and performance characteristics of RDMA networks, PMEM, and NVMe-SSD. We will examine the challenges in re-/co-designing communication and I/O components of Big Data systems and libraries with these emerging technologies. We will provide benchmark-level studies and system-level case studies (such as Hadoop, Spark, TensorFlow, and Memcached) to discuss how to use these new technologies efficiently in real applications.