My co-advised Ph.D. student Dipti Shankar has successfully defended her thesis and graduated. Congratulations, Dr. Shankar!

Thesis Info

Title: Designing Fast, Resilient and Heterogeneity-Aware Key-Value Storage on Modern HPC Clusters

Year and Degree: 2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Committee

  • Dhabaleswar K. Panda (Advisor)
  • Xiaoyi Lu (Co-Advisor)
  • Feng Qin (Committee Member)
  • Gagan Agrawal (Committee Member)

Abstract

With the recent emergence of in-memory computing for Big Data analytics, memory-centric and distributed key-value storage has become vital to accelerating data processing workloads, in high-performance computing (HPC) and data center environments. This has led to several research works focusing on advanced key-value store designs with Remote-Direct-Memory-Access (RDMA) and hybrid 'DRAM+NVM' storage designs. However, these existing designs are constrained by the blocking store/retrieve semantics; incurring additional complexity with the introduction of high data availability and durability requirements. To cater to the performance, scalability, durability and resilience needs of the diverse key-value store-based workloads (e.g., online transaction processing, offline data analytics, etc.), it is therefore vital to fully exploit resources on modern HPC systems. Moreover, to maximize server scalability and end-to-end performance, it is necessary to focus on designing an RDMA-aware communication engine that goes beyond optimizing the key-value store middleware for better client-side latencies.

Towards addressing this, in this dissertation, we present a 'holistic approach' to designing high-performance, resilient and heterogeneity-aware key-value storage for HPC clusters, that encompasses: (1) RDMA-enabled networking, (2) high-speed NVMs, (3) emerging byte-addressable persistent memory devices, and, (4) SIMD-enabled multi-core CPU compute capabilities. We first introduce non-blocking API extensions to the RDMA-Memcached client, that allows an application to separate the request issue and completion phases. This facilitates overlapping opportunities by truly leveraging the one-sided characteristics of the underlying RDMA communication engine, while conforming to the basic Set/Get semantics. Secondly, we analyze the overhead of employing memory-efficient resilience via Erasure Coding (EC), in an online fashion. Based on this, we extend our proposed RDMA-aware key-value store, that supports non-blocking API semantics, to enable overlapping the EC encoding/decoding compute phases with the scatter/gather communication protocol involved in resiliently storing the distributed key-value data objects.

This work also examines durable key-value store designs for emerging persistent memory technologies. While RDMA-based protocols employed in existing volatile DRAM-based key-value stores can be directly leveraged, we find that there is a need for a more integrated approach to fully exploit the fine-grained durability of these new byte-addressable storage devices. We propose 'RDMP-KV', that employs a hybrid 'server-reply/server-bypass' approach to 'durably' store individual key-value pair objects on the remote persistent memory-equipped servers via RDMA. RDMP-KV’s runtime can easily adapt to existing (server-assisted durability) and emerging (appliance durability) RDMA-capable interconnects, while ensuring server scalability and remote data consistency. Finally, the thesis explores SIMD-accelerated CPU-centric hash table designs, that can enable higher server throughput. We propose an end-to-end SIMD-aware key-value store design, 'SCOR-KV', which introduces optimistic 'RDMA+SIMD'-aware client-centric request/response offloading protocols. SCOR-KV can minimize the server-side data processing overheads to achieve better scalability, without compromising on the client-side latencies.

With this as the basis, we demonstrate the potential performance gains of the proposed designs with online (e.g., YCSB) and offline (e.g., in-memory and distributed burst-buffer over Lustre for Hadoop I/O) workloads on small-scale and production-scale HPC clusters.

Keywords

High-Performance Computing; Key-Value Store; RDMA; Persistent Memory
