Two papers got accepted in SC 2020!

Two papers are accepted in SC 2020: "INEC: Fast and Coherent In-Network Erasure Coding" and "RDMP-KV: Designing Remote Direct Memory Persistence-based Key-Value Stores with PMEM".

Congratulations to Haiyang, Tianxi, Shashank, and Dr. Shankar!

Paper Info

[SC'20] INEC: Fast and Coherent In-Network Erasure Coding

Haiyang Shi and Xiaoyi Lu.

In Proceedings of the 33rd International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. (Acceptance Rate: 22.3%)

Abstract

Erasure coding (EC) is a promising fault tolerance scheme that has been applied to many well-known distributed storage systems. The capability of Coherent EC Calculation and Networking on modern SmartNICs has demonstrated that EC will be an essential feature of in-network computing. In this paper, we propose a set of coherent in-network EC primitives, named INEC. Our analyses based on the proposed $\alpha$-$\beta$ performance model demonstrate that INEC primitives can enable different kinds of EC schemes to fully leverage the EC offload capability on modern SmartNICs. We implement INEC on commodity RDMA NICs and integrate it into five state-of-the-art EC schemes. Our experiments show that INEC primitives significantly reduce 50th , 95th , and 99th percentile latencies, and accelerate the end-to-end throughput, write, and degraded read performance of the key-value store co-designed with INEC by up to 99.57%, 47.30%, and 49.55%, respectively.

[SC'20] RDMP-KV: Designing Remote Direct Memory Persistence-based Key-Value Stores with PMEM

Tianxi Li*, Dipti Shankar*, Shashank Gugnani, and Xiaoyi Lu.

In Proceedings of the 33rd International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. (Acceptance Rate: 22.3%, *Co-First Authors)

Abstract

Byte-addressable persistent memory (PMEM) can be directly manipulated by Remote Direct Memory Access (RDMA) capable networks. However, existing studies to combine RDMA and PMEM can not deliver the desired performance due to their PMEM-oblivious communication protocols. In this paper, we propose novel PMEM-aware RDMA-based communication protocols for persistent key-value stores, referred to as Remote Direct Memory Persistence based Key-Value stores (RDMP- KV). RDMP-KV employs a hybrid ‘server-reply/server-bypass’ approach to ‘durably’ store individual key-value objects on PMEM-equipped servers. RDMP-KV’s runtime can easily adapt to existing (server-assisted durability) and emerging (appliance durability) RDMA-capable interconnects, while ensuring server scalability through a lightweight consistency scheme. Performance evaluations show that RDMP-KV can improve the server-side performance with different persistent key-value storage architectures by up to 22x, as compared with PMEM-oblivious RDMA-‘Server-Reply’ protocols. Our evaluations also show that RDMP-KV outperforms a distributed PMEM-based filesystem by up to 65% and a recent RDMA-to-PMEM framework by up to 71%.

Xiaoyi will serve as TPCs/Chairs for multiple 2020 conferences!

Xiaoyi will serve as a TPC Vice-Chair for IEEE Cloud Summit 2020. Please submit your papers to IEEE Cloud Summit 2020. Call for Papers

Xiaoyi will serve as a TPC Co-Chair for The 6th IEEE International Workshop on High-Performance Big Data and Cloud Computing (HPBDC 2020). Please submit your papers to HPBDC 2020. Call for Papers

Xiaoyi will serve as TPCs for the following conferences in 2020!

SimdHT-Bench paper got accepted in IISWC19 and nominated as a BP Candidate!

A paper is accepted in IISWC 2019: SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures. This paper got nominated as a Best Paper Award Candidate.

Congratulations, Dr. Shankar!

Paper Info

[IISWC'19] SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures (Best Paper Award Nomination)

Dipti Shankar, Xiaoyi Lu, and Dhabaleswar K. Panda.

In Proceedings of 2019 IEEE International Symposium on Workload Characterization (IISWC), 2019.

Abstract

With the emergence of modern multi-core CPU architectures that support data parallelism via vectorization, several storage systems have been employing SIMD-based techniques to optimize data-parallel operations on in-memory structures like hash-tables. In this paper, we perform an in-depth characterization of the opportunities for incorporating AVX vectorization-based SIMD-aware designs for hash table lookups on emerging CPU architectures. We analyze the challenges and design dimensions involved in exploiting vectorization-based parallel key searching over cache-optimized non-SIMD hash tables. Based on this, we design a comprehensive micro-benchmark suite, SimdHT-Bench, that enables evaluating the performance and applicability of CPU SIMD-aware hash table designs for accelerating different read-intensive workloads. With SimdHT-Bench, we study five different use-case scenarios with varied workload patterns, on the latest Intel Skylake and Intel Cascade Lake multi-core CPU nodes. Further, to validate the applicability of SimdHT-Bench, we employ these performance studies to design a high-performance SIMD-aware RDMA-based in-memory key-value store to accelerate the Memcached ‘Multi-Get’ workload. We demonstrate that the SIMD-integrated designs can achieve up to 1.45x–2.04x improvement in server-side Get throughput and up to 34% improvement in end-to-end Multi-Get latencies over the state-of-the-art CPU-optimized non-SIMD MemC3 hash table design, on a high-performance compute cluster with Intel Skylake processors and InfiniBand EDR interconnects.

Dipti got the Ph.D. degree! Congrats!

My co-advised Ph.D. student Dipti Shankar has successfully defended her thesis and graduated. Congratulations, Dr. Shankar!

Thesis Info

Title: Designing Fast, Resilient and Heterogeneity-Aware Key-Value Storage on Modern HPC Clusters

Year and Degree: 2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Committee

• Feng Qin (Committee Member)
• Gagan Agrawal (Committee Member)

Abstract

With the recent emergence of in-memory computing for Big Data analytics, memory-centric and distributed key-value storage has become vital to accelerating data processing workloads, in high-performance computing (HPC) and data center environments. This has led to several research works focusing on advanced key-value store designs with Remote- Direct-Memory-Access (RDMA) and hybrid DRAM+NVM’ storage designs. However, these existing designs are constrained by the blocking store/retrieve semantics; incurring additional complexity with the introduction of high data availability and durability requirements. To cater to the performance, scalability, durability and resilience needs of the diverse key-value store-based workloads (e.g., online transaction processing, offline data analytics, etc.), it is therefore vital to fully exploit resources on modern HPC systems. Moreover, to maximize server scalability and end-to-end performance, it is necessary to focus on designing an RDMA-aware communication engine that goes beyond optimizing the key-value store middleware for better client-side latencies.

Towards addressing this, in this dissertation, we present a holistic approach’ to designing high-performance, resilient and heterogeneity-aware key-value storage for HPC clusters, that encompasses: (1) RDMA-enabled networking, (2) high-speed NVMs, (3) emerging byte-addressable persistent memory devices, and, (4) SIMD-enabled multi-core CPU compute capabilities. We first introduce non-blocking API extensions to the RDMA- Memcached client, that allows an application to separate the request issue and completion phases. This facilitates overlapping opportunities by truly leveraging the one-sided characteristics of the underlying RDMA communication engine, while conforming to the basic Set/Get semantics. Secondly, we analyze the overhead of employing memory-efficient resilience via Erasure Coding (EC), in an online fashion. Based on this, we extend our proposed RDMA-aware key-value store, that supports non-blocking API semantics, to enable overlapping the EC encoding/decoding compute phases with the scatter/gather communication protocol involved in resiliently storing the distributed key-value data objects.

This work also examines durable key-value store designs for emerging persistent memory technologies. While RDMA-based protocols employed in existing volatile DRAM-based key-value stores can be directly leveraged, we find that there is a need for a more integrated approach to fully exploit the fine-grained durability of these new byte-addressable storage devices. We propose 'RDMP-KV’, that employs a hybrid 'server-reply/server- bypass’ approach to 'durably’ store individual key-value pair objects on the remote persistent memory-equipped servers via RDMA. RDMP-KV’s runtime can easily adapt to existing (server-assisted durability) and emerging (appliance durability) RDMA-capable interconnects, while ensuring server scalability and remote data consistency. Finally, the thesis explores SIMD-accelerated CPU-centric hash table designs, that can enable higher server throughput. We propose an end-to-end SIMD-aware key-value store design, 'SCOR- KV’, which introduces optimistic 'RDMA+SIMD’-aware client-centric request/response offloading protocols. SCOR-KV can minimize the server-side data processing overheads to achieve better scalability, without compromising on the client-side latencies.

With this as the basis, we demonstrate the potential performance gains of the proposed designs with online (e.g, YCSB) and offline (e.g, in-memory and distributed burst-buffer over Lustre for Hadoop I/O) workloads on small-scale and production-scale HPC clusters.

Keywords

High-Performance Computing; Key-Value Store; RDMA; Persistent Memory;

source

TriEC paper got accepted in SC19 and nominated as a BSP Finalist!

A paper is accepted in SC 2019: TriEC: Tripartite Graph Based Erasure Coding NIC Offload. This year, SC only had 72 papers being accepted directly and 15 papers asked for major revisions out of 344 initial submissions. This paper got accepted directly and nominated as a Best Student Paper (BSP) Finalist.

Congratulations to my student, Haiyang!

Paper Info

[SC'19] TriEC: Tripartite Graph Based Erasure Coding NIC Offload (Best Student Paper Finalist)

Haiyang Shi and Xiaoyi Lu.

In Proceedings of the 32nd International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2019. (Acceptance Rate: 22.7%, 78/344)

Abstract

Erasure Coding (EC) NIC offload is a promising technology for designing next-generation distributed storage systems. However, this paper has identified three major limitations of current-generation EC NIC offload schemes on modern SmartNICs. Thus, this paper proposes a new EC NIC offload paradigm based on the tripartite graph model, namely TriEC. TriEC supports both encode-and-send and receive-and-decode operations efficiently. Through theorem-based proofs, co-designs with memcached (i.e., TriEC-Cache), and extensive experiments, we show that TriEC is correct and can deliver better performance than the state-of-the-art EC NIC offload schemes (i.e., BiEC). Benchmark evaluations demonstrate that TriEC outperforms BiEC by up to 1.82x and 2.33x for encoding and recovering, respectively. With extended YCSB workloads, TriEC reduces the average write latency by up to 23.2% and the recovery time by up to 37.8%. TriEC outperforms BiEC by 1.32x for a full-node recovery with 8 million records.