Two papers have been accepted at SC 2020: "INEC: Fast and
Coherent In-Network Erasure Coding" and "RDMP-KV: Designing Remote Direct
Memory Persistence-based Key-Value Stores with PMEM".
Congratulations to
Haiyang, Tianxi, Shashank, and Dr. Shankar!
Paper Info
[SC'20] INEC: Fast and Coherent In-Network Erasure Coding
Haiyang Shi and Xiaoyi Lu.
In Proceedings of the 33rd International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. (Acceptance Rate: 22.3%)
Abstract
Erasure coding (EC) is a promising fault tolerance scheme that has been applied
to many well-known distributed storage systems. The capability of Coherent EC
Calculation and Networking on modern SmartNICs has demonstrated that EC will be
an essential feature of in-network computing. In this paper, we propose a set
of coherent in-network EC primitives, named INEC. Our analyses based on the
proposed $\alpha$-$\beta$ performance model demonstrate that INEC primitives can enable
different kinds of EC schemes to fully leverage the EC offload capability on
modern SmartNICs. We implement INEC on commodity RDMA NICs and integrate it
into five state-of-the-art EC schemes. Our experiments show that INEC
primitives significantly reduce 50th, 95th, and 99th percentile latencies,
and accelerate the end-to-end throughput, write, and degraded read performance
of the key-value store co-designed with INEC by up to 99.57%, 47.30%, and
49.55%, respectively.
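For readers new to erasure coding, the write and degraded-read paths mentioned in the abstract can be illustrated with the simplest possible code: a single XOR parity chunk over equally sized data chunks. This is only a minimal sketch under that assumption. It does not model INEC's primitives, NIC offload, or the EC schemes evaluated in the paper, and the function names `encode` and `degraded_read` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_chunks):
    """Write path: compute one XOR parity chunk for the data chunks."""
    return reduce(xor_bytes, data_chunks)


def degraded_read(chunks, parity, lost_index):
    """Degraded read: rebuild the chunk at lost_index from survivors and parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)


# Three data chunks (hypothetical payloads) plus one parity chunk.
chunks = [b"key1", b"val1", b"key2"]
parity = encode(chunks)

# Pretend the node holding chunk 1 failed and rebuild its content.
recovered = degraded_read(chunks, parity, lost_index=1)
```

A single parity chunk tolerates only one lost chunk; tolerating more failures requires a more general code, which is where the compute cost that EC offload targets comes from.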
[SC'20] RDMP-KV: Designing Remote Direct Memory Persistence-based Key-Value Stores with PMEM
Tianxi Li*, Dipti Shankar*, Shashank Gugnani, and Xiaoyi Lu.
In Proceedings of the 33rd International Conference for High Performance
Computing, Networking, Storage and Analysis (SC), 2020. (Acceptance Rate:
22.3%, *Co-First Authors)
Abstract
Byte-addressable persistent memory (PMEM) can be directly manipulated by Remote
Direct Memory Access (RDMA) capable networks. However, existing studies to
combine RDMA and PMEM cannot deliver the desired performance due to their
PMEM-oblivious communication protocols. In this paper, we propose novel
PMEM-aware RDMA-based communication protocols for persistent key-value stores,
referred to as Remote Direct Memory Persistence based Key-Value stores
(RDMP-KV). RDMP-KV employs a hybrid ‘server-reply/server-bypass’ approach to
‘durably’ store individual key-value objects on PMEM-equipped servers.
RDMP-KV’s runtime can easily adapt to existing (server-assisted durability) and
emerging (appliance durability) RDMA-capable interconnects, while ensuring
server scalability through a lightweight consistency scheme. Performance
evaluations show that RDMP-KV can improve the server-side performance with
different persistent key-value storage architectures by up to 22x, as compared
with PMEM-oblivious RDMA-‘Server-Reply’ protocols. Our evaluations also show
that RDMP-KV outperforms a distributed PMEM-based filesystem by up to 65% and a
recent RDMA-to-PMEM framework by up to 71%.
Xiaoyi will serve as a TPC Vice-Chair for IEEE Cloud Summit 2020.
Please submit your papers to IEEE Cloud Summit 2020. Call for Papers
Xiaoyi will serve as a TPC Co-Chair for The 6th IEEE International Workshop on High-Performance Big Data and Cloud Computing (HPBDC 2020).
Please submit your papers to HPBDC 2020. Call for Papers
Xiaoyi will serve as a TPC member for the following conferences in 2020!
A paper has been accepted at IISWC 2019: "SimdHT-Bench: Characterizing SIMD-Aware Hash
Table Designs on Emerging CPU Architectures". This paper was nominated as a
Best Paper Award Candidate.
Congratulations, Dr. Shankar!
Paper Info
[IISWC'19] SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures (Best Paper Award Nomination)
Dipti Shankar, Xiaoyi Lu, and Dhabaleswar K. Panda.
In Proceedings of 2019 IEEE International Symposium on Workload Characterization (IISWC), 2019.
Abstract
With the emergence of modern multi-core CPU architectures that support data
parallelism via vectorization, several storage systems have been employing
SIMD-based techniques to optimize data-parallel operations on in-memory
structures like hash-tables. In this paper, we perform an in-depth
characterization of the opportunities for incorporating AVX vectorization-based
SIMD-aware designs for hash table lookups on emerging CPU architectures. We
analyze the challenges and design dimensions involved in exploiting
vectorization-based parallel key searching over cache-optimized non-SIMD hash
tables. Based on this, we design a comprehensive micro-benchmark suite,
SimdHT-Bench, that enables evaluating the performance and applicability of CPU
SIMD-aware hash table designs for accelerating different read-intensive
workloads. With SimdHT-Bench, we study five different use-case scenarios with
varied workload patterns, on the latest Intel Skylake and Intel Cascade Lake
multi-core CPU nodes. Further, to validate the applicability of SimdHT-Bench,
we employ these performance studies to design a high-performance SIMD-aware
RDMA-based in-memory key-value store to accelerate the Memcached ‘Multi-Get’
workload. We demonstrate that the SIMD-integrated designs can achieve up to
1.45x–2.04x improvement in server-side Get throughput and up to 34% improvement
in end-to-end Multi-Get latencies over the state-of-the-art CPU-optimized
non-SIMD MemC3 hash table design, on a high-performance compute cluster with
Intel Skylake processors and InfiniBand EDR interconnects.
My co-advised Ph.D. student Dipti Shankar has successfully defended her thesis and graduated. Congratulations, Dr. Shankar!
Thesis Info
Title: Designing Fast, Resilient and Heterogeneity-Aware Key-Value Storage on Modern HPC Clusters
Year and Degree: 2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Committee
- Dhabaleswar K. Panda (Advisor)
- Xiaoyi Lu (Co-Advisor)
- Feng Qin (Committee Member)
- Gagan Agrawal (Committee Member)
Abstract
With the recent emergence of in-memory computing for Big Data analytics,
memory-centric and distributed key-value storage has become vital to
accelerating data processing workloads, in high-performance computing (HPC) and
data center environments. This has led to several research works focusing on
advanced key-value store designs with Remote-Direct-Memory-Access (RDMA) and
hybrid ‘DRAM+NVM’ storage designs. However, these existing designs are
constrained by the blocking store/retrieve semantics, incurring additional
complexity with the introduction of high data availability and durability
requirements. To cater to the performance, scalability, durability and
resilience needs of the diverse key-value store-based workloads (e.g., online
transaction processing, offline data analytics, etc.), it is therefore vital to
fully exploit resources on modern HPC systems. Moreover, to maximize server
scalability and end-to-end performance, it is necessary to focus on designing
an RDMA-aware communication engine that goes beyond optimizing the key-value
store middleware for better client-side latencies.
Towards addressing this, in this dissertation, we present a ‘holistic approach’
to designing high-performance, resilient and heterogeneity-aware key-value
storage for HPC clusters, that encompasses: (1) RDMA-enabled networking, (2)
high-speed NVMs, (3) emerging byte-addressable persistent memory devices, and,
(4) SIMD-enabled multi-core CPU compute capabilities. We first introduce
non-blocking API extensions to the RDMA-Memcached client that allow an
application to separate the request issue and completion phases. This
facilitates overlapping opportunities by truly leveraging the one-sided
characteristics of the underlying RDMA communication engine, while conforming
to the basic Set/Get semantics. Secondly, we analyze the overhead of employing
memory-efficient resilience via Erasure Coding (EC), in an online fashion.
Based on this, we extend our proposed RDMA-aware key-value store, that supports
non-blocking API semantics, to enable overlapping the EC encoding/decoding
compute phases with the scatter/gather communication protocol involved in
resiliently storing the distributed key-value data objects.
This work also examines durable key-value store designs for emerging persistent
memory technologies. While RDMA-based protocols employed in existing volatile
DRAM-based key-value stores can be directly leveraged, we find that there is a
need for a more integrated approach to fully exploit the fine-grained
durability of these new byte-addressable storage devices. We propose ‘RDMP-KV’,
which employs a hybrid ‘server-reply/server-bypass’ approach to ‘durably’ store
individual key-value pair objects on the remote persistent memory-equipped
servers via RDMA. RDMP-KV’s runtime can easily adapt to existing
(server-assisted durability) and emerging (appliance durability) RDMA-capable
interconnects, while ensuring server scalability and remote data consistency.
Finally, the thesis explores SIMD-accelerated CPU-centric hash table designs
that can enable higher server throughput. We propose an end-to-end SIMD-aware
key-value store design, ‘SCOR-KV’, which introduces optimistic
‘RDMA+SIMD’-aware client-centric request/response offloading protocols. SCOR-KV
can minimize the server-side data processing overheads to achieve better
scalability, without compromising on the client-side latencies.
With this as the basis, we demonstrate the potential performance gains of the
proposed designs with online (e.g., YCSB) and offline (e.g., in-memory and
distributed burst-buffer over Lustre for Hadoop I/O) workloads on small-scale
and production-scale HPC clusters.
Keywords
High-Performance Computing; Key-Value Store; RDMA; Persistent Memory;
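The non-blocking API extensions described in the abstract above separate the request issue phase from the completion phase so that an application can overlap other work with in-flight requests. The following is a minimal sketch of that pattern only, assuming a thread pool and an in-process dict in place of the RDMA communication engine and the remote server. The names `NonBlockingKVClient`, `iset`, `iget`, and `wait` are hypothetical and are not the RDMA-Memcached API.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class NonBlockingKVClient:
    """Toy client: a local dict stands in for the remote key-value server."""

    def __init__(self, workers: int = 4):
        self._store = {}
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def iset(self, key, value) -> Future:
        """Issue phase of Set: returns a handle immediately."""
        return self._pool.submit(self._store.__setitem__, key, value)

    def iget(self, key) -> Future:
        """Issue phase of Get: returns a handle immediately."""
        return self._pool.submit(self._store.get, key)

    @staticmethod
    def wait(handle: Future):
        """Completion phase: block until the request behind the handle finishes."""
        return handle.result()

    def close(self):
        self._pool.shutdown(wait=True)


client = NonBlockingKVClient()
client.wait(client.iset("k1", "v1"))      # complete the Set before reading

handles = [client.iget("k1"), client.iget("missing")]   # issue two Gets
local_work = sum(range(1000))             # work overlapped with in-flight Gets
results = [client.wait(h) for h in handles]             # completion phase
client.close()
```

The blocking Set/Get behavior is recovered by calling `wait` right after each issue call, which is one way such an extension can stay compatible with the basic semantics.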
A paper has been accepted at SC 2019: "TriEC: Tripartite Graph Based Erasure Coding
NIC Offload". This year, SC accepted only 72 papers directly and asked 15
papers for major revisions, out of 344 initial submissions. This paper was
accepted directly and nominated as a Best Student Paper (BSP) Finalist.
Congratulations to my student, Haiyang!
Paper Info
[SC'19] TriEC: Tripartite Graph Based Erasure Coding NIC Offload (Best Student Paper Finalist)
Haiyang Shi and Xiaoyi Lu.
In Proceedings of the 32nd International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2019. (Acceptance Rate: 22.7%, 78/344)
Abstract
Erasure Coding (EC) NIC offload is a promising technology for designing
next-generation distributed storage systems. However, this paper has identified
three major limitations of current-generation EC NIC offload schemes on modern
SmartNICs. Thus, this paper proposes a new EC NIC offload paradigm based on the
tripartite graph model, namely TriEC. TriEC supports both encode-and-send and
receive-and-decode operations efficiently. Through theorem-based proofs,
co-designs with memcached (i.e., TriEC-Cache), and extensive experiments, we
show that TriEC is correct and can deliver better performance than the
state-of-the-art EC NIC offload schemes (i.e., BiEC). Benchmark evaluations
demonstrate that TriEC outperforms BiEC by up to 1.82x and 2.33x for encoding
and recovering, respectively. With extended YCSB workloads, TriEC reduces the
average write latency by up to 23.2% and the recovery time by up to 37.8%.
TriEC outperforms BiEC by 1.32x for a full-node recovery with 8 million
records.