SimdHT-Bench paper accepted at IISWC'19 and nominated as a Best Paper Candidate!
Posted by Xiaoyi Lu on August 10, 2019
A paper has been accepted at IISWC 2019: SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures. The paper has also been nominated as a Best Paper Award candidate.
Congratulations, Dr. Shankar!
Paper Info
[IISWC'19] SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures (Best Paper Award Nomination)
Dipti Shankar, Xiaoyi Lu, and Dhabaleswar K. Panda.
In Proceedings of the 2019 IEEE International Symposium on Workload Characterization (IISWC), 2019.
Abstract
With the emergence of modern multi-core CPU architectures that support data parallelism via vectorization, several storage systems have been employing SIMD-based techniques to optimize data-parallel operations on in-memory structures like hash tables. In this paper, we perform an in-depth characterization of the opportunities for incorporating AVX vectorization-based SIMD-aware designs for hash table lookups on emerging CPU architectures. We analyze the challenges and design dimensions involved in exploiting vectorization-based parallel key searching over cache-optimized non-SIMD hash tables. Based on this, we design a comprehensive micro-benchmark suite, SimdHT-Bench, that enables evaluating the performance and applicability of CPU SIMD-aware hash table designs for accelerating different read-intensive workloads. With SimdHT-Bench, we study five different use-case scenarios with varied workload patterns, on the latest Intel Skylake and Intel Cascade Lake multi-core CPU nodes. Further, to validate the applicability of SimdHT-Bench, we employ these performance studies to design a high-performance SIMD-aware RDMA-based in-memory key-value store to accelerate the Memcached ‘Multi-Get’ workload. We demonstrate that the SIMD-integrated designs can achieve up to 1.45x–2.04x improvement in server-side Get throughput and up to 34% improvement in end-to-end Multi-Get latencies over the state-of-the-art CPU-optimized non-SIMD MemC3 hash table design, on a high-performance compute cluster with Intel Skylake processors and InfiniBand EDR interconnects.
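For readers unfamiliar with the idea of "vectorization-based parallel key searching", here is a minimal sketch of the general technique. It is not the design from the paper. A lookup broadcasts the probe key's signature into a vector and compares it against every slot of a hash bucket in a single lane-wise operation, instead of checking slots one at a time. Everything in it is a hypothetical assumption made for illustration: the bucket layout (8 slots of 32-bit signatures, i.e. 256 bits, the width of one AVX2 register), the convention that signature 0 marks an empty slot, and the names `bucket_t`, `bucket_clear`, `bucket_set` and `bucket_find`. It uses GCC/Clang vector extensions rather than AVX intrinsics, so it also compiles without `-mavx2`. With `-mavx2`, the compiler can emit 256-bit compare instructions for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket layout (illustrative only, not the paper's design):
   8 slots x 32-bit key signatures = 256 bits, one AVX2 register's width. */
#define SLOTS_PER_BUCKET 8

/* GCC/Clang vector extension: 8 lanes of int32_t. With -mavx2 the compiler
   can map operations on this type to 256-bit instructions; without it, the
   operations are lowered to narrower or scalar code. */
typedef int32_t sig_vec __attribute__((vector_size(32)));

typedef struct {
    sig_vec sigs;                    /* per-slot key signatures; 0 = empty (assumed) */
    uint64_t vals[SLOTS_PER_BUCKET]; /* per-slot payloads */
} bucket_t;

/* Stores signature `sig` and payload `val` in the given slot. */
static void bucket_set(bucket_t *b, int slot, int32_t sig, uint64_t val)
{
    assert(slot >= 0 && slot < SLOTS_PER_BUCKET);
    b->sigs[slot] = sig;
    b->vals[slot] = val;
}

/* Marks every slot empty (signature 0, payload 0). */
static void bucket_clear(bucket_t *b)
{
    for (int i = 0; i < SLOTS_PER_BUCKET; i++)
        bucket_set(b, i, 0, 0);
}

/* Compares `sig` against all 8 slot signatures with one lane-wise vector
   compare, then returns the index of the first matching slot, or -1.
   Callers must use nonzero signatures, since 0 marks empty slots here. */
static int bucket_find(const bucket_t *b, int32_t sig)
{
    sig_vec probe = {sig, sig, sig, sig, sig, sig, sig, sig}; /* broadcast */
    sig_vec hits = b->sigs == probe; /* each lane: -1 on match, 0 otherwise */
    for (int i = 0; i < SLOTS_PER_BUCKET; i++)
        if (hits[i])
            return i;
    return -1;
}
```

For example, after `bucket_clear(&b)` and `bucket_set(&b, 3, 42, 4200)`, the call `bucket_find(&b, 42)` returns slot 3, and a signature that is not stored returns -1. A real SIMD-aware table would also have to decide how signatures are derived from keys and how to verify full keys after a signature hit. Those are the kinds of design dimensions the paper characterizes, and this sketch leaves them out.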