Latest News

New Paper Accepted

Christian Knoedler, Tobias Vincon, Arthur Bernhardt, Leonardo Solis-Vasquez, Lukas Weber, Ilia Petrov, Andreas Koch. A cost model for NDP-aware query optimization for KV-stores. In Proc. DAMON 2021.

We show the need for cost-model-based optimizations, arising from the use of the traditional stack as well as the emerging execution on computational storage.

Abstract:

Many modern DBMS architectures require transferring data from storage in order to process it. Given the continuously increasing amounts of data, data transfers quickly become a scalability-limiting factor. Near-Data Processing and smart/computational storage emerge as promising trends allowing for decoupled in-situ operation execution, data transfer reduction and better bandwidth utilization. However, not every operation is suitable for in-situ execution, and careful placement and optimization are needed. In this paper we present an NDP-aware cost model. It has been implemented in MySQL and evaluated with nKV. We make several observations underscoring the need for optimization.
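
To make the trade-off concrete, here is a minimal back-of-the-envelope sketch in Python of the kind of placement decision such a cost model drives. All function names, rates and sizes are hypothetical illustrations, not the calibrated model from the paper:

# Sketch: choosing between host-side and in-situ (NDP) execution of a scan.
# All rates/sizes below are hypothetical placeholders.

def host_cost(pages, page_bytes, link_bw, host_rate):
    # Ship every page over the I/O link, then scan on the host CPU.
    data = pages * page_bytes
    return data / link_bw + data / host_rate

def ndp_cost(pages, page_bytes, link_bw, device_rate, selectivity):
    # Scan on the storage device; ship back only the qualifying tuples.
    data = pages * page_bytes
    return data / device_rate + (selectivity * data) / link_bw

# Example: 1M 4KB pages, 1 GB/s link, host scans at 10 GB/s, device at 2 GB/s.
kw = dict(pages=1_000_000, page_bytes=4096, link_bw=1e9)
h = host_cost(host_rate=10e9, **kw)
n = ndp_cost(device_rate=2e9, selectivity=0.01, **kw)
print("place on:", "device (NDP)" if n < h else "host")

Even with a slower device-side scan rate, the selective scan wins in-situ because almost nothing crosses the link; a non-selective operation would not, which is exactly why placement needs a cost model.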

Paper Accepted at RAW@IPDPS

Lukas Weber, Lukas Sommer, Leonardo Solis-Vasquez, Tobias Vincon, Christian Knoedler, Arthur Bernhardt, Ilia Petrov, Andreas Koch. A Framework for the Automatic Generation of FPGA-based Near-Data Processing Accelerators in Smart Storage Systems. In Proc. Reconfigurable Architectures Workshop (RAW@IPDPS).

We introduce a framework for the automatic generation of data format parsers and accessors for NDP DBMS on computational storage.

Abstract:

Near-Data Processing is a promising approach to overcome the limitations of slow I/O interfaces in the quest to analyze the ever-growing amount of data stored in database systems. Alongside CPUs, FPGAs will play an important role in the realization of functional units operating close to data stored in non-volatile memories such as Flash. To perform operations in-situ, it is essential that the NDP device understands the formats and layouts of the persistent data. To this end, carefully optimized format parsers and layout accessors are needed. However, designing such FPGA-based Near-Data Processing accelerators requires significant effort and expertise. To make FPGA-based Near-Data Processing accessible to non-FPGA experts, we present a framework for the automatic generation of FPGA-based accelerators capable of data filtering and transformation for key-value stores, based on simple data-format specifications. The evaluation shows that our framework is able to generate accelerators that are almost identical in performance to the manually optimized designs of prior work, while requiring little to no FPGA-specific knowledge and additionally providing improved flexibility and more powerful functionality.
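
The underlying idea of deriving a parser automatically from a simple data-format specification can be sketched in a few lines of Python. The spec syntax and all names below are invented for illustration; the actual framework emits FPGA accelerators, not software:

# Sketch: a field specification is turned into a record parser automatically.
import struct

SPEC = [("key", "Q"), ("temperature", "f"), ("flags", "H")]  # hypothetical format

def make_parser(spec):
    fmt = "<" + "".join(code for _, code in spec)
    names = [name for name, _ in spec]
    size = struct.calcsize(fmt)
    def parse(buf, off=0):
        # Decode one record at 'off' and report where the next one starts.
        return dict(zip(names, struct.unpack_from(fmt, buf, off))), off + size
    return parse

parse = make_parser(SPEC)
record = struct.pack("<QfH", 42, 21.5, 3)
fields, _ = parse(record)
print(fields["key"], fields["temperature"])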

Paper Accepted at DAPD

L. Weber, T. Vincon, C. Knoedler, L. Solis-Vasquez, A. Bernhardt, I. Petrov, A. Koch. On the Necessity of Explicit Cross-Layer Data Formats in Near-Data Processing Systems. In Distributed and Parallel Databases (DAPD).

NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure that in-situ execution optimally utilizes the properties of the underlying NDP storage and compute elements.

Abstract:

Massive data transfers in modern data-intensive systems resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-Data Processing (NDP) and a shift to code-to-data designs may represent a viable solution, as packaging combinations of storage and compute elements on the same device has become feasible. The shift towards NDP system architectures calls for a revision of established principles. Abstractions such as data formats and layouts typically span multiple layers in traditional DBMS, and the way they are processed is encapsulated within these layers of abstraction. NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure that in-situ execution optimally utilizes the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under RocksDB and the COSMOS hardware platform.
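
As a rough illustration of what an explicit cross-layer format buys (a Python sketch with an invented block layout, not the formats from the paper): once the layout is defined explicitly, the very same accessor logic can run in the host DBMS or be re-implemented in-situ on the NDP device.

# Sketch: a self-describing key/value block that host and device walk identically.
import struct

def pack_kv_block(pairs):
    # Layout (invented): [u32 klen][u32 vlen][key][value], repeated.
    out = bytearray()
    for k, v in pairs:
        out += struct.pack("<II", len(k), len(v)) + k + v
    return bytes(out)

def scan_kv_block(block, predicate):
    # The accessor both layers share; an NDP device would run this in-situ.
    off, hits = 0, []
    while off < len(block):
        klen, vlen = struct.unpack_from("<II", block, off)
        off += 8
        key, val = block[off:off+klen], block[off+klen:off+klen+vlen]
        off += klen + vlen
        if predicate(key, val):
            hits.append((key, val))
    return hits

blk = pack_kv_block([(b"a", b"1"), (b"bb", b"22")])
print(scan_kv_block(blk, lambda k, v: len(k) == 2))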

New Preprint Available

Christian Riegger, Arthur Bernhardt, Bernhard Moessner, Ilia Petrov. bloomRF: On Performing Range-Queries with Bloom-Filters based on Piecewise-Monotone Hash Functions and Dyadic Trace-Trees. [arXiv].

bloomRF is a unified method for approximate membership testing that can efficiently perform both point- and range-queries on a single data structure.

Abstract:

We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries on a single data structure. bloomRF extends Bloom-Filters with range query support and may replace them. The core idea is to employ a dyadic interval scheme to determine the set of dyadic intervals covering a data point, which are then encoded and inserted. bloomRF introduces Dyadic Trace-Trees as a novel data structure that represents those covering intervals implicitly. A Trace-Tree encoding scheme represents the set of covering intervals efficiently, in a compact bit representation. Furthermore, bloomRF introduces novel piecewise-monotone hash functions that are locally order-preserving and thus support range querying. We present an efficient membership computation method for range-queries. Although bloomRF is designed for integers, it also supports string and floating-point data types. It can also handle multiple attributes and serve as a multi-attribute filter. We evaluate bloomRF in RocksDB and in a standalone library. bloomRF is more efficient and outperforms existing point-range-filters by up to 4x across a range of settings.
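
The dyadic-interval principle can be sketched compactly (illustrative Python only; bloomRF's actual Dyadic Trace-Tree encoding and piecewise-monotone hash functions are not reproduced here). Inserting a key stores all of its dyadic covering intervals, i.e. its bit prefixes; a range query decomposes [lo, hi] into maximal dyadic intervals and probes each one, so no inserted key in the range can be missed:

U = 16  # bits in the key universe (hypothetical)

def covering_prefixes(x):
    # One dyadic interval per level: the length-L bit prefix of x.
    return {(L, x >> (U - L)) for L in range(U + 1)}

def dyadic_decomposition(lo, hi):
    # Greedily peel off the largest aligned dyadic interval starting at lo.
    out = []
    while lo <= hi:
        size = lo & -lo if lo else 1 << U
        while size > hi - lo + 1:
            size >>= 1
        L = U - size.bit_length() + 1
        out.append((L, lo >> (U - L)))
        lo += size
    return out

stored = set()
def insert(x): stored.update(covering_prefixes(x))
def maybe_in_range(lo, hi):
    return any(iv in stored for iv in dyadic_decomposition(lo, hi))

insert(1000)
print(maybe_in_range(990, 1010), maybe_in_range(0, 500))  # True False

In a real filter the prefix set is hashed into a compact bit array rather than kept exactly, trading space for one-sided (false-positive-only) errors.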

New DBlab Member

The DBlab team is happy to welcome Christian Knoedler on board!

Christian will strengthen our neoDBMS team.

New Paper Accepted

T. Vincon, L. Weber, A. Bernhardt, C. Riegger, S. Hardock, C. Knoedler, F. Stock, L. Solis-Vasquez, S. Tamimi, A. Koch, I. Petrov. nKV in Action: Accelerating KV-Stores on Native Computational Storage with Near-Data Processing. In Proc. VLDB 2020.

In this paper we introduce nKV, a key/value store utilizing native computational storage and near-data processing.

Abstract:

Massive data transfers in modern data-intensive systems resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution, which, although not new, have yet to see widespread use. In this paper we demonstrate various NDP alternatives in nKV, which is a key/value store utilizing native computational storage and near-data processing. We showcase the execution of classical operations (GET, SCAN) and complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4x-2.7x better performance due to NDP. nKV runs on real hardware – the COSMOS+ platform.

New Paper Accepted

T. Bang, I. Oukid, N. May, I. Petrov, C. Binnig. Robust Performance of Main Memory Data Structures by Configuration. In Proc. SIGMOD 2020.

In this paper, we present a new approach for achieving robust performance of data structures, making it easier to reuse the same design not only for different hardware generations but also for different workloads.

Abstract:

In this paper, we present a new approach for achieving robust performance of data structures, making it easier to reuse the same design not only for different hardware generations but also for different workloads. To achieve robust performance, the main idea is to strictly separate the data structure design from the strategies used to execute access operations, and to adjust these execution strategies by means of so-called configurations instead of hard-wiring the execution strategy into the data structure. In our evaluation we demonstrate the benefits of this configuration approach for individual data structures as well as complex OLTP workloads.
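
The design/execution split can be sketched as follows (a hypothetical Python miniature, not the paper's framework): the data structure's layout stays fixed, while the strategy that executes its access operations is injected as a configuration.

import threading

class NoSync:
    # Cheap strategy for single-threaded workloads.
    def run(self, op): return op()

class GlobalLock:
    # Safe strategy for contended workloads.
    def __init__(self): self._lock = threading.Lock()
    def run(self, op):
        with self._lock:
            return op()

class ConfigurableHashTable:
    def __init__(self, strategy):
        self._buckets = [[] for _ in range(64)]  # design: fixed layout
        self._strategy = strategy                # configuration: how to execute

    def put(self, k, v):
        def op():
            b = self._buckets[hash(k) % 64]
            b[:] = [(bk, bv) for bk, bv in b if bk != k] + [(k, v)]
        self._strategy.run(op)

    def get(self, k):
        def op():
            return next((v for bk, v in self._buckets[hash(k) % 64] if bk == k), None)
        return self._strategy.run(op)

t = ConfigurableHashTable(GlobalLock())  # swap in NoSync() for one thread
t.put("a", 1); print(t.get("a"))

The same node layout thus serves both a single-threaded and a highly concurrent deployment; only the configuration changes.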

New Paper Accepted

T. Bang, N. May, I. Petrov, C. Binnig. The Tale of 1000 Cores: An Evaluation of Concurrency Control on Real(ly) Large Multi-Socket Hardware. In Proc. DAMON 2020.

We follow up on this prior work with an evaluation of the characteristics of concurrency control schemes on real production multi-socket hardware with 1568 cores.

Abstract:

In this paper, we set out the goal to revisit the results of “Staring into the Abyss [...] of Concurrency Control with [1000] Cores” and analyse in-memory DBMSs on today’s large hardware. Despite the original assumption of the authors, today we do not see single-socket CPUs with 1000 cores. Instead, multi-socket hardware made its way into production data centres. Hence, we follow up on this prior work with an evaluation of the characteristics of concurrency control schemes on real production multi-socket hardware with 1568 cores. To our surprise, we made several interesting findings which we report on in this paper.

New Paper Accepted

T. Vincon, L. Weber, A. Bernhardt, A. Koch, I. Petrov. nKV: Near-Data Processing with KV-Stores on Native Computational Storage. In Proc. DAMON 2020.

In this paper we introduce nKV, a key/value store utilizing native computational storage and near-data processing.

Abstract:

Massive data transfers in modern key/value stores resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution, which, although not new, have yet to see widespread use. In this paper we introduce nKV, which is a key/value store utilizing native computational storage and near-data processing. On the one hand, nKV can directly control the data and computation placement on the underlying storage hardware. On the other hand, nKV propagates the data formats and layouts to the storage device, where software and hardware parsers and accessors are implemented. Both allow NDP operations to execute in a host-intervention-free manner, directly on physical addresses, and thus better utilize the underlying hardware. Our performance evaluation is based on executing traditional KV operations (GET, SCAN) and complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4x-2.7x better performance on real hardware – the COSMOS+ platform.
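
The pushdown flow can be sketched as a toy simulation (hypothetical Python; nKV itself runs on the COSMOS+ FPGA platform): the host registers the record layout and a physical extent with the device, which then scans and filters the raw flash contents locally.

class SimulatedNDPDevice:
    def __init__(self, flash):
        self.flash = flash          # raw byte-addressable "flash"
        self.layout = None

    def register_layout(self, record_size, key_size):
        self.layout = (record_size, key_size)  # host pushes the format down

    def scan(self, phys_start, phys_end, key_pred):
        # Runs device-side over physical addresses: no page crosses the
        # I/O link unfiltered, and the host is not involved per record.
        rec, ksz = self.layout
        hits = []
        for off in range(phys_start, phys_end, rec):
            key = self.flash[off:off + ksz]
            if key_pred(key):
                hits.append(self.flash[off:off + rec])
        return hits

flash = bytearray(b"".join(bytes([k]) + b"v" * 7 for k in range(100)))
dev = SimulatedNDPDevice(flash)
dev.register_layout(record_size=8, key_size=1)
print(len(dev.scan(0, 800, lambda k: k[0] >= 90)))  # only 10 records return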

New Paper Accepted

T. Vincon, A. Bernhardt, L. Weber, A. Koch, I. Petrov. On the Necessity of Explicit Cross-Layer Data Formats in Near-Data Processing Systems. In Proc. HardBD 2020.

NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure that in-situ execution optimally utilizes the properties of the underlying NDP storage and compute elements.

Abstract:

Massive data transfers in modern data-intensive systems resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-Data Processing (NDP) and a shift to code-to-data designs may represent a viable solution, as packaging combinations of storage and compute elements on the same device has become feasible. The shift towards NDP system architectures calls for a revision of established principles. Abstractions such as data formats and layouts typically span multiple layers in traditional DBMS, and the way they are processed is encapsulated within these layers of abstraction. NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure that in-situ execution optimally utilizes the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under RocksDB and the COSMOS hardware platform.

New Project Grant

pimDB: infrastructure for Processing-In-Memory in modern DBMS

Principal Investigators:
Data Management Lab
Funding agency: MWK, Baden-Württemberg, Germany

pimDB provides infrastructure for PIM research in modern main-memory DBMS.

New Paper Accepted

C. Riegger, T. Vincon, R. Gottstein, I. Petrov. MV-PBT: Multi-Version Indexing for Large Datasets and HTAP Workloads. In Proc. EDBT 2020.

MV-PBT is a version-aware index structure for HTAP workloads, supporting index-only visibility-checks and flash-friendly I/O patterns.

Abstract:

Modern mixed (HTAP) workloads execute fast update-transactions and long-running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains, as well as in increased and frequent maintenance overhead. Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause frequent creation of new versions, yielding a surge in index maintenance overhead. Secondly, and more importantly, index-scans incur extra I/O overhead to determine which of the resulting tuple-versions are visible to the executing transaction (visibility-check), as current designs only store version/timestamp information in the base table – not in the index. Such an index-only visibility-check is critical for HTAP workloads on large datasets. In this paper we propose the Multi-Version Partitioned B-Tree (MV-PBT) as a version-aware index structure, supporting index-only visibility checks and flash-friendly I/O patterns. The experimental evaluation indicates a 2x improvement for analytical queries and 15% higher transactional throughput under HTAP workloads. MV-PBT offers 40% higher transactional throughput compared to WiredTiger’s LSM-Tree implementation under YCSB.
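
A minimal sketch of an index-only visibility check (illustrative Python; MV-PBT's actual record layout and partitioned B-Tree organization are more involved): each index entry carries version timestamps, so a scan can filter versions without fetching the base tuples at all.

from dataclasses import dataclass

INF = float("inf")

@dataclass
class IndexEntry:
    key: int
    row_id: int
    created_ts: int        # commit timestamp of the creating transaction
    invalidated_ts: float  # timestamp of the deleting/updating tx, or INF

def visible(entry, snapshot_ts):
    # Visible iff created at or before the snapshot and not yet invalidated.
    return entry.created_ts <= snapshot_ts < entry.invalidated_ts

entries = [IndexEntry(7, 1, created_ts=10, invalidated_ts=30),
           IndexEntry(7, 2, created_ts=30, invalidated_ts=INF)]
# A long-running analytical query with snapshot 20 sees only the old version:
print([e.row_id for e in entries if visible(e, 20)])  # [1]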