Latest News

PIM Survey Published

T. Vincon, A. Koch, I. Petrov. Moving Processing to Data: On the Influence of Processing-in-Memory on Data Management. arXiv.

Near-Data Processing ideally allows executing application-defined data- or compute-intensive operations in-situ, i.e. within (or close to) the physical data storage.

Abstract:

Near-Data Processing refers to an architectural hardware and software paradigm based on the co-location of storage and compute units. Ideally, it allows executing application-defined data- or compute-intensive operations in-situ, i.e. within (or close to) the physical data storage. Thus, Near-Data Processing seeks to minimize expensive data movement, improving performance, scalability, and resource-efficiency. Processing-in-Memory is a sub-class of Near-Data Processing that targets data processing directly within memory (DRAM) chips. The effective use of Near-Data Processing mandates new architectures, algorithms, interfaces, and development toolchains.

Paper Accepted at ADBIS

T. Vincon, S. Hardock, C. Riegger, A. Koch, I. Petrov. nativeNDP: Processing Big Data Analytics on Native Storage Nodes. In Proc. ADBIS 2019.

nativeNDP is a framework for Near-Data Processing that pushes down primitive R tasks and executes them in-situ, directly within the storage device of a cluster-node.

Abstract:

Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet, data cleansing, preparation, dataset characterization and statistics or metrics computation steps are frequent. These are mostly performed ad hoc, in an explorative manner and mandate low response times. But such steps are I/O intensive and typically very slow due to low data locality, inadequate interfaces and abstractions along the stack. These typically result in prohibitively expensive scans of the full dataset and transformations on interface boundaries. In this paper we examine R as an analytical tool, managing large persistent datasets in Ceph, a wide-spread cluster file-system. We propose nativeNDP – a framework for Near-Data Processing that pushes down primitive R tasks and executes them in-situ, directly within the storage device of a cluster-node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.
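The pushdown idea described in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions: `StorageNode`, `read_all`, and `pushdown_mean` are hypothetical names invented for illustration and are not part of nativeNDP, R, or Ceph. The sketch merely counts how many values cross the device/host boundary in each case.

```python
from statistics import fmean


class StorageNode:
    """Hypothetical stand-in for a storage device that can run a task in-situ."""

    def __init__(self, values):
        self._values = list(values)      # data resident on the device
        self.values_transferred = 0      # values shipped to the host so far

    def read_all(self):
        # Conventional path: every value is shipped to the host.
        self.values_transferred += len(self._values)
        return list(self._values)

    def pushdown_mean(self):
        # Near-data path: compute on the device, ship only the result.
        result = fmean(self._values)
        self.values_transferred += 1
        return result


node = StorageNode(range(100_000))

host_mean = fmean(node.read_all())       # full scan moved to the host
host_cost = node.values_transferred

node.values_transferred = 0
ndp_mean = node.pushdown_mean()          # only one value crosses the boundary
ndp_cost = node.values_transferred

print(host_mean == ndp_mean, host_cost, ndp_cost)
```

Both paths return the same mean; the difference lies only in the amount of data moved, which is the cost Near-Data Processing aims to reduce.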

Paper Accepted at IDEAS

C. Riegger, T. Vincon, I. Petrov. Indexing large updatable Datasets in Multi-Version Database Management Systems. In Proc. IDEAS 2019.

In this paper we present the implementation of Partitioned B-Trees in PostgreSQL extended with SIAS.

Abstract:

Database Management Systems (DBMS) need to handle large updatable datasets under OLTP workloads. Most modern DBMS provide snapshots of data in an MVCC transaction management scheme. Each transaction operates on a snapshot of the database. It is calculated from a set of tuple versions, containing logical transaction timestamps. This transaction management scheme enables high parallelism and resource-efficient append-only data placement on secondary storage. One major issue in indexing tuple versions on modern hardware technologies is the high write amplification for tree-indexes. Partitioned B-Trees (PBT) are based on the structure and algorithms of the ubiquitous B+-Tree. They achieve near-optimal write amplification and beneficial sequential writes on secondary storage. In this paper we present the implementation of PBTs in PostgreSQL extended with SIAS. Compared to PostgreSQL's standard B+-Trees, PBTs have 50% better transactional throughput under TPC-C.
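How a snapshot can be calculated from timestamped tuple versions may be sketched as follows. This is a deliberately simplified visibility rule, assuming all transactions have committed and each version carries a creation and an optional invalidation timestamp; the names `TupleVersion` and `snapshot` are hypothetical, and the rule is not the one used by PostgreSQL or SIAS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TupleVersion:
    key: str
    value: str
    created_ts: int                  # timestamp of the creating transaction
    invalidated_ts: Optional[int]    # timestamp of the superseding transaction


def snapshot(versions, read_ts):
    """Return the key -> value view visible to a transaction reading at read_ts."""
    visible = {}
    for v in versions:
        created_before = v.created_ts <= read_ts
        still_valid = v.invalidated_ts is None or v.invalidated_ts > read_ts
        if created_before and still_valid:
            visible[v.key] = v.value
    return visible


# Versions are only appended; an update adds a new version of the same key.
versions = [
    TupleVersion("a", "a1", created_ts=1, invalidated_ts=5),
    TupleVersion("a", "a2", created_ts=5, invalidated_ts=None),
    TupleVersion("b", "b1", created_ts=3, invalidated_ts=None),
]

print(snapshot(versions, read_ts=4))
print(snapshot(versions, read_ts=6))
```

A reader at timestamp 4 still sees the older version of `a`, while a reader at timestamp 6 sees the newer one, without either blocking the writer.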

Paper Accepted at DaMoN

S. Hardock, A. Koch, T. Vincon, I. Petrov. IPA-IDX: In-Place Appends for B-Tree Indices. In Proc. DaMoN 2019.

IPA-IDX is an approach to handle index modifications on modern storage technologies (NVM, Flash) as physical in-place appends, using simplified physiological log records.

Abstract:

We introduce IPA-IDX - an approach to handle index modifications on modern storage technologies (NVM, Flash) as physical in-place appends, using simplified physiological log records. IPA-IDX provides similar performance and longevity advantages for indexes as basic IPA does for tables. The selective application of IPA-IDX and basic IPA to certain regions and objects lowers the GC overhead by over 60%, while keeping the total space overhead to 2%. The combined effect of IPA and IPA-IDX increases performance by 28%.

Paper Accepted at ICDE

I. Petrov, A. Koch, S. Hardock, T. Vincon, C. Riegger. Native Storage Techniques for Data Management. In Proc. ICDE 2019.

Native storage approaches, architectures and techniques for data processing and data management.

21.11.2019 Paper Accepted at ICDE 2019

I. Petrov, A. Koch, S. Hardock, T. Vincon, C. Riegger. Native Storage Techniques for Data Management. In Proc. ICDE 2019.

Abstract:

In the present tutorial we perform a cross-cut analysis of database storage management from the perspective of modern storage technologies. We argue that neither the design of modern DBMS, nor the architecture of modern storage technologies are aligned with each other. Moreover, the majority of the systems rely on a complex multi-layer and compatibility-oriented storage stack. The result is needlessly suboptimal DBMS performance, inefficient utilization, or significant write amplification due to outdated abstractions and interfaces. In the present tutorial we focus on the concept of native storage, which is storage operated without intermediate abstraction layers over an open native storage interface and is directly controlled by the DBMS. We cover the following aspects of native storage: (i) architectural approaches and techniques; (ii) interfaces; (iii) storage abstractions; (iv) DBMS/system integration; (v) in-storage processing.

DBLab has open-sourced NoFTL, SIAS and cIPT

Check out DBLab's GitHub repository.
We have open-sourced NoFTL, SIAS, and cIPT.
Clone, download, use ... and send us feedback.

New Project Grant

PANDAS: Programmable Appliance for Near Data Processing Accelerated Storage

Funding agency: BMBF

Principal Investigators:
PRO DESIGN Electronic GmbH
Xelera Technologies GmbH
Embedded Systems and Applications Group, Technische Universitaet Darmstadt
Data Management Lab, Reutlingen University

Efficient Data and Indexing Structure for Blockchains in Enterprise Systems

C. Riegger, T. Vincon, I. Petrov.
In Proc. iiWAS 2018

17.09.2018 Paper Accepted at iiWAS 2018

C. Riegger, T. Vincon, I. Petrov. Efficient Data and Indexing Structure for Blockchains in Enterprise Systems. In Proc. iiWAS 2018.

[Extended Abstract]

Abstract:

Blockchains give rise to new workloads in database management systems and K/V-Stores. Distributed Ledger Technology (DLT) is a technique for managing transactions in ’trustless’ distributed systems. Yet, clients of nodes in blockchain networks are backed by ’trustworthy’ K/V-Stores, like LevelDB or RocksDB in Ethereum, which are based on Log-Structured Merge Trees (LSM-Trees). However, LSM-Trees do not fully match the properties of blockchains and enterprise workloads. In this paper, we claim that Partitioned B-Trees (PBT) fit the properties of this DLT: uniformly distributed hash keys, immutability, consensus, invalid blocks, unspent and off-chain transactions, reorganization and data state / version ordering in a distributed log-structure. PBT can locate records of newly inserted key-value pairs, as well as data of unspent transactions, in separate partitions in main memory. Once several blocks acquire consensus, PBTs evict a whole partition, which becomes immutable, to secondary storage. This behavior minimizes write amplification and enables a beneficial sequential write pattern on modern hardware. Furthermore, DLT implies some type of log-based versioning. PBTs can serve as MV-Store for data storage of logical blocks and indexing in multi-version concurrency control (MVCC) transaction processing.
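The partition life cycle described in the abstract (new records collect in a main-memory partition, which is later evicted as a whole and becomes immutable) can be sketched with a toy structure. This is an illustration under strong assumptions, not the authors' implementation: `ToyPartitionedIndex` is a hypothetical name, and separate Python containers stand in for the partitions of a PBT.

```python
from bisect import bisect_left


class ToyPartitionedIndex:
    """Hypothetical sketch: one mutable in-memory partition plus immutable evicted ones."""

    def __init__(self):
        self.memory_partition = {}     # receives all new inserts
        self.storage_partitions = []   # immutable sorted tuples, newest last

    def put(self, key, value):
        self.memory_partition[key] = value

    def evict(self):
        """Freeze the in-memory partition and append it as one sorted run."""
        if not self.memory_partition:
            return
        frozen = tuple(sorted(self.memory_partition.items()))
        self.storage_partitions.append(frozen)   # written once, never modified
        self.memory_partition = {}

    def get(self, key):
        if key in self.memory_partition:
            return self.memory_partition[key]
        # Search evicted partitions from newest to oldest.
        for partition in reversed(self.storage_partitions):
            i = bisect_left(partition, (key,))
            if i < len(partition) and partition[i][0] == key:
                return partition[i][1]
        return None


index = ToyPartitionedIndex()
index.put("tx2", "pending")
index.put("tx1", "pending")
index.evict()                  # e.g. once the blocks have reached consensus
index.put("tx3", "pending")
print(index.get("tx1"), index.get("tx3"), len(index.storage_partitions))
```

Because each evicted partition is written once as a sorted run and never updated in place, the write pattern on secondary storage is sequential, which is the property the abstract highlights.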

Two entries in Encyclopedia of Big Data Technologies, Sakr, Sherif, Zomaya, Albert (Eds.), Springer

I. Petrov, T. Vincon, A. Koch, J. Oppermann, S. Hardock, C. Riegger. Active Storage
In Enc. Big Data Technologies Sakr, Zomaya (Eds.) Springer 2018.

I. Petrov, A. Koch, T. Vincon, S. Hardock, C. Riegger. Transaction Processing on NVM
In Enc. Big Data Technologies Sakr, Zomaya (Eds.) Springer 2018.

NoFTL-KV: Tackling Write-Amplification on KV-Stores with Native Storage Management

T. Vincon, S. Hardock, C. Riegger, J. Oppermann, A. Koch, I. Petrov.
In Proc. EDBT 2018

22.12.2017 Paper Accepted at EDBT 2018

T. Vincon, S. Hardock, C. Riegger, J. Oppermann, A. Koch, I. Petrov. NoFTL-KV: Tackling Write-Amplification on KV-Stores with Native Storage Management. In Proc. EDBT 2018.

[PDF]

Abstract:

Modern persistent Key/Value stores are designed to meet the demand for high transactional throughput and high data-ingestion rates. Still, they rely on a backwards-compatible storage stack and abstractions to ease space management, foster seamless proliferation and system integration. Their dependence on the traditional I/O stack has a negative impact on performance, causes unacceptably high write-amplification, and limits the storage longevity.
In the present paper we present NoFTL-KV, an approach that results in a lean I/O stack, integrating physical storage management natively in the Key/Value store. NoFTL-KV eliminates backwards compatibility, allowing the Key/Value store to directly consume the characteristics of modern storage technologies. NoFTL-KV is implemented under RocksDB. The performance evaluation under LinkBench shows that NoFTL-KV improves transactional throughput by 33%, while response times improve up to 2.3x. Furthermore, NoFTL-KV reduces write-amplification by 19x and improves storage longevity by approximately the same factor.