How to Define "Row" and "Column" Store

former_member640648 · 12-09-2019

At first ...

(EN)

Thank you for reading this blog post. The Japanese version can be found below.

Recently, I started wondering about the storage types of tables.

In this post, I cover that topic together with some basic knowledge of the HANA database.

Let's relearn it together!

(JP)
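Thank you for reading this article.

I had a question about the storage types of tables, so I looked into it together with the structure of the HANA database.

Many of you may already know this, but I would be glad if you would read along as a review.

* The Japanese article is in the second half.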

Points

HANA is a column database.

Almost all tables are column tables.

Row tables have the following characteristics:
- There are no values to be calculated.
- The ID is unique for each piece of information.
- The information is not modified.
- There are fewer rows than in column tables.
- There are fewer columns than in column tables.

Goal

To be able to picture both the column store and the row store concretely and use each of them properly.

Question: What I wondered

HANA DB is built as both an “in-memory database” and a “column database”.

Two table storage types can be selected: row store and column store.

If you frequently perform calculations on the stored information, use the column store.
Otherwise, the row store is the basic choice.
However, I wondered how they are actually used in practice.

First, as prerequisite knowledge, I compare the HANA database with other databases. Next, I explain the differences between the row store and the column store. Finally, I describe concrete use cases.

Database

Memory close to the CPU: fast data transfer / small capacity / volatile
Memory far from the CPU: relatively slow access / large capacity / non-volatile
→ Access speed differs depending on where the data is placed (there is a data hierarchy).

Database structure

On-disk databases and in-memory databases

On-disk database

On-disk database: a database that stores its data on disk storage such as an HDD.

The database engine retrieves the data stored on disk, and tuning is done based on the data hierarchy; for example, information that will be used immediately is kept in memory as a cache.

Main data: saved on disk

Snapshot: none

Cache: saved in memory
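The on-disk pattern above (main data on disk, frequently used data cached in memory) can be sketched as a toy model. Several parts are assumptions. A plain dict stands in for the disk. The class name `OnDiskStore` is hypothetical. Least-recently-used eviction is only one possible cache policy, not necessarily the one a real database uses.

```python
from collections import OrderedDict


class OnDiskStore:
    """Toy on-disk database: main data on 'disk', hot entries cached in memory."""

    def __init__(self, disk, cache_size=2):
        self.disk = disk                # stands in for HDD-resident main data
        self.cache = OrderedDict()      # in-memory cache, kept in LRU order
        self.cache_size = cache_size
        self.disk_reads = 0             # how often the slow path was taken

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # cache hit: mark as recently used
            return self.cache[key]
        self.disk_reads += 1                # cache miss: read from "disk"
        value = self.disk[key]
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


store = OnDiskStore({"A": 1, "B": 2, "C": 3})
for key in ["A", "A", "B", "C", "A"]:
    store.get(key)
print(store.disk_reads)  # 4
```

The second read of "A" is served from memory. The last read goes back to disk because "A" was evicted in the meantime. Deciding what stays in memory is the kind of data-hierarchy tuning the text refers to.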

In-memory database

In-memory database: a database that stores all data in main memory

Data grows in large volumes and rapidly → data-hierarchy tuning cannot keep up.
→ The "in-memory database", which does not require considering the data hierarchy, appeared.

Main data: saved in memory

Snapshot: saved to SSD on every update and periodically

Cache: none
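For contrast, here is a toy in-memory store. All reads and writes touch only memory, and a snapshot is written to disk so the data survives a restart. This is a hypothetical sketch and not how HANA persists data. The class name is made up, a JSON file stands in for the SSD snapshot, and the whole dataset is rewritten on every update. The periodic snapshot mentioned above is not modelled.

```python
import json
import os
import tempfile


class InMemoryStore:
    """Toy in-memory database: all main data in RAM, snapshots written to disk."""

    def __init__(self, snapshot_path):
        self.data = {}                      # main data lives entirely in memory
        self.snapshot_path = snapshot_path

    def put(self, key, value):
        self.data[key] = value              # the write itself only touches memory
        self.snapshot()                     # persist after every update

    def snapshot(self):
        # write a temporary file first, then replace the old snapshot in one step
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)

    @classmethod
    def recover(cls, snapshot_path):
        # after a restart, load the whole snapshot back into memory
        store = cls(snapshot_path)
        with open(snapshot_path) as f:
            store.data = json.load(f)
        return store


path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
db = InMemoryStore(path)
db.put("item1", 100)
db.put("item2", 250)
restored = InMemoryStore.recover(path)
print(restored.data)  # {'item1': 100, 'item2': 250}
```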

Data management method: RDB (Relational Database)

RDB: a database management method that handles complex data relationships by managing data as multiple tables and defining the relationships between those tables.
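To make "multiple tables plus relationships" concrete, here is a small sketch using Python's built-in `sqlite3` module. SQLite is used only because it ships with Python; it is not HANA. The table and column names (`header`, `item`, `amount`) are hypothetical.

```python
import sqlite3

# Two related tables: each item row points to its header through header_id.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE header (header_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE item (
        header_id INTEGER REFERENCES header(header_id),
        item_no   INTEGER,
        amount    INTEGER,
        PRIMARY KEY (header_id, item_no)
    );
    INSERT INTO header VALUES (1, 'Customer A'), (2, 'Customer B');
    INSERT INTO item VALUES (1, 10, 500), (1, 20, 300), (2, 10, 700);
""")

# The defined relationship lets one query combine both tables.
rows = con.execute("""
    SELECT h.customer, SUM(i.amount)
    FROM header AS h JOIN item AS i ON i.header_id = h.header_id
    GROUP BY h.customer
    ORDER BY h.customer
""").fetchall()
print(rows)  # [('Customer A', 800), ('Customer B', 700)]
```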

RDB data model

The RDB data model has a row store and a column store, and the column store is the default setting in HANA.

Row store (row-oriented storage): stores one record as one row of data

Column store (column-oriented storage): stores data grouped by column

* The column store is sometimes classified as a type of NoSQL (short for "Not only SQL").

When data is divided in these ways, it is held in the database as follows.

(Image: data layout in the row store and in the column store)
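A minimal sketch of the two layouts, using three hypothetical records. Each store is a single flat list, so the order in which values sit next to each other is visible.

```python
# Three hypothetical records with the same three fields.
records = [
    ("1001", "Tokyo", 500),
    ("1002", "Osaka", 300),
    ("1003", "Tokyo", 700),
]

# Row store: one record after another, all fields of a record kept together.
row_store = [value for record in records for value in record]

# Column store: all values of one column kept together, column after column.
column_store = [value for column in zip(*records) for value in column]

print(row_store)
# ['1001', 'Tokyo', 500, '1002', 'Osaka', 300, '1003', 'Tokyo', 700]
print(column_store)
# ['1001', '1002', '1003', 'Tokyo', 'Osaka', 'Tokyo', 500, 300, 700]
```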

When data is read from the top, it is read in the order shown above, which leads to the following characteristics:

Characteristics of both data models

	Row store	Column store
Add / update / delete processing	Fast	Slow
Aggregation processing	Slow	Fast
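The table above can be illustrated with a deliberately simplified cost model. It counts values scanned and writes made; it does not measure real I/O or CPU caches. The table and its columns are hypothetical. Aggregating one column scans every value of every record in the row store, but only one list in the column store. Inserting a record is one write in the row store but one write per column in the column store.

```python
# Hypothetical table with four columns, stored once in each layout.
columns = ["id", "city", "qty", "amount"]
records = [(1, "Tokyo", 2, 500), (2, "Osaka", 1, 300), (3, "Tokyo", 5, 700)]

row_store = [list(r) for r in records]                                      # record by record
column_store = {c: [r[i] for r in records] for i, c in enumerate(columns)}  # column by column


def sum_amount_row(store):
    """SUM(amount) in the row store: whole records are scanned to reach one field."""
    values_scanned = sum(len(record) for record in store)
    return sum(record[3] for record in store), values_scanned


def sum_amount_column(store):
    """SUM(amount) in the column store: only the 'amount' column is scanned."""
    amounts = store["amount"]
    return sum(amounts), len(amounts)


def insert_row(store, record):
    store.append(list(record))
    return 1                     # one write: the whole record goes to one place


def insert_column(store, record):
    for i, c in enumerate(columns):
        store[c].append(record[i])
    return len(columns)          # one write per column


print(sum_amount_row(row_store))        # (1500, 12)
print(sum_amount_column(column_store))  # (1500, 3)
new_record = (4, "Nagoya", 3, 900)
print(insert_row(row_store, new_record), insert_column(column_store, new_record))  # 1 4
```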

HANA is a column database, and data is compressed and stored in the column direction.
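One common way to compress a column is dictionary encoding, which SAP HANA's column store is widely described as using. Each distinct value is stored once in a sorted dictionary, and the column itself keeps only small integer IDs. The sketch below is a simplified illustration of that idea, not HANA's implementation. The function names are hypothetical.

```python
def dictionary_encode(column):
    """Replace each value with its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(column))
    value_id = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [value_id[v] for v in column]


def decode(dictionary, ids):
    """Rebuild the original column from the dictionary and the value IDs."""
    return [dictionary[i] for i in ids]


city = ["Tokyo", "Osaka", "Tokyo", "Tokyo", "Nagoya", "Osaka"]
dictionary, ids = dictionary_encode(city)
print(dictionary)  # ['Nagoya', 'Osaka', 'Tokyo']
print(ids)         # [2, 1, 2, 2, 0, 1]
```

If almost every value in a column is distinct, the dictionary becomes as large as the column itself. This matches the openSAP criterion later in this post that links mostly distinct values with a low compression rate.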

HANA adopts a mechanism called the “delta buffer” to speed up adding, updating, and deleting.

Reference: Delta buffer

The delta buffer is a concept for column-store databases. It consists of three parts: the L1 delta, the L2 delta, and the main store, which is a column-type store.

L1 delta: additions, updates, and deletions to the database are stored uncompressed in row-store format.

L2 delta: in the background, the L1 delta (a row store) is converted into the L2 delta (a column store) and compressed using an index.

Main store: data is taken over from the L2 delta, then compressed and reorganized for high-speed access.

*Quoted from "Efficient Transaction Processing in SAP HANA Database – The End of a Column Store Myth"
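The three stages above can be sketched as a toy model. It is a simplification loosely based on the description, and it rests on several assumptions:

- The class and method names are hypothetical.
- L2 uses an append-only (unsorted) dictionary, while the main store rebuilds a sorted one.
- There is no concurrency and no delete handling.
- Merges are triggered by hand.

```python
class ColumnTable:
    """Toy model of the three stages: L1 delta -> L2 delta -> main store."""

    def __init__(self, columns):
        self.columns = columns
        self.l1 = []                                # uncompressed, row-format records
        self.l2 = {c: ([], []) for c in columns}    # per column: (unsorted dict, value IDs)
        self.main = {c: ([], []) for c in columns}  # per column: (sorted dict, value IDs)

    def insert(self, record):
        self.l1.append(record)                      # fast write path: just append a row

    def merge_l1_to_l2(self):
        # move rows from L1 into columnar L2; new values are appended to the dictionary
        for record in self.l1:
            for c, v in zip(self.columns, record):
                dictionary, ids = self.l2[c]
                if v not in dictionary:
                    dictionary.append(v)
                ids.append(dictionary.index(v))
        self.l1.clear()

    def merge_l2_to_main(self):
        # rebuild each main column with a sorted dictionary for fast reads
        for c in self.columns:
            main_dict, main_ids = self.main[c]
            l2_dict, l2_ids = self.l2[c]
            values = [main_dict[i] for i in main_ids] + [l2_dict[i] for i in l2_ids]
            new_dict = sorted(set(values))
            position = {v: i for i, v in enumerate(new_dict)}
            self.main[c] = (new_dict, [position[v] for v in values])
            self.l2[c] = ([], [])

    def column_values(self, c):
        # a reader has to combine all three stages to see every row
        main_dict, main_ids = self.main[c]
        l2_dict, l2_ids = self.l2[c]
        i = self.columns.index(c)
        return ([main_dict[j] for j in main_ids] + [l2_dict[j] for j in l2_ids]
                + [record[i] for record in self.l1])


t = ColumnTable(["city", "amount"])
t.insert(("Tokyo", 500))
t.insert(("Osaka", 300))
t.merge_l1_to_l2()
t.insert(("Tokyo", 700))
print(t.column_values("city"))  # ['Tokyo', 'Osaka', 'Tokyo']
t.merge_l1_to_l2()
t.merge_l2_to_main()
print(t.main["city"])           # (['Osaka', 'Tokyo'], [1, 0, 1])
```

Writes only ever append to L1, which is why adding, updating, and deleting stay fast. The cost of compression is moved into the background merges.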

Main topic: Uses of the row store and the column store

As mentioned above, information in HANA is basically stored in the column store.
However, when defining a table, you can choose between two methods: row store and column store.
Based on the characteristics of both stores, they are suited to handling the following kinds of information.

・Nature of the information handled

Table properties suitable for the row store	Table properties suitable for the column store
• Contain mainly distinct values (low compression rate)	• Subject to column operations on a large number of rows
• Most or all columns are relevant	• Have a large number of columns, many of them unused
• Not subject to aggregation or search operations on non-indexed columns	• Subject to aggregations and intensive search operations
• Fully buffered
• Have a small number of records

Reference: openSAP
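The openSAP criteria above can be turned into a rough decision helper that also prints the matching DDL. SAP HANA SQL does accept `CREATE ROW TABLE` and `CREATE COLUMN TABLE`. Everything else here is an assumption:

- The `TableProfile` fields are hypothetical.
- The rule "choose ROW only if every row-store criterion holds" is my own assumption.
- The 10,000-record limit is arbitrary.
- The table name `ZJOB_LOG` is made up.

The SQL is only printed, not executed.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    """Hypothetical description of a table; fields mirror the criteria listed above."""
    mostly_distinct_values: bool   # low compression rate expected
    all_columns_used: bool         # queries typically read most/all columns
    aggregated_or_searched: bool   # aggregation or search on non-indexed columns
    fully_buffered: bool
    record_count: int


def choose_store(p: TableProfile, small_table_limit: int = 10_000) -> str:
    """Default to COLUMN; pick ROW only if every row-store criterion holds (assumed rule)."""
    row_candidate = (
        p.mostly_distinct_values
        and p.all_columns_used
        and not p.aggregated_or_searched
        and p.fully_buffered
        and p.record_count <= small_table_limit
    )
    return "ROW" if row_candidate else "COLUMN"


def create_table_sql(name: str, columns: str, store: str) -> str:
    """Build HANA DDL text of the form CREATE ROW|COLUMN TABLE (not executed here)."""
    return f"CREATE {store} TABLE {name} ({columns})"


job_log = TableProfile(True, True, False, True, 500)
sales_items = TableProfile(False, False, True, False, 5_000_000)
print(create_table_sql("ZJOB_LOG", "JOBNAME NVARCHAR(32), JOBCOUNT NVARCHAR(8)",
                       choose_store(job_log)))
# CREATE ROW TABLE ZJOB_LOG (JOBNAME NVARCHAR(32), JOBCOUNT NVARCHAR(8))
print(choose_store(sales_items))  # COLUMN
```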

Actual usage examples of the row store and the column store

Research method: In SE16N (general table display), I referred to the technical settings table DD09L and searched by narrowing down the Storage Type.

Result

Of the 205 standard SAP tables, 203 were defined in the column store, one in the row store, and one was undefined.
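The counting step of this research can be sketched as a small tally over an extract of DD09L. The extract below is illustrative, not real data. TBTCO, VARI, BKPF and VBAP take the storage types reported in this post, and `ZSOME_TABLE` is hypothetical. The field name `ROWORCOLST` and its codes ('C' = column store, 'R' = row store, blank = not defined) are assumptions to verify in SE11.

```python
from collections import Counter

# Illustrative DD09L extract: (TABNAME, ROWORCOLST). Field name and codes are assumed.
dd09l_extract = [
    ("TBTCO", "R"),
    ("VARI", "C"),
    ("BKPF", "C"),
    ("VBAP", "C"),
    ("ZSOME_TABLE", ""),  # hypothetical table with no storage type defined
]

LABELS = {"C": "column store", "R": "row store", "": "undefined"}


def tally_storage_types(rows):
    """Count tables per storage type; unexpected codes are counted as 'unknown'."""
    return Counter(LABELS.get(code, "unknown") for _, code in rows)


counts = tally_storage_types(dd09l_extract)
print(counts["column store"], counts["row store"], counts["undefined"])  # 3 1 1
print([name for name, code in dd09l_extract if code == "R"])            # ['TBTCO']
```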

About the row store

Incidentally, TBTCO (Job Status Overview Table) was the only standard SAP table defined in the row store.

As shown below, this table stores logs such as the date a job was registered and the date it was executed.

Fields: JOBNAME, JOBCOUNT, JOBGROUP, INTREPORT, STEPCOUNT, SDLSTRTDT, SDLSTRTTM, BTCSYSTEM, SDLDATE, SDLTIME, SDLUNAME, LASTCHDATE, LASTCHTIME, LASTCHNAME, RELDATE

Example entry: SAP_WORKFLOW_ACTION, 1232400, %NEWSTEP, 1, 2019/11/19, 1:27:30, 2019/11/19, 1:23:24, DDIC, 2019/1/23, 10:32:19, TDC, 2019/11/19

Looking at tables other than the standard SAP tables as well, I found the following kinds of information in tables that use the row store:

Information for access

Access history / send and receive logs

Dictionaries / definitions

Unique serial numbers (GUID) only

Information associated with a GUID

(Order, variant, mapping) /SAPAPO/ORDMAP (Order Mapping Table)

(Descriptions / texts) DDTYPET

Configuration information / setting information

The characteristics of these tables can be summarized as follows:

There are no values to be calculated

The ID is unique for the information

The information does not change

Relatively few rows

Relatively few columns

This matches the nature of the information mentioned earlier, but looking at actual tables made it much easier to picture concretely.

About the column store

Here are some examples of standard SAP tables in the column store.
Variant information (VARI): there is no GUID and there are many records (about 4,000).

Cl.	Relation ID	Program name	Variant name	Counter	Date	Time	Version	SYST data element BIN2	Content
100	VB	R_NAME	&0000000123456	0		00:00:00		158	0
100	VB	R_NAME	&0000000123457	0		00:00:00		158	0
100	VB	R_NAME	&0000000123458	0		00:00:00		158	0

BKPF: Accounting document header data: not subject to aggregation, but there is no GUID and the number of records is large.

VBAP: Sales document item data: subject to aggregation; there is no GUID and there are many records.

Summary

When creating a table, use the column store by default. Understanding which tables should be set to the row store, and using it properly, will increase access speed.

Table properties suitable for the row store	Table properties suitable for the column store
• Mainly contain unique values	• Subject to column operations on a large number of rows
• Most or all columns are relevant	• Have a large number of columns that are not frequently used
• Not subject to search or aggregation on non-indexed columns	• Subject to aggregation and search operations
• Fully buffered	• Have a large number of records
• Have a small number of records
• Information that rarely changes

Thank you for reading.

Best regards,

Rena Takahashi

Japanese ver.

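Summary of this article

HANA is a column database

Most tables are column tables

Row tables have the following characteristics:

There are no values to be calculated

The ID is unique for the information

Information that does not change

Relatively few rows

Relatively few columns

Goal of this article

To be able to picture the column store and the row store concretely and use each of them correctly.

The question I had

HANA DB is built as both an "in-memory database" and a "column database".

Two table storage types can be selected: row store and column store.

The basic rule is to use the column store when the stored information is frequently used in calculations and the row store otherwise, but I wondered how they are actually used in practice.

First, as prerequisite knowledge for answering this question, I compare HANA's database structure with other common databases. Next, I explain the differences between the row store and the column store and describe concrete usage with actual tables.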

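About databases

Memory close to the CPU: fast data transfer / small capacity / volatile

Memory far from the CPU: relatively slow access / large capacity / non-volatile

➡ Access speed differs depending on where the data is placed (there is a data hierarchy).

Database structure: on-disk databases and in-memory databases

On-disk database

On-disk database: a database that stores data on disk storage such as an HDD

The database engine retrieves the data stored on disk, and tuning is done based on the data hierarchy, for example by keeping information that will be used immediately in memory as a cache.

Main data: saved on disk

Snapshot: none

Cache: saved in memory

In-memory database

In-memory database: a database that stores all data in main memory

Data grows in large volumes and rapidly → data-hierarchy tuning cannot keep up.

→ The "in-memory database", which does not require considering the data hierarchy, appeared.

Main data: saved in memory

Snapshot: saved to SSD on every update and periodically

Cache: none

Data management method: RDB (relational database)

RDB: a database management method that can handle complex data relationships by managing data as multiple tables and defining the relationships between them

・RDB data models

RDB data models include the row store and the column store; in HANA, the column store is the default.

Row store (row-oriented storage): stores one record as one row of data

Column store (column-oriented storage): stores data grouped by column

* The column store is sometimes classified as a type of NoSQL (short for "Not only SQL")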

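When data is divided in these ways, it is held in the database as follows.

(Image: data layout in the row store and in the column store)

When data is read from the top, it is read in the order shown above, which leads to the following characteristics:

・Characteristics of both data models

	Row store	Column store
Add / update / delete processing	Fast	Slow
Aggregation processing	Slow	Fast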

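HANA is a column database, and data is compressed and held in the column direction.

To speed up adding, updating, and deleting, HANA adopts a mechanism called the "delta buffer".

Reference: Delta buffer

The delta buffer is a concept in column-store databases.

It consists of three parts: the L1 delta, the L2 delta, and the main store, which is a column-type store.

L1: additions, updates, and deletions to the database are stored uncompressed in row-store format

L2: in the background, L1 (a row store) is converted into L2 (a column store) and compressed using an index

Main store: data is taken over from L2 and stored, then compressed and reorganized for high-speed access.

Quoted from "Efficient Transaction Processing in SAP HANA Database – The End of a Column Store Myth"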

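Main topic: what the row store and the column store are used for

As described above, information in HANA is basically stored in the column store.

However, when defining a table, you can choose either of two methods: row store or column store.

Given the characteristics of both stores, they are suited to the following kinds of information.

・Nature of the information handled

Properties of tables suited to the row store	Properties of tables suited to the column store
• Mainly contain distinct values → low compression rate	• Subject to column operations on many rows
• Most or all columns are relevant	• Have many columns, many of which are often unused
• Not subject to searches on non-indexed columns or to aggregation	• Subject to search operations and aggregation
• Fully buffered
• Small number of records

Reference: openSAP

Actual usage examples of the row store and the column store

Research method: in SE16N (general table display), refer to the technical settings table DD09L and search by narrowing down the Storage Type

Result

Of the 205 standard SAP tables, 203 were defined in the column store, one in the row store, and one was undefined.

About the row store

Incidentally, the only standard SAP table defined in the row store was TBTCO (Job Status Overview Table). As shown below, this table stores logs such as the date a job was registered and the date it was executed.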

Fields: JOBNAME, JOBCOUNT, JOBGROUP, INTREPORT, STEPCOUNT, SDLSTRTDT, SDLSTRTTM, BTCSYSTEM, SDLDATE, SDLTIME, SDLUNAME, LASTCHDATE, LASTCHTIME, LASTCHNAME, RELDATE

Field descriptions: background job name, batch job ID number, Job Group, internal report name, job step ID number, start date, execution date, target system of the background job, registration date, scheduled date, job scheduler, last job change, job last changed by, registered release

Example entry: SAP_WORKFLOW_ACTION, 01232400, %NEWSTEP, 1, 2019/11/19, 1:27:30, 2019/11/19, 1:23:24, DDIC, 2019/01/23, 10:32:19, TDC, 2019/11/19

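Looking at tables other than the standard SAP tables as well, I found the following kinds of information in tables that use the row store.

Information for access

Access history / send and receive logs

Dictionaries / definitions

Unique serial numbers (GUID) only

Information associated with a GUID

Order

Variant

Mapping: /SAPAPO/ORDMAP (Order Mapping Table)

Descriptions / texts: DDTYPET

Configuration information / setting information

The characteristics of these tables can be summarized as follows.

There are no values to be calculated

The ID is unique for the information

Information that does not change

Relatively few rows

Relatively few columns

This also matched the nature of the information mentioned earlier, but seeing actual tables made it easier to picture concretely.

About the column store

Here are some standard SAP tables as concrete examples.

Variant information (VARI): there is no GUID and there are many records (4000)

Cl.	Relation ID	Program name (variant key)	Variant name	Counter	Variant date	Variant time	Version number	SYST data element BIN2	Content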
100	VB	R_NAME	&0000000123456	0		00:00:00		158	0
100	VB	R_NAME	&0000000123457	0		00:00:00		158	0
100	VB	R_NAME	&0000000123458	0		00:00:00		158	0

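BKPF: accounting document header data: not subject to aggregation, but there is no GUID and the number of records is large

VBAP: sales document item data: subject to aggregation; there is no GUID and the number of records is large

Summary

When creating a table, use the column store by default.

Understanding the nature of tables that should be set to the row store and using it appropriately increases access speed.

Properties of tables suited to the row store	Properties of tables suited to the column store
• Mainly contain distinct values	• Subject to column operations on many rows
• Most or all columns are relevant	• Have many columns, many of which are often unused
• Not subject to searches on non-indexed columns or to aggregation	• Subject to search operations and aggregation
• Fully buffered	• Large number of records
• Small number of records
• Information that rarely changes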

Thank you for reading this far.

高橋　伶奈
