Data Warehousing Terminology – “Jump Start”
As anyone will tell you, today’s technological environment is full of Three Letter Acronyms (TLAs), not to be confused with plain initials (e.g. TLA).
“What’s the difference?” I hear you say. Well, an acronym can be pronounced as a word, although it does not have to be a real word. That’s why SAP is an acronym, HANA is an acronym, and TLA is not.
Working in the SAP environment for many years, I see the inevitable increase in the number of acronyms and initials, as new technology comes on board. Some obvious, some guessable, some doubling up with others, etc… you get the picture.
So, I started out by creating this list just for myself, to summarize, articulate and gather the various definitions in a single location. It was then adopted by our Sales Team as a “Jump Start”, at which point I thought about putting it here.
It does contain some SAP-specific details such as transaction codes and screenshots; however, they should not be required knowledge.
It is not alphabetically ordered at this stage, so I suggest using <CTRL>+F to find something.
I will also put supporting links where I find them useful.
I know Wiki is out there and very useful, so this document is not intended to replace or duplicate it by any means. In places, it may not be complete. (I welcome help)
This document also relies on some basic SAP knowledge. It is not a “dummies guide to SAP”.
Due to the technological nature, some explanations use terminology further explained within this document. So you may find yourself looking up other terms. Otherwise, I would be explaining the same terms more than once, in various locations.
This is a work in progress and likely to grow as I find time to add new items. Please get in touch with me if you find errors, or have some new input. No doubt, not everyone will agree with my definitions and I’m probably getting myself into a bashing.
But for now here it is, and let’s see where it goes…
BW for HANA (BW/4HANA, B4H, B4/HANA, BWFH, B4)
SAP’s latest BW offering (2016), which runs purely on HANA. Not to be confused with BW on HANA. In B4H, modeling is performed using Eclipse (not RSA1). Classic DSOs and Cubes do not exist in this version.
BW on HANA (BWoH)
Also known as “BW powered by (SAP) HANA”: the SAP BW offering running on HANA. Unlike B4H, this version can also run on other databases, which enables the transition from anyDB to HANA without major effort. Some functionality is tuned and optimized for HANA, e.g. Activations and Open ODS Views. Classic DSOs and Cubes exist.
Data Store Object (DSO)
BW Object into which data can be loaded (persisted). Typically used for Transaction Data. In B4H this object is replaced by the advanced DSO (ADSO).
Advanced Data Store Object (ADSO)
BW Object into which data can be loaded. Typically used for Transaction Data. A further optimization of the classic DSO.
Archive vs NLS
“Archive”, within this context, refers to offline data storage, i.e. the data is no longer held within the Application’s database and has to be reloaded if needed.
Most of the information surrounding this topic revolves around the movement of data from one location to another, e.g. RAM to DISK to FILE (online to near-line to offline).
Archiving stores the data offline, often securely and outside the Application. Therefore, the Application cannot directly access the data. With NLS, by contrast, the data is stored near-line and remains readable by the Application.
- Data currently in the Application is online
- Archiving moves the data out of the Application
Data Retention Tool (DART)
SAP tool for archiving into sequential files. Designed to meet legal requirements for tax auditing, and includes tools for viewing the archived data. Archiving is performed on SAP Financial documents, including master data, for a given fiscal period.
Archive Development Kit (ADK)
A set of programming interfaces (APIs) enabling file-based archive solutions, including SAP’s own Data Archive Process (DAP) as well as other 3rd party solutions. ADK uses Archive Objects and is integrated into NetWeaver, so it is available in both BW and ECC.
Queries have read access to the Archive files without the data having to be reloaded into BW. Not all Queries support reading from the archive, e.g. those using cumulative key figures.
DAP setting for an InfoProvider in BW
Information Lifecycle Management (ILM)
Manages the lifecycle of an application’s data, from creation to destruction, using retention rules (see Retention Management – RM). ILM is necessary for legal compliance and extends classic SAP Archiving (Tx SARA) with Retention Management (end-of-life of data) and Retention Warehouse (end-of-life of a system).
- ILM “Store” – Promoting SAP IQ as storage for ILM
- WebDAV is also an ILM Store
- ILM Retention Warehouse (BW + ILM Store)
- ILM Includes Destroy
Retention Management (RM) – Defines the amount of time (age) for which, and the location in which, data is securely kept. Includes functionality for putting holds on data for legal cases.
- Archive and Destruction of data
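The interplay of retention time and legal holds described above can be illustrated with a minimal Python sketch. This is not SAP ILM code: the `ArchivedRecord` class, the `may_destroy` function and the seven-year retention period are all hypothetical and exist only to show the rule "destroy only once the retention period has expired and no hold is in place".

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedRecord:
    """Hypothetical archived document; not an SAP ILM structure."""
    key: str
    created: date
    legal_hold: bool = False  # set while the record is needed for a legal case


def add_years(d: date, years: int) -> date:
    """Add whole years to a date, mapping 29 Feb to 1 Mar in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, month=3, day=1)


def may_destroy(record: ArchivedRecord, retention_years: int, today: date) -> bool:
    """True only if the retention period has expired and no legal hold applies."""
    if record.legal_hold:
        return False
    return today >= add_years(record.created, retention_years)


invoice = ArchivedRecord("INV-1", date(2010, 1, 1))
held = ArchivedRecord("INV-2", date(2010, 1, 1), legal_hold=True)
print(may_destroy(invoice, 7, date(2017, 1, 1)))  # retention expired, no hold
print(may_destroy(held, 7, date(2030, 1, 1)))     # hold blocks destruction
```

In a real system the retention period would come from the retention rules rather than from a function argument.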
Retention Warehouse (RW)
Focuses on the end-of-life and decommissioning of an SAP system. Includes methods for removing data, and its related metadata, from the old system and storing it in a Retention Warehouse. The archived data and metadata can then be loaded into a BW environment for reporting and auditing.
- RW is an SAP Application
- BW BI Content available
Business Warehouse (BW) vs Business Intelligence (BI)
SAP’s BW product was renamed BI for a time, and then renamed back to BW when SAP purchased Business Objects and decided to call that BI instead. When using either term, make sure it is understood which product you are referring to.
ILM Store
ILM-compliant storage of archive files and indexes on a separate database. It can be within or external to the SAP environment, depending on the technology.
SAP promotes IQ as an ILM Store within the SAP environment, without the use of external interfaces, as an alternative to WebDAV.
WebDAV
Web-based Distributed Authoring and Versioning (WebDAV) is a technology which extends the HTTP protocol by enabling direct write access to content. Using WebDAV, data can be directly accessed on remote servers. WebDAV can also be used as a data store for ILM.
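To make "extends the HTTP protocol" a little more concrete, here is a minimal Python sketch using only the standard library. It builds a plain HTTP `PUT` (write a file) and a WebDAV `PROPFIND` (list a collection), the latter being one of the extra methods WebDAV adds to HTTP. The host name and paths are hypothetical, and this says nothing about how SAP ILM itself talks to a WebDAV store.

```python
import http.client

# Request body asking the server for all properties of a resource.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
)


def build_put(path: str, payload: bytes):
    """Plain HTTP PUT: write content directly to the remote path."""
    return "PUT", path, payload, {"Content-Type": "application/octet-stream"}


def build_propfind(path: str, depth: str = "1"):
    """WebDAV PROPFIND: Depth 1 lists a collection and its direct children."""
    headers = {"Depth": depth, "Content-Type": "application/xml"}
    return "PROPFIND", path, PROPFIND_BODY.encode("utf-8"), headers


def send(host: str, request) -> int:
    """Send a prepared request and return the HTTP status code."""
    method, path, body, headers = request
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request(method, path, body=body, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()


# Hypothetical usage against a WebDAV server (not executed here):
# send("webdav.example.com", build_put("/archive/doc1.bin", b"archived bytes"))
# send("webdav.example.com", build_propfind("/archive/"))
```

A successful `PROPFIND` normally answers with status 207 (Multi-Status) and an XML listing of the resources.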
Extension Node (EN)
BW on HANA (BWoH) and BW for HANA (B4H) introduce a new type of HANA node (the extension node) used specifically for Warm data. Relaxed sizing allows more data to be stored on disk than there is RAM available.
This lends itself to data with a less critical SLA, e.g. PSAs and Change Logs, being moved to the extension node, thus freeing memory on the main node for the more critical Hot data.
This is BW specific, and not native HANA.
Extension nodes permit twice the amount of data of a normal HANA node. Loading of data from disk into memory is to be expected, unlike on a normally spec’d HANA system.
Dynamic Tiering (DT)
HANA Dynamic Tiering adds disk-based extended storage for column tables. In terms of multi-temperature data management, Warm data is located on extended storage (the DT Server’s disk) and Hot data remains in memory (HANA’s RAM), thus reducing the in-memory footprint.
- Application (e.g. BW) or Native HANA
- Requires a dedicated server (worker) for production use
- Local HANA disk is not used
- DT Tables are Columnar
- DT 1.0: Tables CANNOT be partitioned; any existing partitions will be dropped
- DT 2.0: Tables CAN be partitioned (multi-store concept)
- HANA & DT share a common database
- Data does not have to be loaded into Hot store when accessed, unlike BW non-Active
- Query operations can be pushed down to DT Server
- Not available for BW Cubes
- Not all SQL statements supported on extended tables
- Management, License, Security, Backup & Restore, HA & DR implications
- Tx SE38 RS_DYNTIER_CHANGE to also alter temperature profile
- Installed as a Delivery Unit for native HANA use
- BW Standard operations supported
- BW Object property settings for DT
Write Optimized DSO (wo-DSO) setting in BW
Advanced DSO (ADSO) setting in Eclipse
DataSource (PSA) setting in BW
B4H also supports Extension Nodes (EN), so do not confuse the two. It is possible to have both DT and EN solutions on the same BW system.
With native HANA, data movement using DT is executed via the DLM tool of DWF. With BW, BW itself handles the movement of data.
The mechanics for reading from and writing to the various disk stores differ. Each store has its own specific query engine, and an optimizer determines where the query is executed.
Near Line Storage (NLS)
Used to archive data without having to reload it back into the Application when needed. Also offers some OLAP reporting capabilities, e.g. BEx Query access.
DAP setting for an InfoProvider in BW
Data Volume Management (DVM)
Comprehensive Solution Manager tool which enables the monitoring of data across multiple systems. Includes dashboards, detail reports, what-if, archiving jobs, etc.
Data Archive Processes (DAP)
DAP can be considered SAP’s archiving solution for BW. It is technically a BW artifact in which various archiving configurations can be made for a specific Cube or DSO (not ADSO), including the use of ADK, NLS, or both, depending upon the archive requirements.
- Tx RSDAP – Edit DAP
- 1 DAP per SPO, i.e. not 1 per SPO part
SAP Classic Archiving
SAP’s classic and mature NetWeaver archiving functionality, based around “Archive Objects”, which are delivered as standard and can be enhanced or created from scratch.
Typically, a standard Archive Object would be utilized in a particular archive process. This Object would determine the various read, write, and re-load programs used, along with many other settings.
- AOBJ – Archive Objects
- SARA – Archive Administration
- SARJ – Archive Retrieval
- SARI – Archive Information
ILM vs DLM
ILM is more interested in the semantics of the data and its metadata. DLM is less interested in the use of information and more in the physical location.
Data Lifecycle Manager (DLM)
HANA tool in the Data Warehousing Foundation (DWF) providing the functionality to move the HANA persistent data (on disk) to various storage solutions e.g. Dynamic Tiering, SAP IQ, Hadoop, Destruction Bin.
Profiles are used to configure which tables (data) are moved, the storage destination, the direction, and the rules for moving the data, either manually or on a schedule. Technology destinations: Dynamic Tiering, SAP IQ over SDA, Hadoop.
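As a rough mental model of what such a profile holds, the Python sketch below groups the table, destination, direction, rule and schedule into one object. Everything here is hypothetical (the `DlmProfile` class, its field names, the direction values and the sample table); it is not the DLM data model or API, only an illustration of the ingredients listed above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Destinations named in the text; the identifiers themselves are made up.
DESTINATIONS = {"DYNAMIC_TIERING", "SAP_IQ_VIA_SDA", "HADOOP"}
DIRECTIONS = {"RELOCATE", "RESTORE"}  # hypothetical: out of HANA / back into HANA


@dataclass
class DlmProfile:
    """Hypothetical stand-in for a DLM profile."""
    table: str
    destination: str
    direction: str
    rule: Callable[[dict], bool]     # decides which rows the profile applies to
    schedule: Optional[str] = None   # None means the profile is run manually

    def __post_init__(self):
        if self.destination not in DESTINATIONS:
            raise ValueError(f"unknown destination: {self.destination}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction}")


def rows_to_move(profile: DlmProfile, rows: list) -> list:
    """Return the rows selected by the profile's rule."""
    return [row for row in rows if profile.rule(row)]


profile = DlmProfile(
    table="SALES_ITEMS",
    destination="DYNAMIC_TIERING",
    direction="RELOCATE",
    rule=lambda row: row["fiscal_year"] < 2015,
    schedule="daily",
)
rows = [{"id": 1, "fiscal_year": 2012}, {"id": 2, "fiscal_year": 2016}]
print(rows_to_move(profile, rows))
```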
SAP IQ (IQ)
A database system which shares a similar columnar paradigm with HANA, but stores its data on disk.
Used with an NLS archiving strategy.
IQ can be integrated into BW as an archiving destination.
- Can be used as a data store for ILM
- Can be implemented with DLM
Smart Data Access (SDA)
Technology enabling HANA access to remote data without replication, using ODBC. Connects to SAP IQ, Hadoop, HANA systems, and more. Recently enhanced using Smart Data Integration (SDI).
In BW on HANA, Open Operational Data Store Views (Open ODS Views) use SDA, as does BW’s NLS solution. Data remains in the source and is not moved.
Using HTTP, SDA can connect to data sources outside firewalls.
- “Virtualization” of Data
Smart Data Integration (SDI)
A collection of native HANA data provisioning functionality and technologies, supporting batch, real-time, streaming, and virtualization. Numerous Data Sources are supported, including SAP Business Suite, Cloud Services, Hadoop, IQ, wireless devices, and more.
The architecture supports both cloud and on-premise connectivity and benefits from being processed by HANA (in-memory, parallelization). It is also supported by a developer’s kit (SDK), enabling the build of custom adapters to other Data Sources.
If the data provisioning requirements are not too complex, SDI can remove the need for external tools such as Business Objects Data Services (BODS), SAP Landscape Transformation (SLT), Event Stream Processor (ESP), etc.
Includes a Data Provisioning Agent that hosts extra adapters (Google+, Outlook, file, etc.).
Installed using a Delivery Unit and the Agent software.
Smart Data Quality (SDQ)
Native HANA data quality functionality that can be applied to inbound data, e.g. detection of duplicate records and address checking.
Smart Data Streaming (SDS)
Native HANA functionality that enables inbound real-time data streaming, with filter and aggregation functionality applied prior to persistence.
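The idea of filtering and aggregating a stream before anything is persisted can be sketched in a few lines of Python. This is a generic illustration, not SDS code: the `filter_and_aggregate` function, the count-based window and the sample sensor events are all assumptions made for the example.

```python
from collections import defaultdict


def filter_and_aggregate(events, min_value, window_size):
    """Drop events below min_value, then yield per-key sums for every
    window of window_size accepted events (a simple count-based window)."""
    sums = defaultdict(int)
    accepted = 0
    for key, value in events:
        if value < min_value:      # filter step: discard before persistence
            continue
        sums[key] += value         # aggregation step
        accepted += 1
        if accepted == window_size:
            yield dict(sums)       # this is what would be persisted
            sums.clear()
            accepted = 0
    if sums:                       # flush the final, partially filled window
        yield dict(sums)


# Hypothetical sensor readings as (sensor id, value) pairs.
stream = [("s1", 5), ("s2", 1), ("s1", 7), ("s2", 9), ("s1", 3)]
for window in filter_and_aggregate(stream, min_value=3, window_size=2):
    print(window)
```

Only the aggregated windows reach the persistence layer, which is the point of doing this work in the stream rather than after loading.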
Event Stream Processing (ESP)
Predecessor to SDS
Complex Event Processing (CEP)
Used in Event Stream Processing (ESP) and Smart Data Streaming (SDS)
Data Warehousing Foundation (DWF)
Provides DDO and DLM tools for HANA
Distributed Data Optimizer (DDO)
Tool in the Data Warehousing Foundation (DWF) providing the functionality to analyze and evenly distribute in-memory data across the nodes of a distributed/clustered environment.
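What "evenly distribute" means can be pictured with a simple greedy balancing sketch in Python: place the largest tables first, always on the node that currently holds the least data. This is a textbook heuristic used purely for illustration; it is an assumption, not a description of the algorithm DDO actually uses, and the table names and sizes are made up.

```python
import heapq


def distribute(table_sizes: dict, node_count: int):
    """Greedy balancing: largest table first, onto the least loaded node."""
    heap = [(0, node) for node in range(node_count)]  # (current load, node id)
    heapq.heapify(heap)
    placement = {node: [] for node in range(node_count)}
    for name, size in sorted(table_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)   # least loaded node so far
        placement[node].append(name)
        heapq.heappush(heap, (load + size, node))
    loads = {node: load for load, node in heap}
    return placement, loads


# Hypothetical in-memory table sizes in GB.
placement, loads = distribute({"A": 40, "B": 30, "C": 20, "D": 10}, node_count=2)
print(placement)
print(loads)
```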
Data Ageing (DA)
Similar to BW’s non-active data disk placement functionality, DA is the Suite on HANA and S/4HANA version of data placement on disk. The business data management concept is based around DA Objects, which can be enhanced or newly created. Aged data is stored on disk and remains accessible by the Application without reloading.
DA is used to reduce the in-memory data footprint for NetWeaver Applications. Data is referred to as current (hot) or historical (cold). Historical data is moved into its own partition and out of memory onto disk. Which data counts as historical is determined by the Application using data retention rules.
A Data Ageing Run, performed as a background job, prepares the records to be aged.
- Not to be confused with “Data Aging” in BW
- Available from NetWeaver 7.4 SP5 (or SP8)
- Aged data can be modified
- Not just on HANA dbs
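The split into current and historical data can be sketched as follows. This is a simplified Python model, not the Data Ageing framework: the `ageing_run` function, the `posting_date` field and the 365-day residence time are hypothetical stand-ins for the Application's own retention rules.

```python
from datetime import date, timedelta


def ageing_run(records, today, residence_days):
    """Split records into a current (hot) and a historical (cold) partition.
    A record becomes historical once it is older than the residence time."""
    cutoff = today - timedelta(days=residence_days)
    current, historical = [], []
    for record in records:
        if record["posting_date"] < cutoff:
            historical.append(record)   # would be moved to the on-disk partition
        else:
            current.append(record)      # stays in memory
    return current, historical


documents = [
    {"doc": "1000", "posting_date": date(2015, 3, 1)},
    {"doc": "1001", "posting_date": date(2016, 11, 15)},
]
current, historical = ageing_run(documents, today=date(2016, 12, 31), residence_days=365)
print([d["doc"] for d in current])
print([d["doc"] for d in historical])
```

Both lists remain readable by the caller, mirroring the point that aged data stays accessible without a reload.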
BW Data Aging
In BW, “Data Aging” is also a term for BW archiving.
Tx RSA1->Administration->Housekeeping tasks
“Mass Maintenance of DAPs” (Data Archive Processes) is the “Data Aging” transaction seen above. Numerous DAP BW objects can be configured simultaneously.
Hadoop
In the context of HANA, the Hadoop Distributed File System (HDFS) stores data across multiple machines.
Multi-Temperature Data (Hot, Warm, Cold)
Hot data is read/written frequently. In memory (Hot Store). No restrictions.
Warm data is read/written less frequently. On disk (Warm Store), in memory when needed. No restrictions.
Cold data is not written to and rarely read. Not in HANA and restricted by NLS.
BW – non-Active
BW functionality providing a strategy for managing the displacement (unloading from memory) of data when memory bottlenecks occur. Tables flagged as non-active are prioritized for “early” unload, e.g. Priority 7. Priority 0 is the lowest, meaning “do not unload”.
For example, Classic Write Optimized DSOs and Data Sources (PSA) = 7. Classic DSO Active and Activation Tables = 5.
Similar to the DA data disk placement functionality of Suite on HANA and S/4HANA, Non-Active is BW’s version of data placement to disk.
- Can be customized in theory, but this is not recommended by SAP
- Priority is set in SYS.TABLES-UNLOAD_PRIORITY. Values 0-9
- Data is required to be loaded into memory when accessed
- Hot data has a Priority of 0
- Partition tables are enabled by the Main table object
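The effect of the unload priority can be pictured with the Python sketch below: under memory pressure, tables with the highest priority value are displaced first, and tables with priority 0 are never chosen. It is a simplified model under stated assumptions; real HANA unload decisions also take other factors into account (such as how recently a table was used), and the table names and sizes are hypothetical.

```python
def unload_candidates(tables, memory_needed):
    """Pick tables to unload until memory_needed is freed.
    tables: list of (name, unload_priority, size_in_memory), priority 0 to 9.
    Higher priority values are unloaded earlier; priority 0 is never unloaded."""
    for name, priority, _ in tables:
        if not 0 <= priority <= 9:
            raise ValueError(f"{name}: unload priority must be between 0 and 9")
    # sorted() is stable, so tables with equal priority keep their input order.
    candidates = sorted(
        (t for t in tables if t[1] > 0), key=lambda t: t[1], reverse=True
    )
    chosen, freed = [], 0
    for name, _, size in candidates:
        if freed >= memory_needed:
            break
        chosen.append(name)
        freed += size
    return chosen


# Priorities follow the examples in the text: PSA / write-optimized DSO = 7,
# classic DSO active table = 5, hot data = 0. Sizes are made-up numbers.
tables = [
    ("HOT_FACT", 0, 500),
    ("DSO_ACTIVE", 5, 300),
    ("PSA_SALES", 7, 200),
    ("WO_DSO", 7, 100),
]
print(unload_candidates(tables, memory_needed=250))
```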