SAP NetWeaver 7.0 BI: new DataStore ” Write-Optimized DSO”
In a database system, read operations are much more common than write operations and consequently, most of database systems have been read optimized. As the size of the main memory increases, more of the database read requests will be satisfied from the buffer system and also the number of disk write operations when compared to total disk operations will relatively increase. This feature has turned the focus on write optimized database systems.
In SAP Business Warehouse, it is necessary to activate the data loaded into a Data Store object to make it visible for reporting or to update it to further InfoProviders. As of SAP NetWeaver 2004, a new type of Data Store object was introduced: the Write-Optimized DataStore object. The objective of this new DataStore is to save data as efficiently as possible to further process it without any activation, additional effort of generating SIDs, aggregation and data-record based delta. This is a staging DataStore used for a faster upload.
In BI 7.0, three types of DataStore objects exist:
- Standard DataStore (Regular ODS).
- DataStore Object for Direct Updates ( APD ODS).
- Write-Optimized DataStore (new).
In this weblog, I would like to focus on the features, usage and the advantages of Write-Optimzied DataStore.
Write-Optimized DSO has been primarily designed to be the initial staging of the source system data from where the data could be transferred to the Standard DSO or the InfoCube.
o The data is saved in the write-optimized Data Store object quickly. Data is stored in at most granular form. Document headers and items are extracted using a DataSource and stored in the DataStore.
o The data is then immediately written to the further data targets in the architected data mart layer for optimized multidimensional analysis.
The key benefit of using write-optimized DataStore object is that the data is immediately available for further processing in active version. YOU SAVE ACTIVATION TIME across the landscape. The system does not generate SIDs for write-optimized DataStore objects to achive faster upload. Reporting is also possible on the basis of these DataStore objects. However, SAP recommends to use Write-Optimized DataStore as a EDW inbound layer, and update the data into further targets such as standard DataStore objects or InfoCubes.
Fast EDW inbound layer – An Introduction
Data warehousing has been developed into an advanced and complex technology. For some time it was assumed that it is sufficient to store data in a star schema optimized for reporting. However, this does not adequately meet the needs of consistency and flexibility in the long run. Therefore data warehouses are structured using layer architecture like Enterprise data warehouse layer and Architectured data mart layer. These different layers contain data at different levels of granularity as shown in Figure 1.
Figure 1 Enterprise Data Warehouse Layer is a corporate information repository
The benefit of Enterprise Data warehouse Layer includes the following:
Reliability, Trace back – Prevent Silos
o ‘Single point of truth’.
o All data have to pass this layer on it’s path from the source to the summarized EDW managed data marts.Controlled Extraction and Data staging (transformations, cleansing)
o Data are extracted only once and deployed many.
o Merging data that are commonly used together.
Flexibility, Reusability and Completeness.
o The data is not manipulated to please specific project scopes (unflavored).
o The coverage of unexpected adhoc requirements.
o The data is not aggregated.
o Normally not used for reporting, used for staging, cleansing and transformation one time.
o Old versions like document status are not overwritten or changed but useful information may be added.
o Historical completeness – different levels of completeness are possible from availability of latest version with change date to change history of all versions including extraction history.
o Modeled using Write-Optimized DataStore or standard DataStore.
o Data is integrated.
o Realization of the corporate data integration strategy.
Architectured data marts are used for analysis reporting layer, aggregated data, data manipulation with business logic, and can be modeled using InfoCubes or Multi Cubes.
When is it recommended to use Write-Optimized DataStore
Here are the Scenarios for Write-Optimized DataStore. (As shown in Figure 2).
o Fast EDW inbound layer.
o SAP recommends Write-Optimized DSO to be used as the first layer. It is called Enterprise Data Warehouse layer. As not all business content come with this DSO layer, you may need to build your own. You may check in table RSDODSO for version D and type “Write-Optimized”.
o There is always the need for faster data load. DSOs can be configured to be Write optimized. Thus, the data load happens faster and the load window is shorter.
o Used where fast loads are essential. Example: multiple loads per day (or) short source system access times (world wide system landscapes).
o If the DataSource is not delta enabled. In this case, you would want to have a Write-Optimized DataStore to be the first stage in BI and then pull the Delta request to a cube.
o Write-optimized DataStore object is used as a temporary storage area for large sets of data when executing complex transformations for this data before it is written to the DataStore object. Subsequently, the data can be updated to further InfoProviders. You only have to create the complex transformations once for all incoming data.
o Write-optimized DataStore objects can be the staging layer for saving data. Business rules are only applied when the data is updated to additional InfoProviders.
o If you want to retain history at request level. In this case you may not need to have PSA archive; instead you can use Write-Optimized DataStore.
o If a multi dimensional analysis is not required and you want to have operational reports, you might want to use Write Optimized DataStore first, and then feed data into Standard Datastore.
o Probably you can use it for preliminary landing space for your incoming data from diffrent sources.
o If you want to report daily refresh data with out activation.In this case it can be used in reporting layer with InfoSet (or) MultiProvider.
I have discussed possible scenarios but request you to decide where this data store can fit in your data flow.
Typical Data Flow using Write-Optimized DataStore
Figure 2 Typical Data flow using write-optimized DataStore.
Functionality of Write-Optimized DataStore (As shown in Figure 3).
Only active data table (DSO key: request ID, Packet No, and Record No):
o No change log table and no activation queue.
o Size of the DataStore is maintainable.
o Technical key is unique.
o Every record has a new technical key, only inserts.
o Data is stored at request level like PSA table.
No SID generation:
o Reporting is possible(but you need make sure performance is optimized )
o BEx Reporting is switched off.
o Can be included in InfoSet or Multiprovider.
o Performence improvement during dataload.
Fully integrated in data flow:
o Used as data source and data target
o Export into info providers via request delta
Uniqueness of Data:
o Checkbox “Do not check Uniqueness of data”.
o If this indicator is set, the active table of the DataStore object could contain several records with the same key.
Allows parallel load.
Can be included in Process chain with out activation step.
You cannot use reclustering for write-optimized DataStore objects since this DataStore data is not meant for querying. You can only use reclustering for standard DataStore objects and the DataStore objects for direct update.
PSA and Write optimized DSO are the two different entities in the data flow as each one has its own features and usage. Write optimized DSO will not replace the PSA in a data flow but it allows to stage (or) store the data without activation and to apply business rules.
Write-optimized DataStore Object is automatically partitioned. Manual Partitioning can be done according to SAP Notes 565725/742243. Optimized Write performance has been achieved by request level insertions, similarly like F table in InfoCube. As we are aware that F fact table is write-optimized while the E fact table is read optimized.
Figure 3 Overview of various DataStore objects types in BI 7.0
To define Write-Optimized DataStore, just change “Type of DataStore Object” to “Write-Optimized” as shown in Figure 4.
Figure 4 Technical settings for Write-Optimized DataStore.
Understanding Write-Optimized DataStore keys:
Since data is written into Write-optimized DataStore active-table directly, you may not need to activate the request as is necessary with the standard DataStore object. The loaded data is not aggregated; the history of the data is retained at request level. . If two data records with the same logical key are extracted from the source, both records are saved in the DataStore object. The record mode responsible for aggregation remains, however, the aggregation of data can take place later in standard DataStore objects.
The system generates a unique technical key for the write-optimized DataStore object. The technical key consists of the Request GUID field (0REQUEST), the Data Package field (0DATAPAKID) and the Data Record Number field (0RECORD) as shown in Figure4. Only new data records are loaded to this key.
The standard key fields are not necessary with this type of DataStore object. Also you can define Write-Optimized DataStore without standard key. If standard key fields exist anyway, they are called semantic keys so that they can be distinguished from the technical key.
Semantic Keys can be defined as primary keys in further target Data Store but it depends on requirement. For example if you are loading data into ScheduleLine Level ODS thru Write-optimized DSO, you can have header,item,scl as the semantic keys in your Write-optimized DSO. The purpose of the semantic key is to identify error in the incoming records or duplicate records. All subsequent data records with same key are written to error stack along with the incorrect data records. These are not updated to data targets; these are updated to error stack. A maximum of 16 key fields and 749 data fields are permitted. Semantic Keys protect the data quality. Semantic keys won’t appear in database level. In order to process error records or duplicate records, you must have to define Semantic group in DTP (data transfer process) that is used to define a key for evaluation as shown in Figure 5. If you assume that there are no incoming duplicates or error records, there is no need to define semantic group, it’s not mandatory.
The semantic key determines which records should be detained when processing. For example, if you define “order number” and “item” as the key, if you have one erroneous record with an order number 123456 item 7, then any other records received in that same request or subsequent requests with order number 123456 item 7 will also be detained. This is applicable for duplicate records as well.
Figure 5 Semantic group in data transfer process.
Semantic key definition integrates the write-optimized DataStore and the error stack through the semantic group in DTP as shown in Figure 5. With SAP NetWeaver 2004s BI SPS10, the write-optimized DataStore object is fully connected to the DTP error stack function.
If you want to use write-optimized DataStore object in BEx queries, it is recommend that you define semantic key and that you run a check to ensure that the data is unique. In this case, the write-optimized DataStore object behaves like a standard DataStore object. If the DataStore object does not have these properties, unexpected results may be produced when the data is aggregated in the query.
Data that is loaded into Write-Optimized Data Store objects is available immediately for further processing. The activation step that has been necessary up to now is no longer required. Note here that the loaded data is not aggregated. If two data records with the same logical key are extracted from the source, both records are saved in the Data Store object, since the technical key for the both records not unique. The record mode (InfoObject 0RECORDMODE (space,X,A,D,R)) responsible for aggregation remains, however, the aggregation of data can take place at a later time in standard Data Store objects (or) InfoCube. Write-Optimized DataStore does not support the image based delta(RECORDMODE), it supports request level delta, and you will get brand new delta request for each data load. When you load a DataStore object that is optimized for writing, the delta administration is supplied with the change log request and not the load request.
Since write-optimized DataStore objects do not have a change log, the system does not create delta (in the sense of a before image and an after image). When you update data into the connected InfoProviders, the system only updates the requests that have not yet been posted.
Write-Optimized Data Store supports request level delta. In order to capture before and after image delta, you must have to post latest request into further targets like Standard DataStore or Infocubes.
Extraction method – Transformations thru DTP (or) Update Rules thru InfoSource
Prior to using DTP, you must have to migrate 3.x DataSource into BI 7.0 DataSource by using transaction code “RSDS” as shown in Figure 6.
Figure 6 Migration of 3.x Data Source -> Data Source using Tcode “RSDS”, and then replicate the data source into BI 7.0.
After data source replication into BI 7.0, you may have to create data transfer process (DTP) to load data into Write-Optimized DataStore. Write-optimized DataStore objects can force a check of the semantic key for uniqueness when data is stored. If this option is active and if duplicate records are loaded with regard to semantic key, these are logged in the error stack of the Data Transfer Protocol (DTP) for further evaluation.
In BI7 you are having the option to create error DTP. If any error occurs in data, the error data will be stored in Error stack. So, you can correct the errors in stack, and if you schedule the error DTP, the error data will be stored to target. Otherwise, you have to delete the error request from target and you need to reschedule the DTP. In order to integrate Write-Optimized DataStore into Error stack, you must have to define semantic keys in DataStore definition and create semantic group in DTP as shown in Figure 5.
Semantic group definition is necessary to do parallel loads to Write-Optimized DataStore. You can update write-optimized DataStore objects in parallel after you have implemented OSS 1007769 note. When you include a DTP in process chain for write-optimized DataStore Object, you will need to make sure that there is no subsequent activation step for this DataStore.
On the other hand you can just link this DSO thru the Infosource with update rules as well by using 3.x functionality.
Reporting Write-Optimized DataStore Data:
For performance reasons, SID values are not created for the characteristics that are loaded. The data is still available for BEx queries. However, in comparison to standard DataStore objects, you can expect slightly worse performance because the SID values have to be created during reporting. However, it is recommended that you use them as a staging layer, and update the data to standard DataStore objects or InfoCubes.
OLAP BEx query perspective, there is no big difference between Write-Optimized DataStore and Standard DataStore, the technical key is not visible for reporting, so the look and feel is just like regular DataStore. If you want to use write-optimized DataStore object in BEx queries, it is recommended that they have a semantic key and that you run a check to ensure that the data is unique. In this case, the write-optimized DataStore object behaves like a standard DataStore object. If the DataStore object does not have these properties, unexpected results may be produced when the data is aggregated in the query.
In a nut shell, Write Optimized DSO is not for reporting purpose unless otherwise required to do so, it’s a staging DataStore used for faster upload. The direct reporting on this object is also possible without activation but keeping in mind the performance, you can use an infoset or multi-provider.
Using Write-Optimized DataStore, you will have snapshot for each extraction. This data can be used for trending old KPIs or deriving new KPIs at any time because the data is stored at request level. This most granular level data by calendar day/time can be used for slice and dice, data mining, root-cause analysis, behavioral analysis which will help in better decision making. Moreover you need not worry about the status of extracted documents into BI since data is stored as of extracted date/time. For example Order-to-Cash/Spend analysis…etc life cycle can be monitored in detail to identify the bottlenecks in the process.
Although there is help documentation available from SAP on Write-Optimzied DataStore, I thought it would be useful to write this blog that gives a clear view on Write-Optimized DataStore concept, the typical scenarios of where, when and how to use; you can customize the data flow/ data model as per reporting(or)downstream requirement. A more detailed step-by-step technical document will be released soon.
Useful OSS notes:
Please check the latest OSS notes / support packages from SAP to overcome any technical difficulties occurred and make sure to implement them.
OSS 1077308: In a write-optimized DataStore object, 0FISCVARNT is treated as a key, even though it is only a semantic key.
OSS 1007769: Parallel updating in write-optimized DataStore objects
OSS 1128082 – P17:DSO:DTP:Write-optimized DSO and parallel DTP loading
OSS 966002: Integration of write-opt DataStore in DTP error stack
OSS 1054065: Archiving supports.
You can attend SAP class DBW70E – BI Delta Enterprise Data Warehousing SAP NetWeaver 2004s. Or you can visit http://www50.sap.com/useducation/
SAP Help documentation
New BI Capabilities in SAP NetWeaver2004s (Pls open in separate link)
Enterprise Data Warehousing – SAP BI (Pls open in separate link)
SAP NetWeaver 7.0 –Business Intelligence Warehouse Management (Pls open in separate link)