Handling Deleted Records from Flat File as Source to InfoProvider
Applies to: SAP BI 7.X
Normally we use flat file loads to SAP BW loading either to DSO and/or info Provider with an assumption that the file sent is always with the delta changes. Otherwise, we will drop and reload the DSO/Info Providers every time we receive the files. These approaches are fine if the file size is reasonably small which does not take lot of the processing time to complete the data loads. In case, we have huge volume of data and have additional upward layers to feed the same data, this approach is time consuming and process centric. Here we are going to discuss the design option to overcome data issues in the file such as nullified/deleted records in the source that need to be deleted in the upward Info Providers as well.
Author(s): Anil Kumar Puranam, Peesu Sudhakar and Sougata Singha.
Created on: 27th January, 2014
Sudhakar Reddy Peesu is an SAP Technology professional with more than 15 years of experience leading SAP BW/ABAP implementations, developing custom applications, and developing IT strategy. He is a PMP certified manager, has led BW/ABAP teams with hardware and landscape planning, design, installation and support, and development teams through all phases of the SDLC using the onsite/offshore delivery model. He has strong communication, project management, and organization skills. His industry project experience includes Banking, Consumer/Products, and Public Sector industries
Anil Kumar. Puranam is working as a Senior BW/BI Developer in Deloitte consulting. He has more than 9 years of SAP BW/BI/BO experience. He has worked on various support/implementation projects while working in consulting companies like Deloitte, IBM India, and TCS.
Sougata Singha is working as a business technology analyst in Deloitte Consulting. He has 2 years of experience in SAP BW/BI/BO and has been working on different support and implementation projects
Generally, in many of the BW implementations, ECC will be the source for transactional data loads. However during the transition phase to SAP (moving the operational system from legacy to ECC application), there is a lot of importance to the data extraction from legacy systems through flat files. In phase of moving our applications from legacy to ECC systems, we need to extract historical as well as day-to-day transactions to BW through flat files. Most of the time, we need to bring the data from legacy applications to BW as full loads until and unless the source applications are capable to generate deltas. When the tools are not capable to provide the deltas as like as ECC systems, we build the DSO layer as the staging layer. The DSO that we use in the staging layer help us to generate the after images and before images which generates delta to the next layer. But at the same time a single DSO alone is not capable/ sufficient to identify the deleted records from the source files that we receive.
Below are common two designs that we implement:
1. Full load from flat file to Info Provider and upwards to other Info Providers for various reporting
a. Easy to implement
b. Assume the file received are delta records from the source
c. Drop and reload if the legacy system is not capable to send the delta files
d. Time consuming to drop and re-load especially, when the file size is huge
e. Additional load process and time if the base Info Provider feeds upward Reporting layer cubes
f. If the data is feeding to external systems, the other system should also need to have some additional mechanism/design for taking care of data feeds
2. Flat file load first to staging DSO and upward Info Provider layers –
a. Relatively easy to implement
b. This design process delta or full file load
c. Only delta records are pushed to the upward Info Provider layers
d. This data flow helps to generate the delta records from staging layer DSO to next layer(s).
e. Assumption is that the delta/full file syndicated to BW will be with the zeroed out records (for deleted ones, zero will be posted rather than deleting).
The below side-by-side data flow picture illustrates above two design approaches/methods:
Problem with Traditional Design:
When we implement the 2nd design data flow discussed above, here is one major problem when there is some data records deleted in the source legacy system and part of the next file extract. As we are not dropping data and reloading the data at the staging layer DSO, it will be unable to detect those deleted records and they will remain in the staging DSO and upward layers. This will be pulled into the reports and display inaccurate data and lead to wrong decision making.
In order to understand the business scenario better, the below example is discussed with some sample data and behavior of delta mechanism.
Note: Sample data provided is only for requirement understanding; it does not contain any data from the client (Actual Data).
Here in the second load, we can see one of the records is missing. But this record will be still available in the DSO ZDSO_STG and further in the Info Providers. In the below table we can see the delta records generated in the change log of ZDSO_STG.
In order to delete the records from the DSO/Info Providers and to represent the correct values in reports, one of the traditional options that we have is to complete drop and reload of flat file data. But this approach is time consuming task depending on the size of the data file (records) that we need to drop and reload and the number of data targets that we need to process. Sometimes the drop and reload cannot be done in all the Info providers depending on the applications build on top of the Info providers. In order to address the above technical challenges, below step by step approach is followed.
Approach to overcome the Challenge:
In order to overcome the above challenge following steps need to be implemented,
Step 1: Parallel DSO with the same transformation but with one additional Info Object
Introduce another DSO ZDSO_DEL between PSA and the current DSO (i.e. between PSA and ZDSO_STG). Here DSO ZDSO_DEL is exactly same as like as ZDSO_STG, but with one extra info object ZTIMESTMP in the data field. Below is the definition of this IO –
Step 2: Transformation between the newly introduced DSO and regular DSO
Create transformation from new DSO ZDSO_DEL to ZDSO_STG. Here we can see 1:1 mapping between source and target, except ZTIMESTMP of source. ZTIMESTMP of source is not mapped to any of the field of target DSO.
Step 3: Create a custom control table to store the timestamp values
Create (or use any existing custom table) a z- table to store the value of ZTIMESTMP to be inserted in all the records of one data load to ZDSO_DEL. So here all the data packages of the load will be same value available in ZDSO_DEL. Here I have created a table ZTAB_CTRL.
Step 4: ABAP program to insert/modify the load date and time into the control table
Create a small program to update an entry in table ZTAB_CTRL with NAME, say, “ZTIMESTAMP_VAL”, which is the concatenation of system Date and Time. Check the appendix A for the sample code. Here is the sample record after updating the table:
Step 5: End Routine program in the transformation of newly introduced DSO
Create an end routine in the transformation of ZDSO_DEL from PSA (the transformation created in step 1), for updating ZTIMESTMP Info object for all the records of ZDSO_DEL. In the end routine, pass the value of above table entry to ZTIMESTMP.
Here is the code that we need to use in the end routine.
Step 6: Transformation between the two DSO’s and ABAP logic to mark Record Mode to ‘R’
Create the start routine in transformations from ZDSO_DEL to ZDSO_STG (the transformation created in step 2). Here in the start routine, check the value of ZTIMESTMP with the value of table entry in table ZTAB_CTRL. If the value is not same in the record, we need to make record mode equal to “R”.
We need to make filter on ZTIMESTMP in the DTP by using few lines ABAP code to fetch the records which are not equal to the value in the table ZTAB_CTRL. But still we need to make record mode equal to “R” for all these records. Please see Appendix B for the code to be used in DTP.
Here is the code that we need to use in the start routine.
Step 7: Process Chain design for automation
Create a process chain as shown below. Here the first step will be ABAP program to update the entry in table ZTAB_CTRL with NAME = “ZTIMESTAMP_VAL”, which will be equal to concatenation of system Date and system Time.
Note: We need to do full load from PSA to new DSO ZDSO_DEL each time and also to ZDSO_STG. We need to do full load from this new DSO (ZDSO_DEL) to existing DSO (ZDSO_STG). But here we take the records which do not have timestamp equal to the entry the table ZTAB_CTRL.
Data in the targets after each load:
Here no change to 4th record of ZDSO_STG even though it is not available in Source files. First record got overwritten with new value 200 from 100. And also 3rd record got overwritten with new value 100 from 300.And no change to 2nd record.
Transaction Data in ZDSO_DEL after the Second load:
Here assume that load was happened at around 20131115010101.
Transaction Data in ZDSO_STG after moving the data from ZDSO_DEL:
If we trigger the load from ZDSO_DEL to ZDSO_STG, we do get 4th record from ZDSO_DEL and this record will be moved with record mode ‘R’ to ZDSO_STG.
The deleted record is deleted from the target and it would be taken care as delta to upper layer as well.
Advantage/Disadvantage of new design:
1) Deletion is properly handled in the staging layer DSO and delta will be generated for the deleted records to upper layer.
2) Development effort is less. New DSO structure is same as old DSO. So structure can be copied while developing the new DSO.
3) Routines written in start routine, end routine and DTP filter is small and easy to write.
4) No further control table maintenance is required.
1) We are keeping same set of data in 2 DSO. So memory is a concern.
2) If we expect more than 50% data is deleted in each load, then ZDSO_DEL ZDSO_STG load will take much time and in that case drop and reload is a better option
*this program will update the Table ZTAB_CTRL-
*LOW field value to timestamp Value for Insight Elim.
PARAMETERS ZNAME type ZTAB_CTRL-NAME .
DATA: lv_timestamp(20) TYPE /bic/oiztimestmp.
CONCATENATE sy-datlo sy-timlo INTO lv_timestamp.
DATA wa_tab TYPE ZTAB_CTRL.
wa_tab-name = ZNAME.
wa_tab-value = lv_timestamp.
modify ZTAB_CTRL from wa_tab.
IF SY-SUBRC = 0.
COMMIT WORK AND WAIT.
// DTP filter on TIMESTAMP between ZDSO_DEL & ZDSO_STG.
DATA: s_ZTAB_CTRL TYPE ZTAB_CTRL.
DATA temp TYPE /bic/oiztimestmp.
SELECT SINGLE * INTO s_ZTAB_CTRL FROM ZTAB_CTRL
WHERE name = ‘ZTIMESTAMP_VAL’.
IF sy-subrc = 0.
temp = s_ZTAB_CTRL-VALUE.
l_t_range-iobjnm = ‘ZTIMESTMP’.
l_t_range-fieldname = ‘/BIC/ZTIMESTMP’.
l_t_range-sign = ‘E’.
l_t_range-option = ‘EQ’.
l_t_range-low = temp.