Monitoring the performance of our BW system, we noticed a slow but steady degradation of queries response time. Further analysis revealed that the degradation was caused by the increasing amount of data stored in the system. We therefore decided to launch a data archiving project.
In the first phase of the project, we decided to target a set of 22 InfoProviders (11 ODS and 11 cubes). The data stored in these InfoProviders comes from SAP ECC (FI-GL data) and several non-SAP operational systems.
Objective of this blog
The objective of this blog is to share some key points, learning and interesting facts and findings about our data archiving project. The objective of this blog is not to cover the whole subject as a lot of information about Data Archiving and Information Life Management (ILM) can be found on SDN.
The first step was to define the methodology for the project. We originally got that from SAP at http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/2064e305-3826-2a10-03a0-e0a6b06b4081 ; unfortunately, the PowerPoint slide shown below is not in the document anymore but you can still find it at http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/a111ae90-0201-0010-4bbe-809ec5627433 (although not exactly in the same format). We slightly adapted the approach to suit our BI environment. We found that this gave us a good framework to work with all the parties involved (end-users, technical teams, etc.).
Collect the business requirements
One of the first steps is to collect the business requirements during the Data Analysis phase (see above slide). Basically, end-users need to define what data needs to be accessed, what are the legal requirements that affect data retention (if any), etc.
This may seem straightforward but it is actually not always simple to reach a decision! End-users may have divergent points of view that need to be reconciled or they may want to keep more data than really needed. To get the end-users realistic, it is useful to show them the volume of data stored in each InfoProvider and give them a sense of what is reasonable and what is not (one of the objectives being to improve query response time!).
We also listed all the queries executed during the last few months on the various InfoProviders in scope and went through the list with the end-users (this is not as painful as one may think). We had doubts that some queries would still work after the data is archived. This required further analysis and we ended up modifying some queries. We also found a few queries that should have been developed on other InfoProviders in the first place and migrated these queries to the appropriate InfoProvider.
Decide what to do with the data
Part of the discussion is to decide what to do with the data. Should it be simply deleted? Copied to archive cubes? Etc. We found the slide below quite useful to drive the discussion with the end-users (also available at http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/2064e305-3826-2a10-03a0-e0a6b06b4081)
Note that you may have more options available with BI 7 (e.g. NLS). This was not an option in our case because we currently operate BW 3.5. Another scenario that doesn’t appear explicitly in the above diagram is to transfer data into “archive” cubes. This can easily be included in the above decision tree.
In our case, the requirements were simply to delete the data. However, we decided to archive the data i.e. write the data to archive files and then delete the data from the InfoProvider. The main reason was to be able to reload the data from the archive files in case we deleted the wrong data. An additional benefit is that there is a standard and robust solution provided by SAP to just do that (SARA).
Data archiving frequency
One important point is to decide the data archiving frequency i.e. should we archive the data every month? Quarter? Year? In the end, we decided to go for a monthly archiving process for the following reasons:
1) This keeps the database size smaller
Most of our requirements were to keep N “rolling” months of data in the InfoProviders. By archiving one month of data every month (instead, for example, of archiving one year of data every year), you keep the size of the database smaller.
2) This makes the archiving process less risky
Because you archive only one month of data at a time (versus one year of data for example), there is less chance of encountering various problems (not enough temporary disk space, not enough time to archive all the data over a week-end, etc.). The constraint is that you need to make sure that your archiving processes are robust and industrialised because you don’t want to spend time every month fiddling around.
3) This ensures that data archiving is an accepted process
If you archive data on a monthly basis, everybody (end-users, technical team, etc.) are used to it. This is an accepted process. On the other hand, if you archive data on a yearly basis, it is quite easy to forget to archive the data! You could give the opportunity to the end-users to question the process, especially if some of the end-users who participated in the original project are not present anymore. And you could end up doing your archiving project again!
Basically, we setup archiving objects using SARA. Once this was done, we created one process chain for each archiving session to run. Each process chain does the following:
- Control that the parameters (dates) used to select the data to archive/suppress are consistent with the data retention period of the InfoProvider (this is done by reading a setup table as explained below)
- Take a summary snapshot of the data (before image)
- Deactivate the aggregates
- Generate the archive file by calling the Write Program generated by SARA
- Suppress the data by calling the Delete Program generated by SARA (actually this is done by calling the RSARCHD program with the appropriate parameter)
- Take a summary snapshot of the data (after image)
- Compare before and after image and return an error if the result is not coherent
- Activate the aggregates
- Update the setup table with the new date parameters to use in the next archiving session (advance by one month)
The setup table (mentioned in steps 1 and 9) indicates, for each archiving session, the data retention period of the InfoProvider (e.g. 24 months) and the dates to use the next time the archiving session is executed (e.g. archive/suppress data for April 2008).
Because we don’t need to keep the archive files, they’re simply deleted on a regular basis using an automated job running at operating system level.
Overall, this process may seem a bit heavy to implement as it requires some ABAP coding but, as said before, we wanted to have a robust and industrialised process that runs smoothly and that can be deployed easily and quickly for other InfoProviders.
Deactivate the aggregates
We found it more efficient to deactivate (and later rebuild) the aggregates before archiving/suppressing the data because aggregates can really slow down the whole process. Because each BW environment is unique in terms of number and size of aggregates, the best is to do some testing before deciding to deactivate (or not) the aggregates.
Set parameter BW_SELDEL_PERC_THRES in table RSADMIN
If the data to suppress from an InfoProvider doesn’t exceed 10% of the overall volume, BW directly deletes the data from the InfoProvider tables. If the data to suppress exceeds 10% of the overall volume, all the data, except the data to be suppressed, is copied to new tables. Old tables are deleted and the new tables are renamed. This could drastically affect the time it takes to suppress the data.
This 10% default value can be changed by setting the parameter BW_SELDEL_PERC_THRES in table RSADMIN (using program SAP_RSADMIN_MAINTAIN). Choosing the right value depends on your environment and may require some testing.
Get some external expertise
We found it quite useful to bring in some external expertise on SARA, even for just a few days, and learned much from the consultants.