Skip to Content

A novel way to do selective deletion and not impact your change logs.

In many cases we have issues where we want to selectively delete records from DSO and want them to be deleted from the change log as well..

One interesting approach would be to consider using BW load processes to achieve the same.

Lets take an example :
Lets say we have ODS1 which gets loaded daily with Sales Data. Some of the records here are records for internal sales which have to be deleted.There is a cube which is being used for reporting which gets loaded daily from this DSO.

Approach 1 :
The usual way is to have a selective deletion on the DSO and then do a full load to the cube .

Process Steps :

  1. Load data to ODS1
  2. Activate data in ODS1
  3. Selectively delete data in ODS1
  4. Drop data in Cube
  5. Load data into cube
  6. Build the aggregates

Pros :

  1. Simple method
  2. Easy to implement

Cons :

  1. Repeated load of data to cube
  2. Aggregates have to be rebuilt every day
  3. High database activity
  4. Becomes self defeating with large data volumes
  5. With large data volumes – Selective deletion takes longer and longer times

Approach 2 :

Here you can look at selectively deleting the data from the DSO and then  selectively deleting from the cube as well and reloading the new data  alone using selective full loads from the DSO to cube

Process Steps

  1. Load data into ODS1
  2. Activate data in ODS1
  3. Selectively delete data in ODS1
  4. Selectively delete data in cube for affected time period / Delete Overlapping request
  5. Load selective full updates from ODS1 to cube
  6. Rebuild the aggregates

Pros :

  1. Slightly complex but still easy to achieve
  2. All can be done using Process chains and some simple ABAP in the infopackage

Cons :

  1. Aggregates have to be rebuilt every day
  2. High database activity
  3. Maintenance and validation heavy
  4. Gets tougher to manage load times with higher data volumes

Approach 3:
Use the BW load process and some ABAP programs to negate the values.

Process Steps

  1. Load data to ODS1
  2. Activate the data in ODS1
  3. Create a copy of ODS1 as a transaction ODS (ODS2)
  4. Have a program run in the process chain that identifies these records to be deleted and writes them to the active table of ODS2
  5. In the program set all the key figures to zero
  6. Load ODS1 from ODS2 using a full update
  7. Use the 0recordmode in the update to delete the records
  8. Continue deltas into the cube
  9. Drop data from ODS2

Pros :

  1. BW load processes are used and no deletion takes place
  2. Change log integrity is preserved
  3. Incremental building of aggregates can be done
  4. One time validation is required post which the whole process can be automated.
  5. This can be automated using a process chain

 Cons :

  1. Requires an ABAP program
  2. Time to implement is slightly longer than the previous approaches
  3. Initial validation is required to make sure that only selected records are deleted and no additional records get deleted

A data flow for the same is available below.

Data Flow

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

We had done this for some really large DSOs and the process worked fine and the time taken was much lesser than Approach 1 or Appproach 2. Initiallly we did think that the ABAP program was going to take too much time and that it was not worth the effort but then some efficient ABAp programming and some big time failures in Approach 1 and 2 convinced us to go down Approach 3!!!

To report this post you need to login first.

22 Comments

You must be Logged on to comment or reply to a post.

  1. Rajeev Kumar
    Why not do a self loop load on ODS1 which can be run with required selectin and has routines for for not so straight forward scenario and sets the record mode to D.
    (0) 
    1. Arun Varadarajan Post author
      The main reason was that when we did the self loop – We had to handle the changelog accordingly and had to further validate the deltas to the cube since there would be a self loop in addition to the previous load … hence we ended up having approach 3 where the whole operation was performed outside…

      But then self load as approach 4 is possible..

      (0) 
  2. satish murthy
    Hi Arun,
    very informative blog.
    some doubts on the 3 approaches:
    1)In approach 1 & 2 is it assumed that the data is being loaded to cube from the active table and not change log?
    Because during a selective deletion from DSO, Data is deleted from active table but change log still has the data.

    2) In the 3rd approach, You did mention that this approach is taken to maintain the integrity of change log. So,are you loading the cube from change log here?.
    Please clarify.

    (0) 
    1. Arun Varadarajan Post author
      In the third approach – you need not so any deletion in the ODS / CUbe and you can load the cube from the change log and you would not have to rebuild the aggregates but would have to just roll them up…
      (0) 
    2. Arun Varadarajan Post author
      And in Approach 1 and 2 we would typically do selective full loads from the active table into the cube… or complete full loads depending on the data volumes…
      (0) 
    3. Arun Varadarajan Post author
      Satish,
      For approach 1 and 2 – we would have to do full loads since records in the cube have to be changed. We will be loading from the active table..
      The example I have quoted from the previous reponses mention the issue..
      Day 1 I get soaps as 100 and Day 2 I do not get soaps since soaps have been sold out – the same record has to be removed from the cube because it is stock against day 1 and day 2 stock has to be zero. If I use simple delta loading then the cube would still say soaps as 100 but in reality soaps is zero.

      Warm regards,
      Arun

      (0) 
  3. Bala Prabahar
    “Lets say we have ODS1 which gets loaded daily with Sales Data. Some of the records here are records for internal sales which have to be deleted.”

    Is it difficult to just pull what is needed (all records minus internal sales) from the source system?

    I also would like to know your thoughts on Bill Inmon’s definition for Data Warehouse: “A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process”.
    Non-volatile means: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

    Thanks,
    Bala

    (0) 
    1. Arun Varadarajan Post author
      Bala,
      The example I had taken would not be indicative of the actual scenario…

      In our case we had stock coming in from the warehouses via flat file. But then this was a full load for the month and it did not support deltas. A typical example was like :
      Day 1 | Soap | 100
      Day 1 | Detergent | 100
      And in Day 2 – they sell off all the soaps then my extract would be :
      Day 2 | Detergent | 100
      and any further GRs…
      This meant that I have to delete the data received on Day 1. The data is stable – the only thing is that we have to make adjustments to the way we receive and process the data. In the case I have given in this comment – you cannot pull specific records. This scenario also occurs for many sources of data and is quite common too.

      Thanks,
      Arun

      (0) 
      1. Bala Prabahar
        Arun,

        I am confused. May be we can discuss this when we meet some day:).
        DW/BW data should be consistent with the source system. What would source system say about the product Soap for Day 2. Here are possible answers:
           – Value zero.
           – Product soap not found
           – Any other?
        If the source system shows the value Zero(this is more likely) for Day2 for soap, then that information can be easily included in the flat file.
        If the source system displays “Product soap not found” for day 2, then that’s exactly what BW would show by just loading actual data. But for day 1, DW would accurately show 100 if the record was not deleted. And this is a fact. If the record was deleted for Day 1, then aren’t we losing very important information(fact) that the customer sold 100 soaps on day 1?

        Thanks,
        Bala

        (0) 
        1. Arun Varadarajan Post author
          Bala,
          Its slightly more complicated than that…
          The Inventory is at a Month level and not at a day level. We will get the fact that they sold 100 units of soap in the sales data but then handling the stock information in the way prescribed is much easier than tying up sales , GRs and Inventory. If it was an ERP which had all aspects integrated then we would get information like what SAP provides .. but then in many cases – warehouses managing inventories and the sales process is not always linked together in a seamless way. Also the target audiences are different for each. The intent of the user is to know the inventory for the month and this should show SOAP as 0 or not show soap at all ( if you are managing 1000’s of SKUs )and this restatement / selective deletion approach resolves this issue. The system should show soap as not available in stock for day 2 ( since the report is at a monthly level )

          Arun

          (0) 
  4. Srinivas Neelam
    Arun,

    For this requirement why we need to delete data from DSO..?
    We can simply restrict data at cube transformation before moving to CUBE.

    Or,
    While loading from flat to DSO we can make record mode as “D” to delete records from DSO.

    Thank YOu.

    (0) 
    1. Arun Varadarajan Post author
      Srinivas,
      The whole issue is that I do not get the changed records from the flat file…
      Day 1 extract
      Soaps | 100
      detergent | 200
      Day 2 Extract
      Detergent | 300

      Soaps have got sold out and this record is not available in the extract. For this I would have to delete records from the cube / negate them within BW since this is not available in the extract.

      (0) 
  5. Alok Kashyap
    Suppose Cube is taking regular Delta from DSO (Change log table) via DTP. If we do selective deletion from cube then what should we need to do next? Should we do the delta or Full from DSO??

    Let me say some example: suppose we have corrected 10 documents in DSO and same want to forward it to the cube? then we have to delete (Selective deletion) those 10 records from the cube and do the full load?? or in fact no need to do selective deletion at all,,just do the Delta DTP to Cube

    (0) 
    1. Witalij Rudnicki
      Hi Alok. It depends how you corrected these 10 docs in DSO. If you loaded these 10 corrected docs into DSO, then you should have 20 new records in Change Log and you load them with Delta DTP.
      (0) 
      1. Alok Kashyap
        thanks for the reply. yes I have corrected DSO by reloading 10 documents to it. So the conclusion is if we do delta from DSO to Cube then never do selevtive deletion from cube!! otherwise delta would be corrupted!! Any comment? or let me know the scenarios where this is possible
        (0) 
  6. SM S
    Hi Arun,

    Please clarify what would be the pseudo code for the ABAP, since I am confused how you determine the records to be deleted which you point out in step 4 below.  So after activation of data in ODS1, you run a program to identify records to be deleted and then write to ODS2 (direct update DSO).  As a concept it seems fine, but how do you work out the records to be deleted?

    I have used a process where I load to DSO then I run an ABAP program to compare new data table and active data table.  But I do not think this is what you mean – please expand.

    Also I guess the same can be achieved by re-posting ODS1 to itself and multiplying the key figures by -1 – this obviously has to be done before data is loaded to the ODS1.

    Thanks, SMS

    YOUR CODE (point 4 giving confusion)
    ———-
    Process Steps

       1. Load data to ODS1
       2. Activate the data in ODS1
       3. Create a copy of ODS1 as a transaction ODS (ODS2)
       4. Have a program run in the process chain that identifies these records to be deleted and writes them to the active table of ODS2
       5. In the program set all the key figures to zero
       6. Load ODS1 from ODS2 using a full update
       7. Use the 0recordmode in the update to delete the records
       8. Continue deltas into the cube
       9. Drop data from ODS2

    (0) 
    1. Arun Varadarajan Post author
      Sulman,
      In my case the records were as such :
      Day 1 :
      Jan | Cust 1 | 100
      Jan | Cust 2 | 100

      And the data on day2 was :
      Jan| Cust 2 | 100

      The reason being that the source system ( in this case non SAP ) did not capture changes and the data we needed in the DSO was only
      Jan | Cust2 | 100

      In this case I used the ODS to write the records and then compare and then reverse any records that I did not need.

      Your approach of using the new data table is also very good and can be used.. The pseudo code for the same would be like what I described above…

      I had left Step 4 as a generic option because the selective deletion step can be anything the person wishes to have – common scenario is to have a date field having the date of load as a non key value and comparing records to be deleted using this field..

      (0) 
      1. SM S
        Arun,

        Many thanks for prompt reply.

        You said ‘
        In this case I used the ODS to write the records and then compare and then reverse any records that I did not need.’ — So when you load ODS1 and activate, you then copy active table of ODS1 to ODS2? Then you would compare active table of ODS1 and ODS2.  Or maybe before Activation again you could compare new data table of ODS1 and active table of ODS2? — please clarify as your solution may be good for me.

        Many thanks
        SMS

        (0) 
        1. Arun Varadarajan Post author
          In my ODS I have a date field
          Once loaded the records would be :
          Day 1| Cust1 | 100 | Day1
          Day 2 | Cust2 | 100 | day 2
          The requirement was to nullify or delete all the records which had Day1 – this could have been easy with selective deletion but with 4 such DSOs and about 1 Million records getting loaded every day and a cube with about 3 years data above and relevant aggregates – we decided to go for this approach and saved a bunch of time by avoiding selective deletions , BWA rollups , Aggregates, master data locks due to selective deletion / aggregate rollup etc etc and the data was available much faster for reporting.

          My program was a simple one which ran for a particular month and identified the records to be deleted in the active table- pretty straightforward. This was done post activation – since I was using only the active table here

          Or an even simpler option would be to have a program that runs a compare command using Native SQL and writes the differences to a table and you could nullify the same – in this case you use the DB to do your work for you …

          (0) 
          1. SM S
            Understood –

            Thanks for taking the time to help and for providing another method – much appreciated.

            Best Regards, SMS

            (0) 

Leave a Reply