Skip to Content
Technical Articles
Author's profile photo Felipe de Mello Rodrigues

Delete duplicate entries in ABAP CDS views using Table Function and SQL Window Function

With the Code Push-Down strategy introduced by SAP HANA some of the functionalities used by ABAP developers haven’t been easily translated to the newest technologies, for example, CDS views.

In this blog post we are going to explore one of this scenarios trying to delete duplicate entries using ABAP CDS Views.

As I demonstrated before in one of my previous articles, ABAP CDS Views don’t offer full flexibility for all the development scenarios and an alternative solution for the exceptions is to adapt our logic using ABAP CDS Table Functions. If you still didn’t check this content I advise to have a look in the following article before you proceed this reading.

Controlling the granularity of your data sets and deleting duplicate records is quite a difficult task in ABAP CDS development because the use of GROUP BY is not enough to solve this requirement in all of the cases (e.g. join with composite keys). For these particular scenarios there is a simple solution based on ABAP CDS Table Function and one of the SAP HANA SQL Window Functions called RANK.

Have a look in the following links if you want to absorb more content related with these topics.

Useful Links:

 

Study case

Let’s create a simple example and compare both approaches discussed so far:

  • ABAP CDS view with GROUP BY
  • ABAP CDS table function with RANK

Create a simple table called ZDEMOCDSRANK based on the following structure:

Populate some data inside this table, you can use the following entries as example:

The requirement is quite simple, we need to extract the latest Reference Code for each Customer based on the latest Reference Date.

Notice the Reference Codes are not following the same order as the Reference Dates, if we try to solve this requirement with GROUP BY and MAX( ) the output wouldn’t be correct because of the different order between both fields.

First, let’s try an ABAP CDS view based on this erroneous approach and analyse the output.

 

ABAP CDS view with GROUP BY

@AbapCatalog.sqlViewName: 'ZCDSGROUPBY'
@AbapCatalog.compiler.compareFilter: true
@AccessControl.authorizationCheck: #CHECK
@EndUserText.label: 'Group By (Demo)'

define view ZCDS_GROUPBY 
  as select from zdemocdsrank
{
    key customer     as Customer, 
    
        max(refcode) as ReferenceCode, 
    
        max(refdate) as ReferenceDate
}
group by
  customer

Run the Data Preview and check the result provided by this CDS view.

As explained before the result is incorrect because the strategy with GROUP BY doesn’t maintain the integrity of the row. MAX( ) brings the highest values for each one of the columns creating some inconsistencies between ReferenceCode and ReferenceDate.

In this case, the approach with Table Function and RANK( ) can easily solve our requirement, let’s check the second approach now.

 

ABAP CDS Table Function

@EndUserText.label: 'Rank (Demo)'

define table function ZCDS_RANK
returns {
  Client        : abap.clnt;
  Customer      : kunnr;
  ReferenceCode : abap.char(10);
  ReferenceDate : abap.dats;
}
implemented by method zcl_cds_rank=>exec_method;

AMDP Class

CLASS zcl_cds_rank DEFINITION
  PUBLIC
  FINAL
  CREATE PUBLIC .

  PUBLIC SECTION.
    INTERFACES if_amdp_marker_hdb.
    CLASS-METHODS exec_method FOR TABLE FUNCTION ZCDS_RANK.

  PROTECTED SECTION.
  PRIVATE SECTION.
ENDCLASS.

CLASS zcl_cds_rank IMPLEMENTATION.

  METHOD exec_method
    BY DATABASE FUNCTION FOR HDB
    LANGUAGE SQLSCRIPT
    OPTIONS READ-ONLY
    USING zdemocdsrank.

    rank_table =
      SELECT mandt    AS Client,
             customer AS Customer,
             refcode  AS ReferenceCode,
             refdate  AS ReferenceDate,
             RANK ( ) OVER ( PARTITION BY mandt, customer
                                 ORDER BY refdate DESC ) AS rank
        FROM zdemocdsrank;

    RETURN
      SELECT Client,
             Customer,
             ReferenceCode,
             ReferenceDate
        FROM :rank_table
       WHERE rank = 1;

  ENDMETHOD.

ENDCLASS.

Run the Data Preview again and this time observe the correct output.

 

Important note:

The function RANK( ) creates a numeric field at the end of the selection list holding the position of the record based on the specified PARTITION and ORDER:

  • Partition by Client and Customer;
  • Order by Reference Date descending.

This logic creates a ranking for each group of Customers including the latest records on the top, this is the result generated by the first SELECT statement in our example:

Client Customer ReferenceCode ReferenceDate rank
10 1000000001 1205 08.10.2018 2
10 1000000001 2110 21.11.2018 1
10 1000000001 3281 01.09.2018 3
10 1000000001 6503 19.08.2018 4
10 1000000002 40290 19.12.2018 1
10 1000000002 53210 07.11.2018 2
10 1000000002 90001 16.09.2018 3

The second SELECT removes the duplicate entries looking only for records with the rank position no. 1, in other words, the most up to date record available for each Customer.

 

As usual, I hope you enjoyed the content and I see you next time. 🙂

Assigned Tags

      22 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Pawan Kalyan K
      Pawan Kalyan K

      Hello,

      Excellent blog... Very useful and you are educating us with these amazing features...Thanks Keep on posting...!!!

      Author's profile photo Felipe de Mello Rodrigues
      Felipe de Mello Rodrigues
      Blog Post Author

      Hi Pawan,

      Thank you for the feedback. 🙂

      I will definitely keep posting new content, my next series of articles about SAP Cloud Platform is coming soon (probably in January), please stay tuned.

      Cheers,

      Felipe

      Author's profile photo Sara jain
      Sara jain

      Very Useful Info..Thanks For Sharing!

      Author's profile photo Felipe de Mello Rodrigues
      Felipe de Mello Rodrigues
      Blog Post Author

      Thanks Sara!

      Author's profile photo Deodutt Dwivedi
      Deodutt Dwivedi

      Nice blog. I think alternatively this can be done by doing a max on date group by customer in a separate CDS view and then inner join with the original CDS view based on Customer & Date field. This will avoid AMDP.

      Regards,

      Deo

      Author's profile photo Felipe de Mello Rodrigues
      Felipe de Mello Rodrigues
      Blog Post Author

      Hi Deodutt Dwivedi,

      This approach you mentioned is actually the only way to solve without the AMDP.

      For simple scenarios (like in my example) we can achieve the solution with two views only, but when we increase the complexity of the requirement it is quite difficult to follow this approach.

      Imagine, for example, if I had not only the Reference Date but also the Reference Time to check.

      Using the ABAP Table Function I would need only to include an extra field in the ORDER BY clause:

      RANK ( ) OVER ( PARTITION BY mandt, customer
                          ORDER BY refdate DESC,
                                   reftime DESC ) AS rank

      How would be the solution with ABAP CDS views in this case?

      This is still a pretty simple requirement with one extra field to check but I think it exemplifies how the complexity raises exponentially.

      Cheers,

      Felipe

      Author's profile photo Deodutt Dwivedi
      Deodutt Dwivedi

      Hi Felipe,

      Yes, AMDP is good. Just the overhead of creating additional objects for consuming in CDS view is what bothers me. Even few years back on earlier version of S/4 1511 NW 7.5 00, i think float to decimal type casting was not supported in ABAP CDS views, but this was available in AMDP and table functions came to our rescue. Once again thanks for sharing this.

      Regards,

      Deo

      Author's profile photo venu gopal
      venu gopal

      Very nice blog, you have explained in good way.

      Author's profile photo Andre Schüßler
      Andre Schüßler

      Hi Felipe,

      thank for sharing your knowledge.

      In the times before CDS was invented, I used simple OSQL subqueries for tasks like this one.

       

      You can just do the same with the following OSQL.

       

      SELECT * FROM zdemocdsrank AS demo INTO TABLE gt_demo
      WHERE refdate = ( SELECT MAX( refdate ) FROM zdemocdsrank WHERE customer = demo~customer ).

      It has the same Output as your code.

      In simple cases I would prefer OSQL.

       

      Regards,

       

      Andre

      Author's profile photo Felipe de Mello Rodrigues
      Felipe de Mello Rodrigues
      Blog Post Author

      Hi Andre Schüßler,

      In the ABAP layer I usually work with DELETE ADJACENT DUPLICATES after the SELECT and to be honest this was the inspiration for this blog. Unfortunately, we don't have this option available anymore since we need program in the DB level.

      I hope one day ABAP CDS views provide enough flexibility to use commands like the one you mentioned but until that time comes we will need to use these alternative techniques.

      Cheers,

      Felipe

      Author's profile photo Diganto Goswami
      Diganto Goswami

      Another solution that I can think without using Rank can be.

      X = Select customer, max(ref_date) from zdemocdsrank group by customer;

       

      Zcdsrank = select customer, ref_code, ref_date from zdemocdsrank as Z inner join X on Z~customer = X~customer and Z~ref_date = X~refdate;

      This way we can avoid using Rank() and also avoid remembering complex syntactic constructs.

      Author's profile photo Vinay Joshi
      Vinay Joshi

      Very Helpful. Thanks. Keep on posting such knowledgeable blogs.

      Author's profile photo sathish B
      sathish B

      Hi Felipe,

       

      The Blog you shared is very nice.

      I have a question here if reference date is same for the two documents
      I mean

      Client Customer ReferenceCode ReferenceDate rank
      10 1000000001 1205 08.10.2018 2
      10 1000000001 2110 21.11.2018 1
      10 1000000001 2110 21.11.2018 1
      10 1000000001 6503 19.08.2018 4

       

      How would we handle this situation ? which record will be fetched ? only 1 record or 2 records ?

      Best Regards,

      Sathish.

       

       

      Author's profile photo Felipe de Mello Rodrigues
      Felipe de Mello Rodrigues
      Blog Post Author

      Hi sathish B,

      Excellent question!

      In this case both of the entries with the RANK = 1 are selected.

      You need to be careful if you need to return a single entry in the return, in this scenario you should include more fields in the ORDER BY clause from your PARTITON.

      Check the following reply for more insights:

      https://blogs.sap.com/2018/11/30/delete-duplicate-entries-in-abap-cds-views-using-table-function-and-sql-window-function/comment-page-1/#comment-444825

      Cheers,

      Felipe

      Author's profile photo mihir parkar
      mihir parkar

      Hi Felipe,

      Thanks for posting.

      I was lokking for the exact thing for a project I'm working in.

      Keep sharing.

      Regards,

      Mihir

      Author's profile photo Felipe de Mello Rodrigues
      Felipe de Mello Rodrigues
      Blog Post Author

      Excellent Mihir Parkar! Thanks for the message. 🙂

      Author's profile photo subhrangsu bagchi
      subhrangsu bagchi

      An extremely valuable blog. Thanks a lot Felipe.

      I was trying to figure out how to identify the latest Exchange Rates from the Exchange Rate Table that also contain historical data. With no access to ABAP Programs, and only CDS since we are generating reports used via Analysis for Office, this blog was invaluable for us. Thanks again.

      Author's profile photo Felipe de Mello Rodrigues
      Felipe de Mello Rodrigues
      Blog Post Author

      Hey subhrangsu bagchi ,

      Nice to hear that mate.

      All the best,

      Felipe

      Author's profile photo Lokesh KrishnaKumar
      Lokesh KrishnaKumar

      Hi Felipe,

       

      Good Blog, helped me solve my issue and meet the requirement.

       

      Thanks,

      Lokesh

      Author's profile photo Mahesh Babu Bodicherla
      Mahesh Babu Bodicherla

      Excellent!. Thanks a lot Felipe!

      Author's profile photo Shailaja Chityala
      Shailaja Chityala

      Nice blog.
      Does anyone tried with fields having string data type in the table? Could anyone help ?

      Author's profile photo Viktor KOSTA
      Viktor KOSTA

      Nice blog and very useful. Well done.

      It is not possible to apply filters to the data source before applying the rank logic.

      Refined "Business case":

      Add a reservation time and allow the user to filter by date/time. I.e. Show the reservation code of the last reservation in until 30.09.2018 (the past is the issue).

      October, November, December 2018 entries should be removed here first.

      Client Customer ReferenceCode ReferenceDate Reference Time rank
      10 1000000001 1205 08.10.2018 11:00  
      10 1000000001 2110 21.11.2018 12:00  
      10 1000000001 9241 01.09.2018 13:45 1
      10 1000000001 3281 01.09.2018 09:00 2
      10 1000000001 6503 19.08.2018 10:30 3
      10 1000000002 40290 19.12.2018 11:15  
      10 1000000002 53210 07.11.2018 13:30  
      10 1000000002 90001 16.09.2018 15:45 1

      Surely, someone would advise to parameters datefrom - dateto, timefrom - timeto to the table function. Yes that can be done. But in more complex scenarios, with more (sorting) fields involved I don't know how to achieve it. Join this CDS to another (by customer) and the fun begins.

      General observation: Keep the first/last (entry), it is does not work if we apply filters to the sorting fields.

      A real case, keep only CDS view results that have at least an entry in another joined CDS view (and applying with complex filter). The equivalent of it in ABAP is "EXISTS ( SELECT * FROM .... WHERE ). Good luck then with CDS.

      It's sad some heavily used SQL features available in ABAP are not present in CDS. Very sad.

      Thanks again Felipe for your nice Blog.