Why counting distinct values in ABAP CDS isn’t as easy as you think
Imagine you have a table with the following data in a table of your SAP system:
Airline | Connection Number | Plane Type |
LH | 1 | 747-400 |
LH | 2 | A340-600 |
LH | 3 | A340-600 |
LH | 4 | 767-200 |
DL | 5 | 747-400 |
DL | 6 | A319-100 |
DL | 7 | 747-400 |
DL | 8 | A340-600 |
How do you count the number of different (distinct) plane types in an ABAP CDS View? The answer to that question might be more complicated than you would expect. There are three possible approaches to this task and we will have a look at their pros and cons. If you are in a hurry to find a good solution, try the last one ?
Some sample data and CDS Views
If you want to try this out by yourself, you can generate some test data in your system. I am using the SAP flight data model in my examples and it is available as standard in SAP (NetWeaver) systems. Use the program SAPBC_DATA_GENERATOR to generate some example data. They will be stored in several tables, but we will only need the table SFLIGHT. It contains some data on commercial airline flights such as seat capacity, price, flight date and the plane type which we would like to count. To start with, you can use the following, very simple ABAP CDS Views:
Basic View ZI_FLIGHT
@AbapCatalog.sqlViewName: 'ZISFLI'
@AbapCatalog.compiler.compareFilter: true
@AccessControl.authorizationCheck: #NOT_ALLOWED
@EndUserText.label: 'Flight Data Dimension'
@ClientHandling.algorithm: #SESSION_VARIABLE
@VDM.viewType: #BASIC
define view ZI_FLIGHT as select from sflight {
key carrid,
key connid,
key fldate,
price,
currency,
planetype,
paymentsum,
seatsmax,
seatsocc,
seatsmax_b,
seatsocc_b,
seatsmax_f,
seatsocc_f
}
ABAP CDS Cube ZC_FLIGHTDATAC
@AbapCatalog.sqlViewName: 'ZCSFLIDAC'
@AbapCatalog.compiler.compareFilter: true
@AccessControl.authorizationCheck: #NOT_ALLOWED
@EndUserText.label: 'Flight Data Cube'
@Analytics.dataCategory: #CUBE
@Analytics.dataExtraction.enabled: true
@VDM.viewType: #CONSUMPTION
define view ZC_FLIGHTDATAC as select from ZI_FLIGHT {
key carrid,
key connid,
key fldate,
@DefaultAggregation: #SUM
@Semantics.amount.currencyCode: 'currency'
price,
@Semantics.currencyCode: true
currency,
planetype,
@DefaultAggregation: #SUM
@Semantics.amount.currencyCode: 'currency'
paymentsum,
seatsmax,
seatsocc,
seatsmax_b,
seatsocc_b,
seatsmax_f,
seatsocc_f
}
ABAP CDS Query ZC_FLIGHTDATAQ
@AbapCatalog.sqlViewName: 'ZCSFLIDAQ'
@AbapCatalog.compiler.compareFilter: true
@AccessControl.authorizationCheck: #NOT_ALLOWED
@EndUserText.label: 'Flight Data Query'
@VDM.viewType: #CONSUMPTION
@OData.publish: true
@Analytics.query: true
define view ZC_FLIGHTDATAQ as select from ZC_SBLU_FLIGHTDATAC {
carrid,
connid,
fldate,
price,
currency,
planetype,
seatsmax,
seatsocc,
paymentsum,
seatsmax_b,
seatsocc_b,
seatsmax_f,
seatsocc_f
}
Using SQL
Coming from the database world, I was searching the internet for the term “count distinct” because this is essentially what we want to do. Most of what I personally found was about the SQL approach to it. Count Distinct is a standard SQL functionality and as such it is also supported within ABAP CDS Views. It looks like this:
@DefaultAggregation: #SUM
count(distinct planetype) as planetypecount
At the end of the ABAP CDS Code you then need to attach a group by clause with all the remaining (not aggregated) columns like this:
group by
carrid, connid, fldate, price, currency, planetype, seatsmax,
seatsocc, paymentsum, seatsmax_b, seatsocc_b, seatsmax_f, seatsocc_f
But here is exactly the problem: The grouped columns are pre-determined in the code – in this example we will execute the count distinct for every individual line of our dataset. Because of that, the result will be disappointing:
Airline | Connection Number | Plane Type | Plane Type Count |
LH | 1 | 747-400 | 1 |
LH | 2 | A340-600 | 1 |
LH | 3 | A340-600 | 1 |
LH | 4 | 767-200 | 1 |
DL | 5 | 747-400 | 1 |
DL | 6 | A319-100 | 1 |
DL | 7 | 747-400 | 1 |
DL | 8 | A340-600 | 1 |
Now let’s see how this aggregates when only carriers and the plane type count is selected:
Airline | Plane Type Count |
LH | 4 |
DL | 4 |
The only alternative is to remove columns from the grouping criteria – but then we need to remove them from the ABAP CDS View altogether and therefore loose the information. And again: When you access the query, the count will only be correct if you select the anticipated (grouped) columns. This is not what we are looking for, especially if we want to implement drill-downs based on this query.
Side-note: This count distinct is not supported in CDS Views with the annotation
@Analytics.query: true or in CDS Views based on External Views
Using the DefaultAggregation annotation
Count distinct is essentially an aggregation function. And actually, the code completion for ABAP CDS Views offers the following option:
@DefaultAggregation: #COUNT_DISTINCT
Unfortunately, I was not able to find any kind of actual documentation on how to use it. Here are the only (not useful) SAP Help results I could find:
It seems the aggregation function itself is only supported by CDS query views that don’t generate a BEx query (@Analytics.query: true). If there is anybody who has used this, feel free to comment on how to do it.
Using Exception Aggregation
The third method for counting is to use exception aggregation. If you activate the BEx query generation in the CDS query view, you can use the following code:
@AnalyticsDetails.query:
{formula: '1'},
exceptionAggregationSteps:
[{exceptionAggregationBehavior: #SUM,
exceptionAggregationElements: ['planetype'] }]}
cast(0 as z_planetypecount) as planetype_count
Please note that for this example, I have also prepared a data element (z_planetypecount) to assign a descriptive label to the new column. The code will execute the given formula for each unique plane type taking into consideration the other selected criteria. So essentially when only selecting Airline and the Plane Type Count it will work like this:
Airline | Connection Number | (Plane Type) | Plane Type Count |
LH | 1 | 747-400 | 1 |
LH | 2 | A340-600 | 1 |
LH | 3 | A340-600 | |
LH | 4 | 767-200 | 1 |
DL | 6 | A319-100 | 1 |
DL | 5 | 747-400 | 1 |
DL | 7 | 747-400 | |
DL | 8 | A340-600 | 1 |
And the result will look like this:
Airline | Plane Type Count |
LH | 3 |
DL | 3 |
This is correct! And it will adapt to the selection criteria. So, all in all: When trying to count distinct values and you don’t know in in advance which columns should be grouped, this is the way to go. The downside to this is
- You need to publish this as an Analytics (BEx) query, which limits the functionality of the OData service that gets generated
- The calculation is not done on the database but instead in the SAP BW engine on NetWeaver level which doesn’t provide the same performance
I'm little confused.
It seems you have mixed up DISTINCT with aggregation (COUNT) functionality.
What is the problem with good old:
SELECT sflight~carrid
COUNT( DISTINCT sflight~planetype )
FROM sflight
GROUP BY sflight~carrid
?
Nothing wrong with it at all!
To my understanding the task at hand was to find an equivalent to this in ABAP CDS.
Not exactly...
A data model with ABAP CDS Views can serve multiple purposes but often they are used for Analytics use cases like in S/4HANA Embedded Analytics. Now lets imagine you would create a dashboard on top of the data model in this blog. If you create a chart that displays the amount of planetypes used per airline, the approach of Shai would work fine and give the correct results. However you might also want to display the amount of planetypes used in all airlines combined. Virtual data models such as ABAP CDS VIews are all about reusability, you don't want to create a new view just for that. But if you use the same Views that is fixed to group per carrid, it will return an incorrect result. This problem can be avoided by using exception aggregation.
Hi Steve
It’s worth noting that this is only a problem in ABAP CDS where you use a single CDS Consumption View to try to do everything in one go. This is not the recommended approach as there is almost no reuse when everything is done in the consumption view.
Thanks for explaining your issue so clearly… as this is a common misstep that we have seen on the learning journey to ABAP Programming Model for SAP Fiori. Seeing your problem helps us create better material for everyone … look out for more material around the ABAP Programming Model for Fiori this year.
To give you some further insights right now …
When used correctly the Virtual Data Model (VDM) hierarchy approach is EXTREMELY powerful. You can see this when you look under the covers of a S/4HANA system (if you don’t have one you can always access a CAL fully activated appliance S/4HANA trial system). Looking at a S/4HANA 1709 system is an education in itself!
This is solvable in ABAP CDS by following the recommended hierarchy approach to CDS Views. That is…
Basic view is the lowest layer where we rename database table fields to 21st century naming standards ;-)e.g. You might rename field seatsocc as SeatsOccupied and you might add some simple foreign key type relationships and semantics such as currency and unit of measure.
Interface View is the intermediate layer where you do all the grunt work of calculations. Interface views reference the basic views rather than directly referencing the database tables. You can have hierarchies of interface views to solve aggregation issues, especially where you want to use the result of one calculation to calculate another field. You can also add functions such as currency conversion, table functions, AMDP, etc.
In this scenario you would do your default aggregation in the Interface View, you can then use the result of the count in a higher interface view or in a consumption view.
Consumption View is only the top layer where you decide what you will expose *for this use case only*. You can then reuse your interface views to provide the data and add e.g. analytical query or UI annotations that affect the specific use case only.
The Virtual Data Model approach has been around for some time and started in the analytics space, e.g.
https://blogs.sap.com/2016/08/30/the-semantically-rich-data-model-an-abap-based-cds-views-example/
For that reason the best information on the virtual data model approach is so far mostly in the Analytics space but the same approach can often be used for transactional programming as well (we usually don’t need quite as many layers for transactional programming)
There is some good material here
https://blogs.sap.com/2018/03/19/s4-embedded-analytics-the-virtual-data-model/
You will find information on the holistic approach to programming under the term ABAP Programming Model for SAP Fiori – this blog is a good starting point…
https://blogs.sap.com/2017/12/07/be-prepared-for-the-new-abap-programming-model-in-sap-s4hana/
Hope that helps you on the next steps of your CDS learning journey!
Jocelyn
Hi Jocelyn,
(I guess you meant to say "Hi Steve", as he – not me - is the blogs author, right?! ;-))
very good insight, thank a lot!
I was aware of the VDM concept, but having it re-explained in regard to this blog/use-case helps a lot with understanding!
A small note: Here we seem to have lost part of a sentence:
"rather than [????]".
best
Joachim
Ha! yes... thanks Joachim ... correcting the response above
Hi Steve,
I had a similar problem and for it worked ti use the following (core) annotations:
@Aggregation.default: #COUNT_DISTINCT
@Aggregation.referenceElement: ['planetype']
cast( 1 as abap.int4 ) as NumberOfPlaneTypes
This way the query should work faster as you are free of analytics annotations.
Thanks for the great blog post
Franziska
Hi Franziska Schuenemann , Steve Blum , Jocelyn Dart ,
I have the same problem. Till today I was following how Jocelyn recommended. But for my scenario I am confused if I am doing correct.
Actual values from CDS
So combination of a and x should be merged to 1
and the output in Chart visualization should be
Plant for Mat should be = 4 (But the table view should be same as above, as i have other unique columns)
Output :
X = 4 unique records of Material
If I apply @Aggregation.default: #COUNT_DISTINCT , it behaves same as Count(distinct operand) as alias, and gives me 1 for each rows, what I am looking for is a merge like Steve showed
Can we have both behaviour in same CDS Or is there an option of separating CDS for chart view and table view in ALP template.
Regards,
Tejas
AFAIK this will result in a syntax error, because "exceptionAggregationSteps" is not part of "query":
Working syntax:
Regards
Daniel