As mentioned in my first blog, "BRFplus in Big Data scenarios", I want to share some other topics related to processing large amounts of data. Faster than I expected, I've solved a problem regarding a dialog for operational analysis. It's about topic 3 of my last blog:
“Exploratory search techniques to determine anomalies in invoices – how can we support the end users during their daily work? Can we use ALV grid to analyze data models with master-detail relationships?”
We start with the previously discussed business case: Processing a huge number of invoices sent to a statutory health insurance company.
A set of checks is processed every time new invoices arrive. Each check focuses on a specific aspect or constellation of an invoice and produces clarification cases that have to be analyzed by the end user. But these checks were developed according to a specific concept or aspect, so they target the most common known anomalies in these invoices. What about other constellations that have not been considered before?
One solution to this problem may be a dialog in which the user can do some kind of pre-analysis to look for new patterns to identify anomalies, errors or even detect possible fraud cases among those invoices.
This task sounds easy: you take the invoices to look at and put them into an ALV grid. As a result, you can use all filters and sort criteria of the ALV grid, and you are also able to save your "query" for reuse as a user-specific layout.
At this point we need to raise some important aspects:
- We have to deal with approximately 28,000,000 invoices a year for a single health insurance company. The number of invoices does not fit into an ALV grid.
- Each invoice is supported by some medical documentation. We have to be able to search the invoices for the existence of a specific medical treatment, surgery, and so on.
- These filter criteria have to be saved like usual ALV variants for reuse purposes. Unfortunately, we can't use the normal ALV variants since we are dealing with master/detail views, so we have to develop different techniques, which are the main topic of this blog entry.
- The user must be able to see all of the data of a specific invoice, even the detail data.
- Our solution cannot rely on HANA-specific techniques only. We have to support a kind of HANA-readiness without ending up with two different code lines.
In this case we are talking about a generic solution: with this tool, the user is able to do an exploratory search for billing errors in our scenario without moving into a Business Warehouse.
Introducing our UI prototype
The fact of the matter is that we cannot build upon HANA-specific features, so we chose a solution with a standard ALV grid. We split the screen of the dialog into two sections: a master view with one ALV grid displaying the invoices, and a detail view containing three ALV grids corresponding to the three kinds of supporting documentation.
If you double click an invoice, all of the detail data is loaded into the corresponding ALV grids.
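In plain ABAP, this master-to-detail navigation can be sketched with the standard DOUBLE_CLICK event of CL_GUI_ALV_GRID. The table and helper names (gt_invoices, load_detail_data) are illustrative, not our actual implementation:

```abap
CLASS lcl_master_handler DEFINITION.
  PUBLIC SECTION.
    METHODS handle_double_click
      FOR EVENT double_click OF cl_gui_alv_grid
      IMPORTING e_row.
  PRIVATE SECTION.
    " Hypothetical helper: selects the supporting documentation for
    " one invoice and refreshes the three detail grids
    METHODS load_detail_data
      IMPORTING iv_invoice_id TYPE char10.
ENDCLASS.

CLASS lcl_master_handler IMPLEMENTATION.
  METHOD handle_double_click.
    " Read the clicked row from the master grid's internal table
    READ TABLE gt_invoices INDEX e_row-index INTO DATA(ls_invoice).
    IF sy-subrc = 0.
      load_detail_data( ls_invoice-invoice_id ).
    ENDIF.
  ENDMETHOD.

  METHOD load_detail_data.
    " ... SELECT the detail rows for iv_invoice_id and call
    " refresh_table_display( ) on each of the three detail grids ...
  ENDMETHOD.
ENDCLASS.
```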
Handling of mass data
We decided not to load all invoices into the ALV grid to avoid overloading the master view. During an exploratory search you are not interested in each single invoice; rather, you have to find another filter to narrow the search result. For now, we distinguish between an extract of the data (called "Ausschnitt" in the screenshots) and the entire data of the underlying table (called "Gesamtdaten"). The size of an extract is defined by the user: it determines how many invoices are displayed and thus transferred to the front-end. It is a constraint that defines the maximum amount of data. To achieve this kind of truncation, we added a new button to the ALV grid.
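A minimal sketch of how such an extract can be fetched, assuming a hypothetical invoice table ZINVOICES; lv_max_rows holds the user-defined extract size:

```abap
" Fetch only the extract ("Ausschnitt"), never the whole table:
" UP TO n ROWS truncates the result on the database side, so at most
" lv_max_rows invoices are transferred to the application server.
SELECT * FROM zinvoices
  INTO TABLE gt_extract
  UP TO lv_max_rows ROWS
  WHERE gjahr = lv_year.
```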
"Sätze" stands for rows, i.e. invoices, and defines the truncation. The given filter and sorting criteria of an ALV grid operate on the internal data table of the grid – so they only address the extract, but operate very quickly.
If we want to reduce our search result effectively, the filter and sort criteria have to operate on the whole underlying data table. For that reason, we added some other functionality to the grid to control the ALV filter and sort actions:
- "Sortierung Ausschnitt": sort the extract in the front-end (ALV grid standard)
- "Sortierung Gesamtdaten": sort the data in the backend (database) and rebuild the internal table of the ALV
- "Filterung Ausschnitt": filter the extract in the front-end (ALV grid standard)
- "Filterung Gesamtdaten": filter the data in the backend and rebuild a new internal table of the ALV on that result
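A sketch of what "Filterung Gesamtdaten" does internally, again assuming a hypothetical ZINVOICES table. The ALV filter rows are translated into a dynamic WHERE clause (simplified here: the SIGN and HIGH fields of the filter are ignored) and the extract is re-selected from the database:

```abap
DATA: lt_filter TYPE lvc_t_filt,
      lt_where  TYPE string_table.

" Read the filter criteria the user entered in the grid
go_grid->get_filter_criteria( IMPORTING et_filter = lt_filter ).

" Translate each filter row into one condition of a dynamic WHERE clause
LOOP AT lt_filter INTO DATA(ls_filter).
  IF lt_where IS NOT INITIAL.
    APPEND `AND` TO lt_where.
  ENDIF.
  APPEND |{ ls_filter-fieldname } { ls_filter-option } '{ ls_filter-low }'|
    TO lt_where.
ENDLOOP.

" Re-select against the whole table, truncated to the extract size again
SELECT * FROM zinvoices
  INTO TABLE gt_extract
  UP TO lv_max_rows ROWS
  WHERE (lt_where).

go_grid->refresh_table_display( ).
```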
With the help of these new functions we are able to search for various facts in a huge amount of data. If the amount of data to be displayed is too big, we only see the tip of the iceberg due to the truncation.
Detail filter – How to deal with them?
Searching by detail data was a bigger problem to deal with. An ALV filter only operates on its own data context, but the detail data is separated into additional ALV grids. So we built a popup with additional filter criteria addressing the keys of the medical documentation we have to deal with.
The constraints of the detail filter are handled by a manager class of our master view. Such a constraint is defined as a single Select-Option. From these constraints, the manager class builds an SQL query. We built all combinations of master-to-detail relationships as an EXISTS clause for each detail relationship. We counted 8 valid combinations (guess: 2^3 😉), so the SQL queries can be hard-coded in the manager class.
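One of the eight hard-coded queries might look like the following sketch, here for the combination "treatment filter and surgery filter set, third detail filter empty". All table and field names are illustrative; lr_treatment and lr_surgery are the ranges tables built from the Select-Options of the popup:

```abap
SELECT * FROM zinvoices AS inv
  INTO TABLE gt_extract
  UP TO lv_max_rows ROWS
  WHERE EXISTS ( SELECT * FROM zdoc_treatment AS trt
                   WHERE trt~invoice_id = inv~invoice_id
                     AND trt~treatment IN lr_treatment )
    AND EXISTS ( SELECT * FROM zdoc_surgery AS srg
                   WHERE srg~invoice_id = inv~invoice_id
                     AND srg~surgery IN lr_surgery ).
```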
Another problem was taking care of a detail filter while saving an ALV layout in the master view. If the user defined a detail filter, it has to be saved together with the normal layout. We solved this by creating an add-on table holding the detail filter under the same key as the ALV layout. To do this, we added some functionality to the event handling of the ALV layout button. When the user loads a layout, the detail filter is loaded too, so the original search result is reproduced.
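A rough sketch of the save path, assuming a hypothetical add-on table ZDETAIL_FLT whose key mirrors the key of the saved ALV layout (report, layout name, user). Here the detail filter is serialized to XML via CALL TRANSFORMATION; all names are illustrative:

```abap
METHOD save_detail_filter.
  DATA: ls_row TYPE zdetail_flt,
        lv_xml TYPE string.

  " Serialize the Select-Option table from the detail filter popup
  CALL TRANSFORMATION id
    SOURCE filter = it_detail_filter
    RESULT XML lv_xml.

  " Store it under the same key as the ALV layout
  ls_row-report  = sy-repid.
  ls_row-variant = iv_layout.   " name of the saved ALV layout
  ls_row-uname   = sy-uname.
  ls_row-filter  = lv_xml.
  MODIFY zdetail_flt FROM ls_row.
ENDMETHOD.
```

The load path does the inverse: after the layout is applied, the row is read under the same key and the filter is deserialized and re-applied.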
With these extensions to the ALV grid we realized a master to detail relationship with standard ABAP coding to allow the user to do exploratory searches in the whole data volume.
We are not at the end with this solution. Because we use a Select-Option to define a detail filter, we are not able to do all kinds of filtering. Suppose we want to keep track of all invoices containing the medical treatments A and B, except C or D. With a Select-Option, we are only able to select via OR, not via AND.
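The limitation follows directly from the semantics of a ranges table, as this sketch shows (the treatment codes are invented):

```abap
" A Select-Option is a ranges table; its rows are combined with OR.
DATA lr_treatment TYPE RANGE OF char10.
lr_treatment = VALUE #( ( sign = 'I' option = 'EQ' low = 'A' )
                        ( sign = 'I' option = 'EQ' low = 'B' ) ).

" "treatment IN lr_treatment" therefore matches documentation rows with
" treatment A OR treatment B. "Invoice contains A AND contains B" cannot
" be expressed with a single ranges table; it would need one EXISTS
" subquery per treatment code.
```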
Additionally, we want to transform such a saved layout (with detail filter) to a customer defined check function which is being processed when a new invoice arrives. Under this condition we were faced with a major ABAP limit:
The 8 hard-coded EXISTS clauses no longer match the new requirements. So we have to deal with a dynamic WHERE clause, which does not currently support subqueries. We hit a dead end.
Without resorting to HANA-specific functionality, the only solutions we considered were the use of native SQL or the generation of ABAP code. We decided to choose code generation. Our manager class is able to transform a saved ALV layout with an applied detail filter into an SQL query. The resulting query is embedded into a generated ABAP class that can be used in our operational analytics framework.
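To illustrate the idea of code generation (not our actual generator), the following sketch generates a subroutine pool instead of a full class; lv_where holds the WHERE clause, including the EXISTS subqueries, derived from the saved layout and detail filter:

```abap
DATA: lt_src  TYPE TABLE OF string,
      lv_prog TYPE progname,
      lv_msg  TYPE string.

APPEND `PROGRAM.` TO lt_src.
APPEND `FORM run_check TABLES rt_result.` TO lt_src.
APPEND `  SELECT * FROM zinvoices INTO TABLE rt_result` TO lt_src.
APPEND |    WHERE { lv_where }.| TO lt_src.
APPEND `ENDFORM.` TO lt_src.

" The WHERE clause is compiled as static code, so EXISTS subqueries
" are allowed - unlike in a dynamic WHERE clause.
GENERATE SUBROUTINE POOL lt_src NAME lv_prog MESSAGE lv_msg.
IF sy-subrc = 0.
  PERFORM run_check IN PROGRAM (lv_prog) TABLES gt_result.
ENDIF.
```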
What should be improved? – A wish list / Subject to Change
During the development of this dialog (which is based on the ALV grid class CL_GUI_ALV_GRID on NetWeaver 7.40 SP4), we also examined the use of the ALV grid with IDA (HANA grid class CL_SALV_GUI_TABLE_IDA on both NetWeaver 7.40 SP4 and SP6). We also noticed the availability of this grid on non-HANA systems. That is a very important fact: this capability may help us rely on only one codebase, so we do not need to separate HANA-only features into another codebase that has to be switched out on non-HANA systems. Unfortunately, the current version of the HANA grid does not give us the ability to influence the handling of the grid the way a normal ALV grid does. We need the ability to take care of the grid's internal data table in non-HANA scenarios. Another solution could be a refresh function provided by the grid: by calling it, a new SQL query would be sent to the database with the defined filter and sorting criteria. We also need the ability to define the truncation.
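For reference, the basic usage of the HANA grid is pleasantly small. This sketch assumes a hypothetical table ZINVOICES; note that the grid fires its own SELECTs, so we never see an internal table:

```abap
" Create an IDA grid; the grid reads ZINVOICES from the database itself.
DATA(go_ida) = cl_salv_gui_table_ida=>create( iv_table_name = 'ZINVOICES' ).

" Restricting the result is possible via named ranges ...
DATA lt_ranges TYPE if_salv_service_types=>yt_named_ranges.
APPEND VALUE #( name = 'GJAHR' sign = 'I' option = 'EQ' low = '2014' )
  TO lt_ranges.
go_ida->set_select_options( it_ranges = lt_ranges ).

go_ida->fullscreen( )->display( ).

" ... but there is no way to pass a free WHERE clause or to access
" the underlying data table the way CL_GUI_ALV_GRID allows.
```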
Even after solving this, we are still not ready to use the HANA grid, because we are not able to manipulate the WHERE clause. In the meantime, I think the solution will be a master-detail relationship based on HANA features. But this doesn't help us in our case.
Due to our business scenario, our solution has to be usable by both HANA and non-HANA customers.
As I described earlier, we use code generation to avoid native-SQL. But the code generation doesn’t help us in case of the HANA grid. Remember we do not have the ability to query the database on our own.
With our solution, we are able to support the user with an exploratory search technique. It is a kind of generic solution that does not rely on further development once it is transported to the customer. Since the user never wants to jump from one system to another (for instance, into a BW system to do an analysis there and back to his operative application), he is able to build a check out of the combined filter criteria and integrate it into the previously defined set of checks that are processed when a new invoice arrives.
This prototype is not finished. Some technical problems must be solved to build other features on top of this approach. But these problems are tough because they depend on needed extensions to Open SQL (subqueries in dynamic WHERE clauses) or to the HANA grid.
The general availability of the HANA platform brings not only a huge gain in speed but also additional functionality to operate on Big Data. But first we have to lift the customers to that platform. I suggest this can only be done by building hybrid solutions that handle the base business cases everywhere, but run faster and show additional benefits when run on HANA. To minimize the effort, we have to rely on only one code line that we have to support.