SAP Hybris Marketing Segmentation: HANA views matter!
This article will briefly emphasize the importance of HANA information views for SAP Hybris Marketing Segmentation (on premise) with respect to application performance, and it will also suggest a modelling approach for segmentation-related information views which has proven effective in practice.
SAP Hybris Marketing already comes with a set of well-defined data models and pre-delivered HANA information views that allow the customer to use the segmentation option right away on customer data such as accounts, contacts, interactions, or target groups.
However, SAP Hybris Marketing Segmentation is built in a generic way so that a customer can consume virtually any data from within segmentation. In this way, customers can not only enhance the SAP delivered segmentation objects and profiles, but they can even choose to use segmentation for highly individual customer data, be it POS transactions, geo-spatial information, or insurance policies, just to name a few.
HANA Information Views
Apart from the fact that the data has to reside in HANA DB – either through data import, replication, or as transactional data being generated by the application – another necessary prerequisite for the data to be used in segmentation is that it has to be exposed first in HANA through information models (also known as HANA views, such as attribute views, analytical views, calculation views) which are built as logical units on top of the physical data base tables.
These information views are then published in segmentation customizing as data sources, and during runtime, the segmentation generates SQL statements that select from the views. So the interface mechanism by which segmentation interacts with HANA DB is handled exclusively through SQL-selects on HANA views — no database tables are ever accessed directly.
Now in most cases, there will be multiple semantically equivalent ways to model HANA views and expose the desired customer data, but these may differ significantly in terms of view performance on larger data sets. Hence it follows that particular care and attention should be given to the proper modelling of HANA information views when it comes to using customer data from within segmentation. Apart from the available hardware resources and data volume, a crucial aspect of segmentation performance is the proper modelling of HANA views, as well as the number of exposed HANA views per segmentation object.
Furthermore, other important business processes within SAP Hybris Marketing depend on segmentation objects and results, for instance target group creation and processing, CSV data export, as well as campaign execution based on segmentation models. It is very likely that the performance of these subsequent processes will also be affected if segmentation already suffers performance issues due to suboptimal HANA views.
So naturally some questions arise: how should you model your custom HANA information views? How many views should you expose? Is there a recommended way of modelling?
For example: the more HANA views you expose to be used in a profile, the more likely the generated SQL will combine them by LEFT OUTER joins to ensure consistent results across the various segmentation data sources. Since joins are expensive operations, it would therefore be advisable to limit the number of exposed HANA views to a minimum. On the other hand, your data may be structured in such a way that it may not always be possible to join the DB tables together within a single HANA view. This could be the case if your business requires many aggregating key figures or even complex calculated columns. Similarly, it may not always be feasible or even possible to expose all the desired attributes in a single view. Sometimes it makes sense to group semantically-related attributes into different HANA views.
Obviously, each customer’s business needs and unique requirements, as well as the business-related questions of end users (which segmentation “only” translates into SQL queries) do not allow for a one-size-fits –all answer. Business requirements, customer data structure, and performance aspects must all be carefully balanced in each case in order to achieve a satisfying application experience. Nevertheless, implementation experience based on multiple customer projects coupled with internal evaluations suggest a preferred way of how to model HANA information views to be used by segmentation.
The Star Join Approach
The suggested approach is best described as modelling one or at most only a few HANA views according to a “Star-Join” schema. This strikes a good balance between keeping the number of exposed HANA views low (thus avoiding too many SQL joins) and offering good attribute query performance. Here, a graphical calculation view is created as cube with the Star-Join option set.
This view contains a central node as fact table to which multiple dimension views are joined internally. Note that only the resulting star-join view is exposed to segmentation customizing, but internally, the star-join view joins additional dimension views to its central node. A schematic, simple example is depicted below.
Of course, the central node itself may be the result of a far more complex graphical modelling tree that contains joins, projections, aggregations. For a more realistic example, have a look at the SAP delivered information view sap.hana-app.cuan.contact/CA_C_CONTACT_INTERACTIONS.
Since a detailed step-by-step description of how to model a star-join view is already available in the referenced link below, I won’t repeat the individual steps. You can also find the document as an attachment to SAP note 2075429.
So, to conclude, here are a few hints and tips regarding modelling that have proven useful:
- Try not to expose too many HANA views for use in segmentation customizing per segmentation object: remember that SQL related to attributes from different segments will LEFT OUTER join all relevant views.
- In general, use graphical calculation views and avoid scripted views
- Test view performance with setting the option <Execute in> to “SQL Engine” on the view properties tab. As of SPS11, the SAP HANA Modeler offers a tool that helps you migrate attribute and analytical views to graphical calculation views.
- For views used from SAP Hybris Marketing you do not need to set the option <Apply Privileges>. SAP Hybris Marketing uses a different concept regarding instance-based authorization checks for read access.
- Try to avoid inner joins within views, use left outer joins whenever view semantics permits. Join on key columns or indexed columns
- Use calculated columns sparingly – they come with a high premium on CPU resources. Adding constant value columns may often be a better option.
- Always specify the cardinality with joins, if known. The HANA optimizer evaluates this information at runtime.
- Filter data amounts as early as possible in the lower layers (constraints, WHERE clause…)
- Avoid calculation before aggregation on line item level, i.e. do calculations after aggregation
- Minimize modelling features that force HANA to switch execution engines during internal processing, or at least reduce data transfer between the engines
- Use the SAP HANA Modeler PlanViz tool to analyze and identify performance bottlenecks and costly operations within your views.
- Test the performance of your views on large data sets already early in the development and modelling process so as to avoid performance “surprises” later on in the productive environment. (Remember that any view variant will perform well enough on small data sets.)