Get the most out of Experiential (X) and Operational (O) Data with SAP HANA Real-Time Data Anonymization
It’s undeniable that customer experience matters, and taking those experiences into account as a driving force behind business decisions is of high importance. With the acquisition of Qualtrics, SAP joins forces to power the experience economy. There is a broad variety of use cases for this combination. In this blog, I want to explain, based on a concrete case, how the combination of experiential and operational data leads to valuable insights. I will also show how to overcome a common obstacle: Experience data is naturally connected to an individual person, so concrete cases often involve personal data. This blog shows how the combination of experience and operational data can be utilized while at the same time protecting the privacy of each individual.
Assume you are in the shoes of a travel agent at the ACME company and have to book a business trip for a client. Of course, you want to make sure the client has the best possible experience while also spending the travel budget wisely.
Most likely everybody has already had their fair share of good and bad customer experiences while travelling. Having a decent place to stay on a business trip is vital for being productive during the day. What is considered to be decent, of course, depends on the traveler. Younger frequent travelers might prefer having a venue downtown to explore the place they are staying after a long day at work, while middle-aged infrequent travelers might prefer a calmer venue. Experience data contains this kind of information.
Such experience data is already very valuable on its own. However, it unfolds its full power in combination with the travel and expense operational data collected at ACME: The travel agent not only knows which venues the travelers of ACME prefer but also gains insights into the cost structure of such trips. Figure 1 shows exactly that: which venues are highly rated and how money is spent during business trips.
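To make the combination concrete, here is a minimal sketch of joining experience data with operational data and aggregating per venue. The record layout, field names (trip_id, venue, rating, cost) and sample values are assumptions made up for illustration; they are not the actual ACME or Qualtrics schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical experience (X) data: one survey rating per trip.
ratings = [
    {"trip_id": 1, "venue": "Downtown Hotel", "rating": 5},
    {"trip_id": 2, "venue": "Downtown Hotel", "rating": 4},
    {"trip_id": 3, "venue": "Airport Inn", "rating": 2},
]

# Hypothetical operational (O) data: booked cost per trip.
expenses = [
    {"trip_id": 1, "cost": 300.0},
    {"trip_id": 2, "cost": 340.0},
    {"trip_id": 3, "cost": 180.0},
]


def combine(ratings, expenses):
    """Join X and O data on trip_id and aggregate per venue."""
    cost_by_trip = {e["trip_id"]: e["cost"] for e in expenses}
    per_venue = defaultdict(lambda: {"ratings": [], "costs": []})
    for r in ratings:
        # Only trips present in both data sets contribute to the result.
        if r["trip_id"] in cost_by_trip:
            per_venue[r["venue"]]["ratings"].append(r["rating"])
            per_venue[r["venue"]]["costs"].append(cost_by_trip[r["trip_id"]])
    return {
        venue: {"avg_rating": mean(d["ratings"]), "avg_cost": mean(d["costs"])}
        for venue, d in per_venue.items()
    }


summary = combine(ratings, expenses)
```

The aggregated summary corresponds to the kind of overview described for Figure 1: a rating and a cost figure per venue, without any individual traveler appearing in the output.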
However, these data sets are private and confidential, and even within the same company they cannot simply be shared: The travel data lets you identify where someone has been on an individual level, and even leaving preferences aside, the amount of money an individual traveler spent should also remain secret, at least from the travel agent at ACME. Of course, Figure 1 only contains highly aggregated results, but coming back to the task of the travel agent, we need a way to filter the data, e.g., by age and frequent flyer status. This could lead to re-identification, as one can see in Figure 2. We selected a 19-year-old female frequent flyer, and there is only one in the complete data set. Information like age, gender and frequent flyer status can be considered public; just think about profiles on social media. Consequently, any travel agent can link these entries to a specific person and directly observe where she has been and how much money she spent. Obviously, this is not information we should be sharing.
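The re-identification risk described above can be checked mechanically: count how many records share each combination of age, gender and frequent flyer status, and flag every combination that occurs only once. The following is a minimal sketch with made-up sample records; the field names and the helper unique_records are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Made-up sample records; "venue" and "spend" are the sensitive values.
travelers = [
    {"age": 19, "gender": "F", "frequent_flyer": True, "venue": "Downtown Hotel", "spend": 1200},
    {"age": 34, "gender": "M", "frequent_flyer": True, "venue": "Airport Inn", "spend": 800},
    {"age": 34, "gender": "M", "frequent_flyer": True, "venue": "Downtown Hotel", "spend": 950},
    {"age": 52, "gender": "F", "frequent_flyer": False, "venue": "Airport Inn", "spend": 700},
    {"age": 52, "gender": "F", "frequent_flyer": False, "venue": "Lakeside Lodge", "spend": 650},
]

# Attributes an observer could plausibly know about a person.
QUASI_IDENTIFIERS = ("age", "gender", "frequent_flyer")


def unique_records(rows, quasi_identifiers):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    def key(row):
        return tuple(row[column] for column in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


at_risk = unique_records(travelers, QUASI_IDENTIFIERS)
```

In this sample, the only record in at_risk is the 19-year-old female frequent flyer: filtering by her publicly known attributes exposes her venue and spending directly.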
SAP HANA Real-Time Data Anonymization
To overcome the privacy issue and still provide exciting insights based on travel expenses and experience data, we use SAP HANA Real-Time Data Anonymization. It takes the original data, anonymizes it on the fly at the database level, and serves it to the analytics application. The anonymized data is not persisted: any new data is directly available to the application in anonymized form.
The anonymization makes sure that no combination of age, gender and frequent flyer status can be traced back to a single individual. Re-identification, as described previously, is no longer possible. This is accomplished by generalizing values, e.g., persons younger than 25, like the 19-year-old frequent flyer, are represented by the age group “< 25”. The system decides automatically whether values must be generalized, with the goal of keeping as much information as possible while protecting personal privacy. The user defines how certain values can be generalized and the minimum size of each group of indistinguishable persons. References to more information on how the anonymization works can be found at the end of this blog entry.
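The idea of generalizing values until every group reaches a minimum size can be illustrated with a small sketch. This is not the algorithm SAP HANA uses internally; it is a simplified illustration that generalizes only the age column, applies the same level to all records, and picks the lowest level that satisfies the minimum group size k. The generalization levels, function names and sample data are assumptions.

```python
from collections import Counter


def generalize_age(age, level):
    """User-defined generalization hierarchy for age (an assumption).

    Level 0 keeps the exact age, level 1 uses "< 25" or a five-year
    bucket, level 2 suppresses the value entirely.
    """
    if level == 0:
        return str(age)
    if level == 1:
        if age < 25:
            return "< 25"
        low = (age // 5) * 5
        return f"{low}-{low + 4}"
    return "*"


def anonymize(rows, k, max_level=2):
    """Return (anonymized rows, level) for the lowest level where every
    combination of age group, gender and frequent flyer status occurs
    at least k times."""
    if not rows:
        return [], 0
    for level in range(max_level + 1):
        groups = [
            (generalize_age(r["age"], level), r["gender"], r["frequent_flyer"])
            for r in rows
        ]
        counts = Counter(groups)
        if min(counts.values()) >= k:
            # Replace the exact age with its generalized value.
            return [{**r, "age": g[0]} for r, g in zip(rows, groups)], level
    raise ValueError("minimum group size k cannot be reached with this hierarchy")


# Made-up sample data.
travelers = [
    {"age": 19, "gender": "F", "frequent_flyer": True},
    {"age": 23, "gender": "F", "frequent_flyer": True},
    {"age": 31, "gender": "M", "frequent_flyer": False},
    {"age": 33, "gender": "M", "frequent_flyer": False},
    {"age": 47, "gender": "F", "frequent_flyer": True},
    {"age": 48, "gender": "F", "frequent_flyer": True},
]

anonymized, level = anonymize(travelers, k=2)
```

With k=2, the exact ages (level 0) leave every traveler unique, so the sketch moves to level 1, where the 19-year-old shares the group “< 25” with one other traveler. A production system can make finer choices, such as generalizing several columns to different degrees, which this sketch deliberately leaves out.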
After this process, the data is anonymous and can be safely used for the application. Figure 3 shows the result: A travel agent at ACME can get information about travelers’ experiences and preferences that is specific to age and/or frequent flyer status, while fully protecting the privacy of the individuals. Of course, some information, e.g., the exact age, is lost. However, this application would not have been possible without anonymization, since there is a high risk of releasing personal and sensitive data. Additionally, the exact age is not actually required; five-year buckets are descriptive enough. Just imagine the possibilities: With the help of SAP HANA Real-Time Data Anonymization, new applications involving personal data are now possible.
What we have seen is just the start of new applications combining the forces of operational and experiential data. Being able to do this while protecting the privacy of the data sources at the individual level is key. The anonymization capabilities of SAP HANA are not limited to a specific kind of data; they are broadly applicable and configurable to meet our customers’ needs for privacy protection. More information can be found on our landing page, https://www.sap.com/data-anonymization, including videos explaining the available anonymization methods: https://youtu.be/IYX4AK8s4cQ.