Privacy Protected Data Has Value Too! (Part 1 of 2)
Product: SAP HANA SPS04
Private data carries a stigma: do not touch it, or else. Why? The consequences of accidental exposure or unauthorized access can be crippling to any organization.
Yet the need to access this class of data remains. One popular option is to adopt data masking rules. Masking suits operational use cases, but masked data loses its usefulness when multiple rows are analyzed as a set.
The barriers to using protected data for operational and analytical use cases can now be overcome. The latest version of SAP HANA (SPS04) offers tools that allow users to query both sensitive attributes and measures while anonymizing the original values. In this first article of the series, I will demonstrate how to use SAP HANA to apply k-anonymity to the attributes of a dataset.
Obtain value from sensitive attributes
I have recently discovered how easy and efficient paying at the register with my watch can be. To be honest, I was a skeptic at first, but it was not long before I found myself asking every cashier, “Do you accept mobile payments?” As a data enthusiast, I think about the potential value in the information from each of these transactions. If a company and its trading partners could share and analyze this data in real time without compromising my personal identity, the retailer could offer instant rebates or promotions tailored just for me – sounds like a win-win for everyone.
Let’s understand how this use case might be implemented.
Generalize specific data values of a given attribute using k-anonymity
K-anonymity replaces the actual value of a given field with a more general value, so that multiple records form a group sharing the same generalized value for that field.
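To make the idea of a group sharing the same value concrete, here is a minimal Python sketch. It is not SAP HANA code: the function name `is_k_anonymous` and the birth dates are hypothetical, chosen only for illustration. It checks whether every distinct value in a field occurs at least k times.

```python
from collections import Counter


def is_k_anonymous(values, k):
    """Return True if every distinct value occurs at least k times."""
    counts = Counter(values)
    return all(n >= k for n in counts.values())


# Hypothetical birth dates: every exact date is unique.
exact = ["1984-03-12", "1984-07-30", "1991-01-05", "1991-11-23"]
# The same records generalized to the birth year only.
by_year = [d[:4] for d in exact]

print(is_k_anonymous(exact, 2))    # False
print(is_k_anonymous(by_year, 2))  # True
```

With k = 2, the exact dates fail the check because each one identifies a single record, while the generalized years pass because each year covers two records.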
There are six basic principles to understanding k-anonymity:
- The process to generalize data utilizes a hierarchy with the actual field values composing the leaf nodes and the most general value composing the root node.
- Hierarchy movement from a lower level to a higher level represents the act of generalizing field values.
- The variable k specifies the minimum count of records that must exist at a lower level in the hierarchy before a field can assume a more general value from a higher level.
- If the count of records for a given branch in the associated hierarchy is less than k, then those records can combine with sibling records from another branch to reach the minimum record count specified by k.
- If the combined count of records for a given branch, its siblings, and related higher-level branches in the hierarchy is still less than k, then an empty value is returned in the field’s output.
- The best result yields a set of rows where more general values from a higher level in the hierarchy replace all original values for the anonymized field.
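The principles above can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm SAP HANA actually runs: the function `anonymize`, the hierarchy levels (date, year, decade, "any"), and the sample records are all assumptions made for the example. Each record carries its path from leaf to root, and any record whose current value is shared by fewer than k records moves one level up until its group is large enough or the root is exhausted.

```python
from collections import Counter


def anonymize(paths, k):
    """Generalize each record along its hierarchy path (leaf first, root last).

    Returns one output value per record; None stands for the empty value
    returned when even the root level cannot reach k records.
    """
    n = len(paths)
    level = [0] * n                    # current hierarchy level per record
    max_level = len(paths[0]) - 1      # index of the root level
    while True:
        values = [paths[i][level[i]] for i in range(n)]
        counts = Counter(values)
        small = [i for i in range(n) if counts[values[i]] < k]
        if not small:
            return values              # every group has at least k records
        movable = [i for i in small if level[i] < max_level]
        if not movable:
            # Nothing left to generalize: suppress the remaining small groups.
            return [v if counts[v] >= k else None for v in values]
        for i in movable:
            level[i] += 1              # move up one level in the hierarchy


# Hypothetical hierarchy: birth date -> year -> decade -> root ("any").
records = [
    ("1984-03-12", "1984", "1980s", "any"),
    ("1984-07-30", "1984", "1980s", "any"),
    ("1987-02-14", "1987", "1980s", "any"),
    ("1991-01-05", "1991", "1990s", "any"),
    ("1991-01-05", "1991", "1990s", "any"),
]

result = anonymize(records, 2)
print(result)
# ['1984', '1984', None, '1991-01-05', '1991-01-05']
```

In this run, the two 1984 dates combine at the year level, the two identical 1991 dates already satisfy k = 2 and stay unchanged, and the lone 1987 record finds no group of two at any level, so it receives an empty value. A production algorithm would also weigh how much information each generalization loses, which this sketch ignores.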
As an example of these principles in action, consider a retail company that wants to share customer data with one of its partners. The table below offers an example of the outcome after applying the k-anonymity algorithm to the customer’s date of birth.
Incorporate k-anonymity into calculation views
One method to implement k-anonymity with SAP HANA is through the use of calculation views. Continuing with the story of the retail company as an example, the video below demonstrates how to configure a set of calculation views for k-anonymity. The demonstration covers the following topics:
- Modeling calculation views of type dimension for master data, hierarchies, and anonymization
- Configuration of the anonymization node
- Viewing the anonymized results using the data preview
Unlock more value from your data today
To conclude my example story, the retail company and its trading partners are able to work together to enrich a customer’s shopping experience using data that previously could not be shared. I would like to challenge the reader to take a moment and consider the same. Are there any use cases requiring privacy-protected data that might bring untapped value to the organization? What opportunities now exist for insightful analytics, increasing customer engagement, and/or generating new revenue streams?
To start using SAP HANA today, sign up for SAP HANA in the Cloud or contact your account representative.
Andrea Kristen’s Blog Post: https://blogs.sap.com/2017/11/10/anonymization-analyze-sensitive-data-without-compromising-privacy/
Stephan Kessler’s Blog Post: https://blogs.saphana.com/2019/04/15/anonymize-like-a-rock-star-or-whats-new-on-data-anonymization-this-spring-in-sap-hana/
Roosi Magi’s Blog Post: https://blogs.sap.com/2019/03/21/unlocking-the-value-of-healthcare-data-with-sap-hana-data-anonymization/