Data is key to understanding every aspect of an enterprise – from customer and employee behavior to market trends. But while the value of data is indisputable, it is also not always easy to unlock. Data is rightfully protected and respecting this protection is both a legal and moral imperative for all organizations.
Utilizing (personal) data for analytics and machine learning has the potential to improve our lives, our environment, and our health. It can help forecast energy demands, resulting in a better use of renewable energy. It can help improve the way we manage traffic to avoid congestion and better plan our cities. It can help us uncover cures to fight diseases such as cancer. So the question remains – how can we unlock the potential insights of the data without posing any risk to the privacy of the individuals to whom it belongs?
This is a question I have been working on since I first started my Ph.D. in data protection almost a decade ago. Then, after I’d been working at SAP for about a year in the Big Data team, I saw a customer presentation about the potential of their marketing ambitions. The customer explained that the company was limited in what it could do out of respect for the privacy of the data.
I recognized that this was a huge opportunity with a wide variety of potential use cases and so I began to develop a way to productize anonymization methods – essentially turning my research into a product.
At the end of 2016, I got the chance to make my dream come true and began work on the SAP HANA Data Anonymization functionality. Data anonymization methods allow enterprises to use the data for applications and analysis while still ensuring everyone’s privacy is protected. Simply removing names or other kinds of identifiers, such as social security numbers, is not enough to render a dataset anonymous.
As an example, imagine a classroom in which the teacher asks the pupil in the red shirt to leave the room. Assuming that there is only one person with a red shirt in this room, everybody will know who needs to leave the room without the teacher having to identify the student by name. Simply by virtue of the fact that there is only one person with a red shirt in the room, it’s possible to work out exactly to whom the teacher is referring. The situation completely changes if there are many people in the room wearing red shirts. In this case, no one would know who the teacher meant: the specific individual is hidden in a crowd.
This is the same fundamental principle that we apply in one of the data anonymization methods in SAP HANA. We make sure that there are at least “k” individuals with the same properties (such as the red shirt) in the anonymous data set. This method is called k-anonymity, and it is one of several anonymization methods from research that are implemented in SAP HANA to provide different privacy and utility guarantees. Using well-researched methods and being transparent about how anonymization works is key to building trust when dealing with very sensitive data. This is one of the reasons we published our work at the prestigious VLDB conference. Ultimately, this also allows us to create new applications that would have been unthinkable before.
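To make the red-shirt idea concrete, here is a minimal Python sketch of a k-anonymity check. It is an illustration only and not how SAP HANA implements the method: the function name `is_k_anonymous`, the sample records, and the choice of the shirt color as the only identifying property are all assumptions made for this example.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in the records is shared by at least k records."""
    # Group records by the values of their identifying properties.
    groups = Counter(
        tuple(record[name] for name in quasi_identifiers)
        for record in records
    )
    # Every group must be a "crowd" of at least k individuals.
    return all(count >= k for count in groups.values())


# Hypothetical classroom: only one pupil wears a red shirt.
classroom = [
    {"shirt": "red", "grade": "A"},
    {"shirt": "blue", "grade": "B"},
    {"shirt": "blue", "grade": "C"},
]

# The same classroom after a second pupil in a red shirt joins.
crowded = classroom + [{"shirt": "red", "grade": "B"}]

print(is_k_anonymous(classroom, ["shirt"], k=2))  # False: the red shirt is unique
print(is_k_anonymous(crowded, ["shirt"], k=2))    # True: each shirt color appears twice
```

A check like this only tells you whether a data set already hides each individual in a crowd of size k. Actually reaching that state usually means generalizing or suppressing values, which is beyond the scope of this sketch.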
Today, this technology is used by a wide range of organizations, helping them to derive invaluable insights from sensitive information such as healthcare data without revealing anything about the people behind it.
I now work with three other colleagues on this topic. In addition to building the software, a large part of my job is raising awareness of what it can do and how it can help customers, and introducing others to the technology and to our software.
One of my personal highlights was demoing the software at an employee meeting in front of thousands of colleagues. We had just a few minutes to explain this very technical topic, and it was a great exercise in learning how to really focus on the core message of the software. Yes, there was definitely an element of stage fright, but it was also great fun.
But beyond the presentations, one of the main highlights for me is actually the way the colleagues working on SAP HANA collaborate. SAP HANA obviously provides the in-memory speed and performance, but it also goes beyond core database management with application development, multi-model processing, and data integration and quality capabilities.
What makes SAP HANA Data Anonymization unique is that it is part of this broader set of capabilities, which all work seamlessly together. No one else on the market offers the same kind of integrated data anonymization. From an architectural point of view, we are not offering anonymization alone, but anonymization integrated with a greater security framework and with processing engines such as spatial.
For example, SAP HANA manages the original personal and sensitive data, as well as the anonymized view of such data. The security framework has to make sure that users only get access to the data that they are allowed to see. The access needs to be auditable as well, so anonymization always works in the context of the greater security framework.
A second advantage of having all these capabilities in one integrated product is the broad knowledge within the team itself. The SAP HANA team consists of experts across a huge range of topics. It’s a large but active community of developers and product managers who are always open to technical discussions.
Just as these features and functions all come together seamlessly in the product, we all come together as a team. I call it “sharpening the features” – if someone has an idea, they can talk to the colleagues from other areas of SAP HANA. This combined expertise and the different perspectives mean that any idea we might have for a specific area is ultimately refined and further improved. So it’s not just the anonymization nerds working on their own; the whole team is working out how this idea fits in and complements other elements of SAP HANA. The end result is a better feature and product for the customer.
Going forward, we’re looking at additional use cases for this technology. The potential is enormous, and we’re excited to work with customers on new proofs of concept. If you have a case in mind, let’s get in touch!
For more information on SAP HANA data anonymization, visit https://www.sap.com/data-anonymization