Key Concepts in Privacy Technologies
We live in a digital age in which personal data is collected, processed, and disclosed at an unprecedented rate. Privacy technologies and privacy risk management frameworks are evolving to address the growing need to protect personal and sensitive data. Because privacy is recognized as a fundamental right in many geographies and by many international organizations, governments around the world are racing to enact privacy regulations that require organizations collecting and processing personal data to adhere to the core principles of data protection and privacy. In particular, there is greater emphasis on privacy-enhancing technologies for protecting personal data, including technologies for consent management, data minimization, data tracking, anonymization, de-identification, pseudonymization, encryption, tokenization, masking, obfuscation, access control and identity, authentication, and authorization. In this blog, we review some of the key data privacy technologies commonly deployed and how SAP HANA supports them, providing tools that help our customers meet their compliance obligations as data controllers. For the sake of simplicity, only high-level concepts are presented in this blog.
In anonymization, individually identifiable data is transformed in such a way that it can no longer be related back to a given individual. Once the data is anonymized, it is impossible to link the resulting data to an identifiable person. The anonymization process is therefore irreversible: a recipient of the anonymized dataset cannot “reconstruct” the original data.
GDPR Recital 26 states that the principles of data protection should not apply to anonymous information, that is, information that does not relate to an identified or identifiable natural person, or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. Truly anonymized data is therefore NOT subject to the GDPR.
In privacy technologies, there are three main approaches to anonymizing datasets: suppression, generalization, and noise addition. In suppression, directly identifying values such as name and ID are removed from a record. However, there is still a possibility of linkage attacks, in which the remaining attributes are combined with publicly available datasets to re-identify individuals. In generalization, a data value is replaced with a range or with higher-level information, such as keeping the city instead of the street. In noise addition, noise is added to the original data values within the dataset in a way that preserves the statistical properties of the data.
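The three techniques can be illustrated with a minimal Python sketch. The records, field names, and the choice of Gaussian noise are assumptions made purely for illustration; they are not taken from any SAP product.

```python
import random

# Hypothetical sample records; every field name and value is invented.
records = [
    {"name": "Alice Meyer", "id": "E1001", "street": "12 Main St",
     "city": "Berlin", "age": 34, "salary": 52000},
    {"name": "Bob Kumar", "id": "E1002", "street": "7 Park Ave",
     "city": "Berlin", "age": 38, "salary": 61000},
]

def suppress(record, direct_identifiers=("name", "id")):
    """Suppression: drop the directly identifying fields."""
    return {k: v for k, v in record.items() if k not in direct_identifiers}

def generalize(record, bucket=10):
    """Generalization: keep the city but not the street, and replace
    the exact age with an age range."""
    out = {k: v for k, v in record.items() if k != "street"}
    low = (record["age"] // bucket) * bucket
    out["age"] = f"{low}-{low + bucket - 1}"
    return out

def add_noise(values, scale, rng):
    """Noise addition: perturb each value with zero-mean Gaussian noise,
    so averages over many records stay close to the original."""
    return [v + rng.gauss(0, scale) for v in values]

rng = random.Random()
anonymized = [generalize(suppress(r)) for r in records]
noisy_salaries = add_noise([r["salary"] for r in records], scale=1000, rng=rng)
print(anonymized)
print(noisy_salaries)
```

Note that suppression and generalization alone do not rule out linkage attacks; the remaining attributes (here city and age range) still act as quasi-identifiers.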
In pseudonymization, the original identifiable value is replaced with a coded or substitute value, which can be a random replacement or a consistent replacement. While the pseudonymized data on its own cannot be attributed to a specific data subject, it is possible to reverse the process and recover the identifiable value by having access to the data that maps the original value to the coded value. Pseudonymization algorithms must be complex enough to make re-identification extremely difficult. It is common practice to keep the mapping between the original value and the coded value in a separate database. Pseudonymization is sometimes referred to as a de-identification technique. It is important to note that, unlike anonymized data, pseudonymized data remains personal data and is within the scope of the GDPR.
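A minimal sketch of consistent replacement is shown below, assuming a keyed hash (HMAC-SHA256) as the coding function and an in-memory dictionary standing in for the separately stored mapping. The class name, the `P-` prefix, and the sample value are hypothetical.

```python
import hashlib
import hmac
import secrets

class Pseudonymizer:
    """Consistent pseudonymization with a keyed hash (illustrative only)."""

    def __init__(self, key: bytes):
        self._key = key
        # Pseudonym -> original value. In practice this mapping would live
        # in a separate, access-controlled database.
        self._lookup = {}

    def pseudonymize(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"),
                          hashlib.sha256).hexdigest()
        pseudonym = "P-" + digest[:16]
        self._lookup[pseudonym] = value
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        """Reversal is possible only with access to the mapping."""
        return self._lookup[pseudonym]

pseudonymizer = Pseudonymizer(secrets.token_bytes(32))
code = pseudonymizer.pseudonymize("alice.meyer@example.com")
print(code)
print(pseudonymizer.reidentify(code))
```

Because the same input always yields the same pseudonym, records can still be joined across tables. That is also why the result remains personal data: anyone holding the key or the mapping can re-identify the subject.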
Data aggregation is used for statistical analysis and provides summarized data. GDPR does not consider aggregated data to be personal data, so it is not in scope. However, care must be exercised: in some cases, depending on the numerical values of the aggregates, it may be possible to reconstruct the underlying records. This is where differential privacy helps, by injecting additional noise to make such reconstruction difficult.
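As a rough illustration of how noise can protect an aggregate, the sketch below applies the Laplace mechanism to a counting query. The function names, the sample ages, and the `epsilon` value are assumptions, and Python's `random` module is not a cryptographically secure noise source, so this should be read as a teaching sketch and not as how any particular product implements differential privacy.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(values, predicate, epsilon, rng=None):
    """Counting query with Laplace noise.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical ages; the query asks how many people are over 40.
ages = [23, 35, 41, 52, 67, 29, 44, 38]
print(noisy_count(ages, lambda a: a > 40, epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger privacy, at the cost of a less accurate aggregate.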
Encryption is the process by which plaintext is converted into ciphertext using a cryptographic algorithm and a secret key. Protecting symmetric keys and private keys is pivotal, so secure key management is paramount across key generation, registration, usage, storage, monitoring, rotation, and deletion. There are two approaches to encryption: symmetric key encryption and asymmetric key encryption.
In symmetric key encryption, the sender and receiver must possess the same key to encrypt and decrypt the data. Symmetric key encryption tends to be faster and is well suited to bulk encryption, but on its own it provides only confidentiality. For integrity and authentication, it must be complemented by a Message Authentication Code (MAC). SAP supports AES with 128-bit and 256-bit key lengths.
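The role of the Message Authentication Code can be sketched with Python's standard library. The encryption step itself is left out because AES is not part of the standard library; the `ciphertext` value below is a placeholder, and the helper names are hypothetical.

```python
import hashlib
import hmac
import secrets

def make_tag(mac_key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()

def verify_tag(mac_key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(mac_key, message), tag)

mac_key = secrets.token_bytes(32)          # shared by sender and receiver
ciphertext = b"<output of AES encryption>"  # placeholder, not real ciphertext
tag = make_tag(mac_key, ciphertext)

print(verify_tag(mac_key, ciphertext, tag))         # unmodified message
print(verify_tag(mac_key, ciphertext + b"x", tag))  # tampered message
```

The tag is computed over the ciphertext, so the receiver can reject a modified message before attempting decryption. Authenticated modes such as AES-GCM combine both steps.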
Asymmetric Key Cryptography
In typical asymmetric encryption, two keys are used: a public key and a private key. Each communicating party maintains a public-private key pair. Public keys are shared and available to other parties without constraint, while each party keeps its own private key secret. In practice, asymmetric encryption is typically combined with symmetric encryption as follows:
- The sender and receiver agree on a symmetric key (session key) for data encryption via TLS. The sender encrypts the data with the symmetric key.
- The sender encrypts the symmetric key with the receiver’s public key.
- The receiver decrypts the encrypted symmetric key with the receiver’s own private key, recovering the symmetric key.
- Using the symmetric key, the receiver decrypts the data.
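The flow in the steps above can be traced end to end with a deliberately insecure toy. Everything here is an assumption for illustration: textbook RSA with tiny primes stands in for the receiver's key pair, and a SHA-256 based XOR keystream stands in for AES. Real systems rely on TLS and vetted cryptographic libraries.

```python
import hashlib
import secrets

# Toy RSA key pair for the receiver (tiny textbook primes, NOT secure).
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)

def toy_stream_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with a SHA-256 keystream. The same call encrypts and
    decrypts. Illustrative stand-in for a real symmetric cipher."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

def wrap_key(session_key: bytes, e: int, n: int) -> list:
    """Sender: encrypt each key byte with the receiver's public key."""
    return [pow(b, e, n) for b in session_key]

def unwrap_key(wrapped: list, d: int, n: int) -> bytes:
    """Receiver: recover the key bytes with the private key."""
    return bytes(pow(c, d, n) for c in wrapped)

# Sender side
session_key = secrets.token_bytes(16)
ciphertext = toy_stream_cipher(session_key, b"personal data")
wrapped_key = wrap_key(session_key, E, N)

# Receiver side
recovered_key = unwrap_key(wrapped_key, D, N)
plaintext = toy_stream_cipher(recovered_key, ciphertext)
print(plaintext)
```

The design point is that the slow asymmetric operation touches only the short session key, while the bulk data is handled by the faster symmetric cipher.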
A newer encryption technology known as “Homomorphic Encryption” is gaining a lot of attention in the industry. Homomorphic encryption makes it possible to analyze or manipulate encrypted data without decrypting it, so the underlying data is not revealed to the party performing the computation. This has huge potential in areas such as genomic research, finance, and health care, where sensitive personal data is involved.
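To make the idea concrete, the following toy uses the Paillier scheme, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The tiny primes and function names are assumptions chosen for readability, and the code is not secure. Paillier is also only partially homomorphic; fully homomorphic schemes support richer computations.

```python
import math
import secrets

# Toy Paillier parameters (tiny primes, NOT secure).
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)                               # valid because G = N + 1

def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the public key (N, G)."""
    while True:
        r = secrets.randbelow(N - 1) + 1   # random r in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c: int) -> int:
    """Decrypt with the private values LAMBDA and MU."""
    u = pow(c, LAMBDA, N_SQ)
    return ((u - 1) // N) * MU % N

def add_encrypted(c1: int, c2: int) -> int:
    """Add two plaintexts by multiplying their ciphertexts."""
    return (c1 * c2) % N_SQ

c_sum = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(c_sum))  # 42
```

The party computing `add_encrypted` never sees 20, 22, or 42; only the holder of the private key can decrypt the result.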
Tokenization is widely used in the payment card industry when dealing with credit card numbers, other account numbers, or sensitive personal data. It is a process in which a sensitive data value is replaced with a randomly generated token value and the mapping between the two is stored in a database. There are tokenization service providers who take responsibility for the issuance and management of payment tokens. This reduces the PCI-DSS compliance scope for an organization, as the actual card value is stored with the service provider.
- The Primary Account Number (PAN) is encrypted and sent directly to the payment processing tokenization server.
- The token database maintains the token for the relevant PAN.
- The payment system server uses the token for payment authorization. No actual PAN or other sensitive data is stored on the payment system server.
For PCI-DSS compliance, the payment system is not in scope, as the token server or a responsible third-party service provider assumes responsibility for such compliance.
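A token database of this kind can be sketched as a small in-memory vault. The class name, the `tok_` prefix, and the token length are hypothetical, and the PAN used is a commonly used test card number, not real card data.

```python
import secrets

class TokenVault:
    """Illustrative token database mapping random tokens to PANs."""

    def __init__(self):
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        """Return the existing token for a PAN or issue a new random one."""
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_pan:   # regenerate on collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Only the vault can map a token back to the PAN."""
        return self._token_to_pan[token]

vault = TokenVault()                      # runs at the tokenization provider
token = vault.tokenize("4111111111111111")

# The payment system stores and passes around only the token.
order = {"order_id": 1001, "card_token": token}
print(order)
```

Unlike encryption, the token has no mathematical relationship to the PAN, so it cannot be reversed without access to the vault.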
SAP HANA Privacy Controls
SAP HANA provides extensive protection mechanisms for sensitive and confidential data. These are available in SAP HANA, enterprise edition, or can be purchased individually in addition to SAP HANA, standard edition.
Detailed documentation on SAP HANA Security and Privacy Controls is available at this link.
SAP HANA Data Anonymization
SAP customers acting as data controllers collect many types of data, including personal data, sensitive personal data, and quasi-identifiers. This data can be subjected to generalization and noise addition techniques to create microdata. Microdata may contain the original records, but the data values have been generalized or suppressed to protect privacy. This microdata is further anonymized in HANA through a variety of anonymization techniques. SAP HANA supports the data anonymization methods k-anonymity, l-diversity, and differential privacy.
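To give a feel for what k-anonymity and l-diversity measure, the sketch below checks both properties on a small, invented microdata table. It is plain Python for illustration and does not use or represent the SAP HANA anonymization interface; all column names and values are hypothetical.

```python
from collections import Counter, defaultdict

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows that share the same
    quasi-identifier values. The table is k-anonymous for this k."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any
    quasi-identifier group."""
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return min(len(v) for v in values.values())

# Hypothetical generalized microdata.
microdata = [
    {"age": "30-39", "city": "Berlin", "diagnosis": "flu"},
    {"age": "30-39", "city": "Berlin", "diagnosis": "asthma"},
    {"age": "40-49", "city": "Munich", "diagnosis": "flu"},
    {"age": "40-49", "city": "Munich", "diagnosis": "diabetes"},
]

print(k_anonymity(microdata, ["age", "city"]))               # 2
print(l_diversity(microdata, ["age", "city"], "diagnosis"))  # 2
```

A higher k means each individual hides in a larger group, while l-diversity additionally guards against every member of a group sharing the same sensitive value.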
The diagram depicts a high-level overview of the process:
- A data consumer needs access to data for analysis and requests it. The data consumer must be authenticated and authorized before access is permitted.
- The data controller defines the anonymization parameters to ensure that individuals cannot be identified and that the data can be consumed purely for analysis.
- The data protection officer can request anonymization reports to monitor the KPIs and effectiveness of the anonymization and to fine-tune the parameters if needed.
- The data protection officer provides additional guidance or feedback to the data controller for defining additional anonymization parameters.
- The data controller fine-tunes the parameters and grants access to data consumers.
- The data consumer analyzes the anonymized data.
For details on how SAP HANA performs anonymization, please refer to the link here. There is also an interesting write-up explaining SAP HANA anonymization and the types of anonymization supported.
Data privacy regulations are evolving constantly, and it is imperative for our customers to use the privacy technologies available within the solution to meet their compliance requirements. SAP HANA and the SAP SaaS applications powered by HANA provide extensive data privacy capabilities such as consent management, data privacy notices, information reports, change audit logs, and read access logs, as well as data privacy tools such as anonymization, encryption, security audit logs, authentication, authorization, and masking. SAP follows Privacy by Design and Privacy by Default in SAP product development. SAP’s approach to the Secure Software Development Lifecycle Framework can be downloaded here.