This blog post is aimed at data enthusiasts, data architects, and anyone looking to preserve the privacy of data in SAP Line of Business (LOB) solutions such as SAP S/4HANA.
Why Data Anonymization and Data Scrambling
Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous. [Source: Wikipedia]
Data anonymization is a primary requirement in the field of data science. Without anonymization, data cannot be used for purposes other than the one it was originally collected for. Access to data is granted for the explicit execution of a business process and any connected legal purpose, such as data retention for product liability control or statutory reporting.
GDPR Compliance for Data Security
Of key significance is Article 5 of the GDPR, "Principles relating to processing of personal data", paragraph 1: personal data shall be "(f) processed in a manner that ensures appropriate security of the personal data, including protection against unauthorized or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organizational measures (‘integrity and confidentiality’)". (Source: GDPR EU website)
Fines for insufficient technical and organizational measures reached €332 million in the first 21 months after the GDPR took effect (it has applied since 25 May 2018). This clearly highlights the need for tools and processes to manage data and protect the privacy of the subjects (persons and/or property) stored therein.
SAP HANA Data Anonymization
SAP provides features such as anonymization, masking, encryption, and shared business authorization to customers using SAP HANA [Source: SAP HANA Data privacy help page]. SAP HANA supports data anonymization with methods such as k-anonymity, l-diversity, and differential privacy. Data anonymization can be applied to SQL views or calculation views, thus enabling analytics on data while still protecting the privacy of individuals. This topic has gained significant importance in recent years, and SAP platform teams are working to enhance these features. However, what is clearly lacking at the time of this article is a software application that applies these principles holistically to SAP data while preserving relationships and content, so that business processes can still function effectively.
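To make the first two terms concrete: a data set is k-anonymous if every combination of quasi-identifier values is shared by at least k records, and (in its simplest "distinct" form) l-diverse if each of those groups also contains at least l different sensitive values. The plain-Python sketch below only measures these two properties on a toy data set. It is not how SAP HANA implements anonymization, and the column names (`age_band`, `zip3`, `diagnosis`) and helper functions are made up for illustration.

```python
from collections import Counter, defaultdict


def group_key(row, quasi_identifiers):
    """Combination of quasi-identifier values that defines a group."""
    return tuple(row[q] for q in quasi_identifiers)


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group: the largest k for which rows are k-anonymous."""
    sizes = Counter(group_key(r, quasi_identifiers) for r in rows)
    return min(sizes.values(), default=0)


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    values = defaultdict(set)
    for r in rows:
        values[group_key(r, quasi_identifiers)].add(r[sensitive])
    return min((len(v) for v in values.values()), default=0)


# Toy records (illustrative only): two quasi-identifiers, one sensitive column.
rows = [
    {"age_band": "30-39", "zip3": "691", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "691", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "691", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "691", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["age_band", "zip3"])
l = l_diversity(rows, ["age_band", "zip3"], "diagnosis")
print(f"k = {k}, l = {l}")
```

In this toy data set every group has two records, so k is 2, but the 40-49 group contains only one diagnosis, so l is 1: anyone who knows a person is in that group learns the diagnosis. This is the kind of gap that l-diversity is meant to expose.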
SAP Test Data Migration Server Data Scrambling
SAP still offers the Test Data Migration Server as a tool in mainstream release until 2027. The SAP Test Data Migration Server (SAP TDMS) tool was originally designed and built for use with the SAP Business Suite environment and has no support for SAP S/4HANA, as this possibility was intentionally switched off by SAP Development. SAP TDMS was (and still is) used by numerous SAP customers to create test landscapes with a reduced dataset, with the option to additionally scramble sensitive data. Non-productive environments built via SAP TDMS could then be accessed by IT project teams, including outsourced teams, for projects involving new innovations, as well as by business teams for meaningful testing of business processes based on scrambled data from a productive environment.
Data scrambling for SAP S/4HANA and LOB solutions
There are limited offerings available from SAP partners for data scrambling on SAP S/4HANA and LOB solutions. SAP has offerings in the pipeline for SAP S/4HANA data scrambling in the SAP S/4HANA essentials environment (public cloud set-up). However, no holistic solution that covers all SAP solutions, whether cloud, on-premise or hybrid, is available.
DAZAM (Data Anonymization for Analytics and Machine Learning)
DAZAM is positioned as a tool that simplifies privacy preservation for a company's data and makes multiple use cases for scrambled data possible. IT teams can use the tool to scramble data in a copy of a production system, with some unique features that were not possible earlier or with any other offering in the market:
- Consistency across multiple systems in hybrid landscapes: A testing landscape consisting of integrated business processes across multiple systems remains fully functional even after data scrambling
- For integrated landscapes, an approach consisting of ‘scramble everywhere’ or ‘scramble in lead system and perform data alignment via existing tools/middleware’ is possible, or even a mix of the two depending on customer requirements. Assuming the data was consistent prior to scrambling, the scrambled data should also align afterwards
- Meaningful scrambling of application data, such that most business operations still run on the scrambled data
- Out-of-the-box solution as a basis for SAP standard delivered tables: no development effort from the user/customer, and full control over how the data should be scrambled. Simple configuration and a click-through interface. If a user/customer has repurposed standard fields for their own use, the scrambling method can be adjusted via the interface
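One common way to obtain the kind of cross-table and cross-system consistency described above is deterministic, keyed pseudonymization: the same original value always maps to the same replacement, as long as every system uses the same secret key. The sketch below illustrates the idea with Python's standard `hmac` module. It is an assumption about how such consistency can be achieved, not a description of DAZAM's internals; the key, the placeholder word list and the two sample records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret shared by all systems that must stay aligned.
SECRET_KEY = b"demo-key-not-for-production"

# Hypothetical replacement vocabulary; a real tool would use far larger lists.
PLACEHOLDERS = ["Alpha", "Borealis", "Cobalt", "Delta", "Ember", "Fjord", "Granite", "Harbor"]


def scramble_name(original, key=SECRET_KEY):
    """Map a name to a stable pseudonym derived from a keyed hash."""
    # Normalize so that spelling variants of the same name stay aligned.
    normalized = original.strip().upper().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(PLACEHOLDERS)
    suffix = digest[8:11].hex().upper()  # keeps different inputs apart
    return f"{PLACEHOLDERS[index]} {suffix}"


# The same business partner as stored in two different (made-up) systems.
erp_partner = {"partner_id": "1000042", "name": "Müller Maschinenbau GmbH"}
crm_account = {"account_id": "A-77", "name": "müller maschinenbau gmbh "}

erp_scrambled = scramble_name(erp_partner["name"])
crm_scrambled = scramble_name(crm_account["name"])
print(erp_scrambled == crm_scrambled)
```

Because the replacement depends only on the normalized input and the key, each system can be scrambled independently and the results still line up, which corresponds to the ‘scramble everywhere’ option. The trade-off is that the key must be protected: anyone holding it can re-derive the mapping for a guessed name.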
Co-innovation project for Data scrambling with consistency in a Hybrid Landscape
A co-innovation project using DAZAM was executed to scramble data for multiple applications (SAP S/4HANA and SAP Sales Cloud) with an international customer in the manufacturing industry, based in Germany.
The primary aim of the project was to create a production copy of an SAP S/4HANA 1909 system with no personal data present. Additionally, commercially sensitive data (proprietary and confidential information) such as suppliers, vendors, product information, etc., was also deemed in scope. A full production copy of the data was taken and provided to the project team. The project team then enhanced the basic version of the DAZAM tool to execute the project. A simplified view of the landscape is presented below:
The project started with gathering requirements on what data should be scrambled. The team worked with application experts to define the optimal methods for scrambling, in order to retain the utility of data after scrambling and be able to run business processes in the SAP S/4HANA system as well as the connected satellite systems (some E2E business processes cross several systems). Extensive tests were conducted after scrambling of the systems to ensure that scrambling met the specifications.
A few highlights are presented below:
- Business partner names are scrambled across the system in multiple tables with consistent results
- Street addresses are scrambled across multiple tables while retaining the relevance to the original country
- Bank data is scrambled to remove any existing information, with a new IBAN, branch and bank account number generated, while still retaining the country-specific look and format
- Product descriptions are completely replaced with a hash string in all relevant tables
- Data in SAP S/4HANA and SAP Sales Cloud was compared and found to be consistent after scrambling, with no loss of functionality while executing business processes
- 4,000+ distinct fields were scrambled
- 15 billion+ values were replaced with scrambled values
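For bank data, a replacement only "feels" right for its country if it also passes the standard IBAN checksum. The sketch below shows how a replacement IBAN with valid check digits can be generated, using the ISO 13616 mod-97 rule and the German layout (`DE`, 2 check digits, 18-digit BBAN). It is only an illustration under stated assumptions: the function names and key are invented, only Germany is handled, the generated bank code will generally not belong to a real bank, and nothing here is taken from the DAZAM implementation.

```python
import hashlib
import hmac


def _to_digits(text):
    """Replace letters by numbers (A=10 ... Z=35) as the IBAN checksum requires."""
    return "".join(str(int(ch, 36)) for ch in text)


def iban_check_digits(country, bban):
    """Compute the two check digits for a country code and BBAN (mod-97 rule)."""
    remainder = int(_to_digits(bban + country + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_iban(iban):
    """Checksum test only: move the first four characters to the end, then mod 97."""
    compact = iban.replace(" ", "").upper()
    return int(_to_digits(compact[4:] + compact[:4])) % 97 == 1


def scramble_german_iban(original, key):
    """Derive a stable, checksum-valid German IBAN from the original value."""
    compact = original.replace(" ", "").upper().encode("ascii")
    digest = hmac.new(key, compact, hashlib.sha256).digest()
    # 18 pseudo-random digits stand in for bank code (8) and account number (10).
    bban = f"{int.from_bytes(digest[:12], 'big') % 10**18:018d}"
    return "DE" + iban_check_digits("DE", bban) + bban


key = b"demo-key-not-for-production"  # hypothetical secret
original = "DE89 3704 0044 0532 0130 00"  # widely published example IBAN
scrambled = scramble_german_iban(original, key)
print(len(scrambled), is_valid_iban(scrambled))
```

Deriving the digits from a keyed hash keeps the result stable, so the same original IBAN receives the same replacement in every table that stores it.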
Plans for future co-innovation projects
The team plans to carry out more co-innovation projects, as facilitated under the Customer Engagement Initiative in this link. Two projects had been completed by the end of January 2021. The second project used most of the features listed below, with over 52 billion values scrambled in the system using the DAZAM tool.
The tool is continually being enhanced. As of the end of March 2021, it has features such as:
- smart capabilities to find Personally Identifiable Information (PII)
- simple user interface for configuration
- advanced scrambling methods to deal with primary keys in any field
- multiple scrambling methods used for the same field
- scrambling of employee IDs, IBANs and the like
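Scrambling a primary key is harder than scrambling a description, because the replacement values must stay unique and every foreign key that points at them must change in the same way. A minimal sketch of one possible approach, assuming that all key values are known up front: build a one-to-one mapping once and apply it to every table that carries the key. The table layout, field names and the `build_key_mapping` helper are illustrative and not taken from DAZAM.

```python
import random


def build_key_mapping(keys, seed, width=8, start=1):
    """Assign each distinct key a new zero-padded number, one-to-one."""
    originals = sorted(set(keys))
    # Shuffle so the new numbering does not reveal the original sort order.
    # random.Random is not cryptographically secure; it is used here for brevity.
    random.Random(seed).shuffle(originals)
    return {old: f"{start + i:0{width}d}" for i, old in enumerate(originals)}


def apply_mapping(rows, field, mapping):
    """Return copies of the rows with the key field replaced via the mapping."""
    return [{**row, field: mapping[row[field]]} for row in rows]


# Made-up master and transactional data sharing the key field emp_id.
employees = [
    {"emp_id": "00004711", "dept": "FI"},
    {"emp_id": "00000815", "dept": "HR"},
]
timesheets = [
    {"emp_id": "00004711", "hours": 8},
    {"emp_id": "00000815", "hours": 6},
    {"emp_id": "00004711", "hours": 7},
]

mapping = build_key_mapping((e["emp_id"] for e in employees), seed=2021)
scrambled_employees = apply_mapping(employees, "emp_id", mapping)
scrambled_timesheets = apply_mapping(timesheets, "emp_id", mapping)
```

Because the mapping is built once and reused, the new keys remain unique in the master table and every timesheet still points at an existing employee.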
Further enhancements planned
- machine learning capabilities
- scrambling PII embedded in long text consistently across the system
- enhanced levels of anonymity as pre-defined settings
- working with ABAP and non-ABAP systems alike
- enhanced interfaces to consume data scrambling as a service for cloud applications
Data anonymization is a key requirement for every company operating in the European Union, which affects the majority of SAP customers. The DAZAM tool is a first-hand evaluation of a solution for meeting the tough technical, legal and data-utility requirements of data scrambling. The tool aims to provide consistent, integrated, scrambled environments suitable for testing, analytics and machine learning in hybrid landscapes.