Data migration from SAP S/4HANA Cloud and SAP HANA Smart Data Integration
This blog post shares my experience with data migration activities, moving data from an SAP S/4HANA Cloud system to an SAP S/4HANA On-Premise system, and how the SAP Migration Cockpit together with SAP HANA Smart Data Integration (SAP HANA SDI) helped with this purpose.
This scenario poses a few challenges for data migration:
- Extract data from SAP S/4HANA Cloud
- Transform the extracted data (files) into a form usable as source data for the Migration Cockpit
This blog covers a solution based on SAP HANA SDI and flowgraphs in an SAP HANA XSA application. It does not focus on the Migration Cockpit tool (even though it is a fundamental asset in any data migration to an SAP S/4HANA system) but only on the journey to load data into the staging tables.
Data extraction – Using Customer Data Return standard Fiori apps
SAP S/4HANA Cloud provides specific mechanisms to extract business data. As customers have no database-level access (nor SAP GUI login) in these systems, these SAP-delivered Fiori apps perform database table dumps into files and allow users to download them to a filesystem. These tools are necessary, for example, for customers off-boarding from the cloud.
SAP provides two apps for this purpose:
Both apps are used together: the process is basically to extract the data first (generate the data files) and then download the data files.
Each extracted database table is dumped into one or more XML files, depending on the table data size and the maximum file size configured in the app. For example, if the maximum file size is set to 10 MB and the XML content would be 27 MB, three files are generated. If we are extracting table ACDOCA, we would get a list of files ACDOCA_00001.xml, ACDOCA_00002.xml, ACDOCA_00003.xml, and so on.
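As a small illustration of the splitting rule described above (the file-count formula and the five-digit numbering are inferred from the example, not taken from SAP documentation):

```python
import math

def cdr_file_names(table: str, xml_size_mb: float, max_file_mb: float) -> list[str]:
    """Return the expected dump file names for a table, split by max file size."""
    n_files = math.ceil(xml_size_mb / max_file_mb)
    return [f"{table}_{i:05d}.xml" for i in range(1, n_files + 1)]

# A 27 MB dump with a 10 MB limit yields three files.
print(cdr_file_names("ACDOCA", 27, 10))
# → ['ACDOCA_00001.xml', 'ACDOCA_00002.xml', 'ACDOCA_00003.xml']
```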
The generated XML files contain the business data in the database tables at the moment of generation; naturally, the content may vary in future generation requests if the source system is in use and data is modified. For this reason, a data migration strategy based on these XML files as source data may need a freeze period or a similar approach to ensure the generated files contain up-to-date data.
The structure of each XML file is as follows (for example for table FAAT_DOC_IT):
<asx:abap version="1.0" xmlns:asx="http://www.sap.com/abapxml">
  <asx:values>
    <STRUCTNAME>FAAT_DOC_IT</STRUCTNAME>
    <NAMETAB>
      <DDFIELD>
        <FIELDNAME>MANDT</FIELDNAME>
        <POSITION>0001</POSITION>
        <KEYFLAG/>
        <DATATYPE>CLNT</DATATYPE>
        <LENG>000003</LENG>
        <DECIMALS>000000</DECIMALS>
        <NULLABLE/>
        <SRS_ID>0</SRS_ID>
      </DDFIELD>
      <DDFIELD> ... </DDFIELD>
      ...
      <DDFIELD> ... </DDFIELD>
    </NAMETAB>
    <DATA>
      <FAAT_DOC_IT>
        <MANDT>100</MANDT>
        <BUKRS>FI01</BUKRS>
        ...
      </FAAT_DOC_IT>
      <FAAT_DOC_IT> ... </FAAT_DOC_IT>
      ...
    </DATA>
  </asx:values>
</asx:abap>
If the data of a table spans several files, all of them follow the same structure shown above, which means every file repeats the table metadata (field names, types, …).
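Reading one of these files boils down to extracting the field list from NAMETAB and the records from DATA. A minimal sketch with Python's standard XML library (based only on the structure shown above; the inline sample is a trimmed, made-up FAAT_DOC_IT dump):

```python
import xml.etree.ElementTree as ET

ASX = "{http://www.sap.com/abapxml}"  # the asx namespace used by the dumps

def parse_cdr_file(xml_text: str):
    """Parse one Customer Data Return XML dump: return (field names, data rows)."""
    root = ET.fromstring(xml_text)
    values = root.find(f"{ASX}values")
    # Field names come from the NAMETAB metadata section
    fields = [dd.findtext("FIELDNAME")
              for dd in values.find("NAMETAB").findall("DDFIELD")]
    # Records in DATA are wrapped in elements named after the table
    table = values.findtext("STRUCTNAME")
    rows = [{f: rec.findtext(f) for f in fields}
            for rec in values.find("DATA").findall(table)]
    return fields, rows

sample = """<asx:abap version="1.0" xmlns:asx="http://www.sap.com/abapxml">
 <asx:values>
  <STRUCTNAME>FAAT_DOC_IT</STRUCTNAME>
  <NAMETAB>
   <DDFIELD><FIELDNAME>MANDT</FIELDNAME><DATATYPE>CLNT</DATATYPE></DDFIELD>
   <DDFIELD><FIELDNAME>BUKRS</FIELDNAME><DATATYPE>CHAR</DATATYPE></DDFIELD>
  </NAMETAB>
  <DATA>
   <FAAT_DOC_IT><MANDT>100</MANDT><BUKRS>FI01</BUKRS></FAAT_DOC_IT>
  </DATA>
 </asx:values>
</asx:abap>"""

fields, rows = parse_cdr_file(sample)
```

Since every part file repeats the metadata, only the first file of a multi-part set actually needs its NAMETAB section; the rest can be parsed for DATA alone.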
The first challenge is how to make these files suitable as input for the S/4HANA Migration Cockpit in the data migration target system.
SAP HANA Smart Data Integration (SAP HANA SDI) and the File Adapter
SAP HANA SDI provides a way of loading data into an SAP HANA database from a variety of sources by using pre-built or custom adapters.
Going back to our scenario: once data has been extracted from the SAP S/4HANA Cloud source system as a set of XML files, the idea is to use SAP HANA SDI with an adapter to feed the SAP HANA database of the SAP S/4HANA On-Premise system, processing the data if necessary (cleansing, transformation, …) and then loading it into the staging tables (previously generated from a Migration Cockpit project). This can be viewed as an ETL process whose goal is to populate the staging tables with the data extracted from the cloud. Of course, there is still the second half of the process, which can be viewed as another ETL process operated mainly by the Migration Cockpit: reading data from the staging tables and finally loading it into the SAP S/4HANA system.
SAP provides some adapters that can be used out of the box to provision data to the HANA database, and one of them is the File Adapter. This standard adapter runs in the Data Provisioning Agent (DPA), which may run on a remote system where the source files are located. The agent is connected to the Data Provisioning Server (DPS), a service that runs in the SAP HANA database.
Although the File Adapter supports different file formats, it does not work well with the XML files extracted from SAP S/4HANA Cloud with the structure shown above. The fact that each file contains metadata, and that multiple files may exist for a single table, makes the standard File Adapter unusable without first applying file conversions and transformations. In addition, the File Adapter requires a configuration file (.cfg) describing the table metadata.
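For reference, a File Adapter .cfg file is a plain key/value description of the file format and columns. The fragment below is only a sketch: key names follow the SDI File Adapter conventions as I recall them, and the column list is invented for illustration, so check the File Adapter documentation for your SDI version before using it.

```
FORMAT=CSV
CODEPAGE=UTF-8
ROW_DELIMITER=\n
COLUMN_DELIMITER=;
SKIP_HEADER_LINES=1
COLUMN=MANDT;NVARCHAR(3)
COLUMN=BUKRS;NVARCHAR(4)
COLUMN=HKONT;NVARCHAR(10)
```

Maintaining one such file per table is exactly the kind of overhead the custom adapter described next avoids, since the CDR XML files already carry their own metadata.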
CDRFileAdapter – A custom SAP HANA SDI adapter to handle XML files from S/4HANA Cloud
The SAP HANA SDI Adapter SDK allows developers to build their own custom adapters for SAP HANA SDI. With some Java knowledge, it is possible to build a custom adapter with this SDK, package it as a JAR file and deploy it to the Data Provisioning Agent using the Data Provisioning Agent Configuration tool. Once the adapter is registered and enabled, a remote source can be configured in SAP HANA to handle the files, exposing them as virtual tables.
The custom CDRFileAdapter is then able to process multiple files referring to the same table as a single virtual table. The adapter also skips the leading metadata section of each file when fetching data. It is important as well to keep the same parallelism mechanisms the standard File Adapter uses, with multiple Java threads and blocking queues, to speed up file reading and returning data to the SAP HANA Data Provisioning Server.
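Two ideas from the adapter can be sketched outside the SDK: grouping all part files of a table into one logical source, and the producer/consumer pattern (the real adapter uses Java threads and blocking queues; below is a plain-Python analog with a bounded queue, and the file-name pattern is the one shown earlier):

```python
import queue
import re
import threading
from pathlib import Path

def part_files(directory: str, table: str) -> list[Path]:
    """Collect and order all dump parts of one table, e.g. ACDOCA_00001.xml, …"""
    pattern = re.compile(rf"{table}_\d{{5}}\.xml$")
    return sorted(p for p in Path(directory).iterdir() if pattern.match(p.name))

def read_parts(files, rows_out: queue.Queue, parse):
    """Producer: parse each part file and push its rows; None signals completion."""
    for f in files:
        for row in parse(f):
            rows_out.put(row)
    rows_out.put(None)  # sentinel: no more rows

def fetch_table(files, parse):
    """Consumer: yield rows as a reader thread fills the bounded queue."""
    rows_out: queue.Queue = queue.Queue(maxsize=1000)  # bounded, like a BlockingQueue
    threading.Thread(target=read_parts, args=(files, rows_out, parse),
                     daemon=True).start()
    while (row := rows_out.get()) is not None:
        yield row
```

The bounded queue gives the same back-pressure a Java BlockingQueue provides: the reader thread blocks when the consumer (in the real adapter, the Data Provisioning Server fetch loop) falls behind.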
The strategy is then to download the XML files from the S/4HANA Cloud Fiori apps to a filesystem, run the Data Provisioning Agent on the system where the files are downloaded, and connect it to the SAP HANA database of the target SAP S/4HANA On-Premise system, something like this:
As the diagram shows, the choice has been to create the virtual tables fed by the XML files in a separate SAP HANA database schema called S4CLOUD. Note that SAPHANADB is the name of the schema used by S/4HANA, and the same schema where the Migration Cockpit staging tables are created (supported since a specific release of SAP S/4HANA; before that, the staging tables had to reside in a separate schema).
Other ways to provision data to SAP HANA
As previously indicated, other adapters come to the rescue in case some of the tables needed from the source S/4HANA Cloud system cannot be extracted as XML files. Some data might be available by consuming OData services via the standard OData Adapter. Note that the OData Adapter does not need a Data Provisioning Agent: it runs directly in the Data Provisioning Server and connects to the proper endpoint to expose the contents as a virtual table.
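Conceptually, the OData Adapter maps the entities of an entity set onto virtual-table rows. A sketch of that mapping, using a hypothetical OData V2 response payload (the entity set and field names are invented for illustration):

```python
import json

# Hypothetical OData V2 response, as returned by an entity-set query
payload = json.loads("""{
  "d": { "results": [
    { "__metadata": {"type": "X.BankType"}, "BankCountry": "ES", "BankKey": "0049" },
    { "__metadata": {"type": "X.BankType"}, "BankCountry": "DE", "BankKey": "1001" }
  ]}
}""")

def odata_rows(payload, columns):
    """Project OData V2 entities onto a fixed column list (one dict per row)."""
    return [{c: entity.get(c) for c in columns}
            for entity in payload["d"]["results"]]

rows = odata_rows(payload, ["BankCountry", "BankKey"])
```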
In cases where the source data is in an Excel file, the standard File Adapter is the right tool to provide data to the SAP HANA staging tables. This is the case, for example, for the G/L accounts in Financials. Using the standard Fiori app Manage G/L Account Master Data, the G/L account data can be extracted into Excel files, segmented by the Chart of Accounts view (SKA1), the Company Code view (SKB1) and the Controlling Area view (CSKB). These files can then be exposed as virtual tables using the adapter.
Last, but not least, some data may be easier to migrate directly using the download/upload mechanisms provided by standard Fiori apps, outside the Migration Cockpit. This is the case, for example, of the bank accounts using the Manage Bank Accounts Fiori app.
SAP HANA XSA – Transforming data and loading into staging tables
While having multiple adapters in SAP HANA SDI (standard adapters plus our ad-hoc custom adapter for the XML files extracted from S/4HANA Cloud) is a great asset for provisioning data into the SAP HANA database as virtual tables, it is still necessary to process the file data, transform it, and load the proper data into the staging tables created from the SAP S/4HANA Migration Cockpit.
Focusing only on the source XML files, the first step of the ETL process is extracting data from the cloud, which is pretty much done by downloading the files and then using the custom adapter to provide virtual tables in an SAP HANA schema. The data in these virtual tables usually cannot be mapped directly into the staging tables defined by the standard migration objects SAP provides, as some transformations are normally needed. This processing takes place in an SAP HANA XSA application that uses a database module hosting flowgraphs. These flowgraphs get data from the virtual tables provided by SAP HANA SDI and process it with operations like DB lookups, joins, projections, filtering, and even SQLScript procedures in more complex cases. Once transformed, the data is loaded into the staging tables. At this point the first half of the job is done, and it is now the turn of the SAP Migration Cockpit, running in the S/4HANA On-Premise system, to populate the proper database tables using migration objects.
For each migration object required (listed in a previous exercise where the data to be loaded was identified), a flowgraph artefact is created.
An example of a flowgraph used to populate the staging tables for the Profit Center migration object is shown below. The data comes from the two virtual tables CEPC and CEPC_BUKRS in S/4HANA Cloud (fed by XML files), but the virtual table CEPCT is also needed for a DB lookup operation that reads descriptions before loading data into staging table S_CEPC. In addition, loading staging table S_CEPC_BUKRS requires data from both source tables CEPC and CEPC_BUKRS, so a database join is used for this purpose.
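In relational terms, this flowgraph is a lookup plus a join. A minimal sketch of the same logic (the rows are made up, only a couple of fields per table are shown, and the English-language key for the CEPCT lookup is an assumption):

```python
# Toy rows standing in for the virtual tables (field subset is illustrative)
CEPC = [{"PRCTR": "PC01", "KOKRS": "A000"}]                      # profit centers
CEPC_BUKRS = [{"PRCTR": "PC01", "BUKRS": "FI01"}]                # company code assignments
CEPCT = [{"PRCTR": "PC01", "SPRAS": "E", "LTEXT": "Profit center 01"}]  # descriptions

# DB lookup: enrich CEPC with the description from CEPCT (target: S_CEPC)
desc = {(t["PRCTR"], t["SPRAS"]): t["LTEXT"] for t in CEPCT}
s_cepc = [{**c, "LTEXT": desc.get((c["PRCTR"], "E"), "")} for c in CEPC]

# Join: combine CEPC and CEPC_BUKRS on PRCTR (target: S_CEPC_BUKRS)
s_cepc_bukrs = [{**c, **b} for c in CEPC
                for b in CEPC_BUKRS if b["PRCTR"] == c["PRCTR"]]
```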
The usage of replication tasks in SAP HANA turned out to be very useful, and even necessary at some points of the data migration process. Replication tasks allow storing data from a remote source in a physical table. When the data volume is large, for example when reading from the Universal Journal table ACDOCA (or table BSEG), it may be necessary to load data from the virtual tables into a physical table, which is later used by a flowgraph as a data source.
Another scenario where replication tasks might be necessary is when the data source is not static like a file but is an OData service. It may be necessary to take a snapshot of the data at a specific moment in time: consuming an OData service may return different data if the source system is in use, so a replication task allows extracting the source data into a physical table and avoiding further calls to the OData service until new data needs to be requested.
SAP HANA SDI is a powerful tool that allows provisioning data to an SAP HANA database from different sources, and the set of standard adapters provided by SAP makes it possible to get data from many sources out of the box.
The Adapter SDK allows developers to create new, custom adapters for specific needs. Some Java skills are required, as adapters are written in Java on top of a Java framework. Very sophisticated adapters can be written to fetch data from virtually anywhere the Data Provisioning Agent is running, performing smart data selection, filtering and transformation before the data is loaded into the SAP HANA database.
This blog post shows a possible use case for developing a custom SAP HANA SDI adapter to meet exact requirements for a specific file-based data source, where the standard adapter provided by SAP is not quite plug and play.