Skip to Content
Technical Articles
Author's profile photo Arsheen Jahan

Issues we experienced when transferring data from SAP to Azure Data Lake in SAP DS

Introduction:

In my last blog, we discussed how to ingest data from SAP to Azure Data Lake using SAP DS.

Please click the link below for the details:

How to ingest data from SAP to Azure Data Lake using SAP DS

This blog is a continuation of that blog, and it focuses further on the difficulties we had transferring data from SAP table/extractor to Azure Data Lake using SAP DS.

Requirement:

We utilize SAP DS to load data from an SAP table or extractor into Azure Data Lake. We encountered a few issues when transferring data from SAP to Azure Data Lake using SAP DS.

Issues listed below are compatible with SAP and have a fix:

Unhidden tab in the SAP table’s data caused a data issue in the DS file:

When we load data from an SAP table, there are occasionally some fields that contain data with tabs that cause data issues. When this happens, the job fails with the error below:

Figure%201.%20Data%20in%20SAP

Figure 1. Data in SAP

 

Figure%202.%20Error%20in%20DS

Figure 2. Error in DS

 

Figure%203.%20Incorrect%20data%20loaded%20in%20file

Figure 3. Incorrect data loaded in file

Approach 1:

To fix this, we may use the regex_replace function to replace the tab with a blank and remove any concealed tabs or other characters.

Please click the following link for additional information:

regex_replace

All the characters mentioned in the ‘regular expression pattern string’ will be replaced with the ‘replacement string’.

Figure%204.%20Use%20of%20regex_replace%20in%20mapping

Figure 4. Use of regex_replace in mapping

 Approach 2:

The functions regex_replace, replace_substr, and translate are inoperable for a few tables. In this instance. We must exclude inaccurate records as our only choice to tackle this. We keep track of errors in a file called an error log file.

We enable error handling at the source table level by selecting the overflow file checkbox and entering the file location where the error entries will be loaded.

Figure%205.%20Error%20handling

Figure 5. Error handling

Where you want to put the error records file must be specified, along with a file path and file name. The error log file will be updated with error entries; however, these records won’t replace any existing ones; rather, they will be appended to the existing file.

How many entries are being put into the blob file and how many of those have incorrect records may both be seen in the trace log.

Language’s characters issue in SAP table caused data issue in DS:

When we load data from a table into a flat file that also contains data in languages other than English, we can receive junk data for those languages.

The data disparity between the file generated by SAP DS and SAP is depicted in the image above. After importing the Chinese data from SAP into a csv file, the ARTXT field contains junk data.

Figure%206.%20Data%20disparity

Figure 6. Data disparity

Solution:

Several file format configuration modifications are required to resolve the issue.

  1. UTF-8 must be selected as the code page.
  2. We must select YES for write BOM.

Figure%207.%20File%20format%20settings

Figure 7. File format settings

 

Figure%208.%20File%20format%20settings%20with%20UTF-8%20and%20Write%20BOM%20changes

Figure 8. Junk Data in file

Figure%209.%20File%20format%20settings%20with%20UTF-8%20and%20Write%20BOM%20changes

Figure 9. File format settings with UTF-8 and Write BOM changes

Figure%2010.%20Correct%20data%20loaded%20in%20file

Figure 10. Correct data loaded in file

For more detail please go through the below link:
Input/output properties in file format

Issue while loading delta for 0CO_OM_WBS_7 in DS:

With init without delta transfer, it is not possible to load the delta in DS for 0CO OM WBS 7. For more information, please see the SAP notes listed below:

1458612 – Datasources with/without Init simulation

The strategy listed below was used to tackle the aforementioned problem.

  1. Both full load and delta load are handled by extractor with CDC. In order to keep the job functioning as a delta init with data transfer, we are maintaining the initial load setting of Yes and the Extract from datetime setting of blank for full load.

Figure%2011.%20Source%20level%20settings%20for%20full%20load

Figure 11. Source level settings for full load

  1. For delta load we are keeping the settings we use for delta load. i.e.  the initial load setting is No and Extract from datetime setting is $g_load_date.

Figure%2012.%20Source%20level%20settings%20for%20delta%20load

Figure 12. Source level settings for delta load

Below mentioned issues are compatible with ADL:

If SAP fields and table/extractor name consist of “/”:

If any table/extractor fields or table/extractor names contain the character /, this is unacceptable in Azure Data Lake and must be changed to” _”. In DS, it is simple to import this kind of table/extractor, however in ADL, we must rename the field names and the table/extractor names.

 

Figure%2013.%20Fields%20renaming%20in%20DS

Figure 13. Fields renaming in DS

Issue with date field which has SAP data for the year <1900

ADL only retrieves data beyond the year 1900 due to year constraint. Because it requires an ADL global settings modification that will have an impact on a downstream project, it cannot be handled by ADL.

Using decode(), we can manage it in DS for both full and delta load.

For both the full and delta jobs, we are utilizing the following decode() mapping on the date field to replace years that are less than 1900. It will use the following function to change the value to January 1, 1900:

decode (year(Field Name)< 1900, ‘1900.01.01’, “Field Name”)

For more detail on decode please go through the below link:
decode

We are updating below record:

Figure%2014.%20Data%20before%20applying%20decode

Figure 14. Data before applying decode

Figure%2015.%20Data%20after%20applying%20decode

Figure 15. Data after applying decode

Summary:

I hope that would be helpful. This is precisely how we tackled the problems. The procedures discussed above can be used to fix problems with any tables or extractors.

Please refer below links for related topics:

Input/output properties in file format

regex_replace

1458612 – Datasources with/without Init simulation

decode

Please feel free to share the feedback and your thoughts in a comment, stay tuned and follow my profile for the further updates.

 

Assigned Tags

      Be the first to leave a comment
      You must be Logged on to comment or reply to a post.