Data Workbench (DWB) Best Practices
Best Practices for Data Migration into SAP Hybris Cloud for Customer using Data Workbench (DWB)
This document is intended for customers that are engaging in data migration activities in SAP Hybris Cloud for Customer specifically initial load and update scenarios using the Data Workbench tool. The points below are not written in any particular order but are general guidelines and best practices for data migration based on experience from customer implementations, input from product, engineering and services organizations.
Data Migration Best practices
- Identify data that needs to be imported
When migrating data from a legacy system into SAP Hybris Cloud for Customer not all data would be needed. Depending on your go-live date it is practical to start with essential set of data such as master data to ensure all the dependencies between the data are in place and choose only relevant transaction data e.g. in may not be essential to migrate all closed opportunities but rather all open opportunities.
- Ensure data is cleansed
Remove all duplicate records in the prepared file that needs to be imported. For e.g. there could be multiple account names – Acme Inc and Acme Corp in the CSV file. Rather than import the file and deal with duplicate records in CSV it is better to check for duplicate records in the CSV file itself.
Remove all “,” values in Account and Contact Names e.g. Acme, Inc to ensure it is not treated as a delimiter
Use existing available tools such as Microsoft Excel to clean and prepare the data file.
- Use sort and filter option to find duplicate records.
- Use concatenate function to merge values
- Use text to column feature to separate first name and last name from one column to two separate columns. This can be applied for Address fields as well in case you want to split into three columns –
If data is loaded with duplicate records use the Data Cleanse capability in Administrator to remove or merge duplicate records.
Note: If duplicate accounts are removed/merged before migration the document references to these accounts needs to be updated as well.
- Check for data correctness and consistency
Ensure that all mandatory fields are entered in the CSV file. Mandatory fields can be identified with the label – mandatory in the file template.
Check data for junk, wrong/special characters, in the file before importing.
Ensure code list values are entered as per the code list value template. Copy/paste values rather than typing manually to ensure there are no errors during import.
Ensure CSV header names are not modified or sequence is not changed from downloaded template. If modified, you need to manually maintain the mapping during the import step.
Data can be prepared in Excel but needs to be saved in CSV format. For data consistency, it is recommended to use SAP pre-delivered template files for each of the objects. This ensures consistency with the column headers and auto-mapping during the data import guided activity. When using, custom header names these needs to be mapped during the mapping activity.
Once the file is prepared, save the file in .CSV format before data loads. Ensure encoding is set as UTF-8. Also, note that CSV will automatically convert any formulas, formatting into standard CSV accepted format. Also, only one sheet is accepted per CSV so if you have multiple sheets they will be automatically removed.
- Look out for Data Model differences when importing data from SAP on-premise/Cloud or 3rd party systems
For objects, such as addresses in the context of accounts and contacts there could be difference in how the data model is represented. Take for example the data model for Business Partner in SAP ERP that supports multiple phone numbers whereas in SAP Hybris C4C it is only accepted within the context of an address that belongs to the account. Basically, if there are n telephone numbers stored in legacy system <> C4C only accepts 1 mobile and 1 landline number). So, understand these differences upfront before importing data as it can be expensive to clean up once imported.
- Only use columns that are required for Import operation
Often you may not need to import all columns in the import step so only choose the exact column headers you need. Remove all other columns from the import file.
This ensure any unforeseen data loss or data updates. Mandatory fields are also visible for each object so users can understand business critical fields that need to be maintained.
For Update scenarios, first export the data to retrieve the additional column names such as Object ID, UUID
Note: Changing column names would result in an error during upload of file. Also for child entities such as Opportunity Item that are first exported and need to be updated, additional columns – Header Object ID and Parent Object ID are included in the file and should not be modified during update.
- Ensure data is loaded in right sequence
It is important to first loads master data such as organization data, followed by employee data followed by transaction data such as opportunity data, lead data etc. This is important to ensure data consistency and hierarchical relationships. For more on prescribed sequence for data loads refer to the Integration guides in the SAP Help section.
- Maintain Settings for Data Loads
- Turn off workflow rules if feasible
- Turn off notifications
- Turn off feeds
- Turn off Address Checks if feasible
- Turn off SDK (PDI) Coding if feasible
- Turn off Outbound replication
- Verify Time zone and Owner settings
All date time fields need to be maintained in UTC time before importing the file.
- Load using Complete BO’s
Although both Complete Business object and Individual Business objects are available, it is recommended to load using Complete Business Objects to ensure data consistency. It is also more intuitive to understand the root/parent and child entities when loading using Complete business objection option. Note that the Complete BO option is in .zip format.
- Import using small batch size first
Start with 10 records first to validate if there are no errors during the import process. This ensures that errors can be corrected early on and not after importing all records. It will be much more expensive to fix errors once in C4C.
Check in the C4C UI if the expected fields are populated.
- Data Workbench using Object ID as primary key for Data Imports
Object ID is the unique identifier used by the Data workbench tool to ensure that each data record is unique. UUID is an internal identifier that further uniquely identifies each of the records especially for parent child relationships but both keys are purely internal reference keys and are not required to be modified by the customer.
For update scenarios of child objects e.g. Opportunity Items there are 2 new keys that are generated – namely, Header Object ID and Parent Object ID to keep track of the dependency.
All of the ID fields need not be updated by the customer and are essentially generated by the ODATA Framework for dependency tracking and data consistency.
- Look out when Microsoft Excel drops leading zeroes or changes data type format
Import as text file to avoid any re-formatting by excel or choose column à click on format cells à special
e.g. date fields in DWB need to be in the format yyyy-mm-dd but excel automatically converts exported CSV files so ensure re-formatting is not taking place before you re-import the file.
Example: November 24, 2015 is formatted as 2015-11-24
- Maximum of 50K records per file is supported for Import & Export scenarios
For Individual Objects, only a maximum of 50K records are supported in each CSV file. For Complete Business Objects, only a maximum of 50K records at the Root/Header is supported. Total ZIP file size needs to be at or below 10MB.
For Attachments, total ZIP file should not exceed 100MB and Individual attachments should not exceed 10MB.
Depending on the total number of records that need to be exported, Data workbench will automatically split the file into chunks of 50K records each. So, for example if the total number of records is 500K that will result in 10 files that are visible in the Monitor Work center view.
- Run the import in Simulation mode first
Before you proceed to do an actual import, run the data import in simulation model to validate the file and surface possible errors. It is more expensive to clean up the data once it is already in the system. Ensure that the data is real world data and not dummy test data during testing and simulation.
- Use Templates to increase productivity
For import tasks that occur frequently you can use templates where the mapping that was used in the previous import can be saved and only required fields needed for import can be maintained.
- View status of Data migration tasks in Monitor view
The monitor view maintains all tasks for the last 3 months by default. All tasks that are older than 3 months are automatically deleted.
- For High volume data loads please reach out to SAP Cloud Operations
Please create a support incident – indicating the volume of data loads that need to be imported. Main Details should include,
- Objects being imported
- Associated data volume for each object
- System and Tenant details (Test, Production)
- Go-Live date
18. Add Extension fields Before Executing Import
Ensure that extension fields that were previously added using Key User Tools (KUT) or SDK toolkit are added to the standard Data Workbench (DWB) ODATA Service before executing the actual import.
19. Tenant Lifecycle Management
Ensure all data loads are done successfully in Test tenants and Development tenants (if applicable) before moving to Production. This is especially critical for extension fields added via KUT/SDK to ensure dependent transports are also reaching target systems before data loads take place.
20. Perform Spot Checks and confirm Data count
Once the data is imported – do spot checks on the UI to ensure desired fields are populated as well as perform a record count across objects to ensure there are no missing records due to errors during migration or duplicate records that passed through.