As is often the case in life, there is more than one way to achieve an objective. This is also true in our business, where often more than one software tool exists to meet a given demand (and not only we at SAP can tell a thing or two about that).
In my last project the customer had the requirement to replace Informatica PowerCenter with SAP Data Services and migrate all of their existing ETL processes.
Both are typical ETL (extract, transform, load) tools for replicating and integrating data between different sources, systems and platforms. And both are very powerful: SAP as well as Informatica are among the Leaders in Gartner’s Magic Quadrant for Data Integration Tools (https://news.sap.com/2020/08/gartner-magic-quadrant-for-data-integration-tools-sap-a-leader/).
But in this blog post I don’t want to discuss the questions “which is the better or more powerful ETL tool?” or “why did the customer decide to replace Informatica PowerCenter?”. Instead, I want to share our approach and the experiences we gained in the project.
Unfortunately, I have to start with bad news: there is no tool that can automatically migrate Informatica PowerCenter (PwC) mappings to SAP Data Services data flows. This means you either have to restart from the functional specifications or reverse engineer the existing mappings to build the SAP Data Services jobs.
In our case the functional specs were out of date or no longer existed. And most of the business users who knew the requirements in detail had already retired or moved on to new jobs. For this reason, we chose the reverse engineering approach.
The next challenge was the complexity, importance and sheer extent of the existing PwC processes. The customer had more than 100 ETL processes (many of them with very complex processing logic) running fully automated on an extremely flexible framework for configuring, scheduling and monitoring them.
The processes served critical business interfaces between SAP and non-SAP applications, with real-time (IDocs) and batch (mass data) just-in-time processing for master and transactional data.
And since the existing solution had been running for many years, it was well attuned and very stable.
So we not only had to ensure a secure, uninterrupted transition and continuous operation of all data integration and data distribution processes, but also had to implement a sophisticated framework for the SAP Data Services jobs.
As if that wasn’t enough of a challenge, we additionally had to switch to a new database server for the staging area, which meant implementing and performing a migration of about 500 GB of data.
You see: quite an easy project 😉
As mentioned above, we chose the reverse engineering approach. So as a first step we created technical documentation of the existing PwC mappings. We decided to use Excel for this and documented the mapping logic for every single column in every data stream.
This was a lot of work, but this way an SAP Data Services developer could access and understand the logic without access to, or deeper knowledge of, PwC. In addition, during the development phase we were able to assign the corresponding SAP Data Services objects (jobs, workflows, dataflows, transforms) in the Excel sheets to each mapping and every individual column. This proved very helpful during the later tests, as it made possible errors easier to reproduce, understand and locate.
And before we started with the SAP Data Services implementation, we defined and documented guidelines covering naming conventions, our layer model, the handling of target tables, the usage of the job framework, and so on.
In addition, we created SAP Data Services template objects (jobs, workflows, dataflows) that were mandatory to use. In doing so we ensured that every SAP Data Services job was structured the same way, which made testing, troubleshooting and maintenance much easier.
Our guideline document also contained a chapter where we defined the standard mapping between PwC objects and the corresponding SAP Data Services objects:
This mapping served only as a recommendation, not as a hard rule, since we didn’t want to simply copy the existing logic one-to-one from PwC to SAP Data Services but wanted to optimize it wherever possible or necessary. For example:
- in PwC there was a sequence of Sorter -> Expression -> Filter that identified a maximum value and then filtered on it. In SAP Data Services we used a query transform to select the maximum value (using the aggregate function max) and simply joined it with the original data stream in a subsequent query transform
- a lookup of a single value that was later used as a filter in PwC was implemented in SAP Data Services as a script using the built-in function lookup_ext, storing the value in a variable
- some mass lookups in PwC were replaced by a (left outer) join in SAP Data Services for performance reasons.
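To make the first of these optimizations more concrete, here is a small plain-Python sketch (not Data Services syntax; the sample data and column names are made up) of the “aggregate the maximum, then join it back” pattern that replaced the Sorter -> Expression -> Filter sequence:

```python
# Illustrative sketch of the "max value + join" pattern:
# compute the maximum per group in one step (first query transform),
# then join it back to keep only the matching rows (second query transform).
rows = [
    {"emp": "A", "version": 1, "salary": 100},
    {"emp": "A", "version": 2, "salary": 110},
    {"emp": "B", "version": 1, "salary": 90},
]

# Step 1: aggregate the maximum version per employee (like max() with group by).
max_version = {}
for r in rows:
    max_version[r["emp"]] = max(max_version.get(r["emp"], 0), r["version"])

# Step 2: join the aggregate back and keep only the rows carrying the maximum.
latest = [r for r in rows if r["version"] == max_version[r["emp"]]]
print(latest)  # rows with version 2 for A and version 1 for B
```

The same result could be achieved by sorting and scanning, but pushing the aggregation into a single pass is typically what the query-transform approach buys you.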
We deliberately wanted to give the developers here some “creative liberties” in order to enable the best possible implementation.
Another thing we designed right at the beginning of the project was the test procedure. With more than 100 ETL processes, most of them very critical, it was not enough to run just the developer tests followed by a single integration test. Moreover, it was simply impossible to ensure correct results by manually comparing them for this huge amount of data.
Therefore, we designed and implemented a fully configurable and powerful test framework that completely automates the comparison of the test results with the productive data.
In addition, we set up a test environment in which we
- migrated the existing productive data (this way we could also test our data migration jobs and the whole cut-over process)
- replicated new productive source data several times a day (by separate SAP Data Services jobs implemented specifically for this purpose)
- transmitted IDocs that were sent by the SAP system to the productive PwC processes
- executed the SAP Data Services jobs for the ETL processes simultaneously with the PwC processes in the production environment (this way we simulated all productive processing in our test environment) and
- analyzed the test results automatically by our test framework at least once a day (the test framework compared the data in the test environment with the productive data and sent the comparison results by e-mail).
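The heart of that last step was the automated comparison. As a rough illustration of the idea (plain Python, not our actual framework; the function, table and column names here are invented for the example), such a comparison boils down to classifying rows as missing, unexpected, or differing:

```python
# Hedged sketch of the core idea behind an automated result comparison:
# match test-environment rows against productive rows by key and report
# every discrepancy instead of eyeballing the data manually.
def compare_results(prod_rows, test_rows, key):
    """Return keys that are missing, unexpected, or differing between the sets."""
    prod = {r[key]: r for r in prod_rows}
    test = {r[key]: r for r in test_rows}
    missing = sorted(set(prod) - set(test))        # in prod, not in test
    unexpected = sorted(set(test) - set(prod))     # in test, not in prod
    differing = sorted(k for k in set(prod) & set(test) if prod[k] != test[k])
    return {"missing": missing, "unexpected": unexpected, "differing": differing}

prod = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
test = [{"id": 1, "amount": 10}, {"id": 2, "amount": 21}, {"id": 3, "amount": 5}]
report = compare_results(prod, test, "id")
print(report)  # {'missing': [], 'unexpected': [3], 'differing': [2]}
```

In our framework the report was generated per table, scheduled at least daily, and sent out by e-mail as described above.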
We ran this test procedure for almost five months before we went live. In this way, we not only ensured that the results of the SAP Data Services jobs were correct, but also identified some bugs in the existing PwC implementation 😉
By the way: this approach is not only applicable to this type of project, but to any project in which existing processes are replaced by new ones (in our case, by SAP Data Services jobs).
Both SAP Data Services and PwC are ETL tools and have many similarities. Among other things, both tools have a graphical user interface for developing the processes. And in both solutions it is called Designer.
If you’ve never seen the PwC UIs before, here is an example of what a PwC mapping looks like in the PwC Designer (and it’s not a very simple one):
But even though SAP Data Services and PwC have many similarities, there are differences in the details that make it difficult to simply “translate” PwC mappings to SAP Data Services.
One example is the handling of null values or special characters in the built-in functions. It is not guaranteed that they are handled the same way in both tools, so you have to check the documentation on both sides.
A second thing we noticed is the different behavior of comparable built-in functions, e.g. the handling of non-numeric characters in the to_decimal function. While PwC converts strings containing non-numeric characters (e.g. ‘A12.3Grove’) to 0 (zero), SAP Data Services throws an exception. You have to account for this during the implementation.
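If you need to preserve the old PwC behavior during a migration, the conversion has to be guarded explicitly. A minimal plain-Python sketch of the idea (the function name is my own; this is not Data Services or PwC code):

```python
from decimal import Decimal, InvalidOperation

# Emulating PwC's tolerant to_decimal behavior: return 0 for input that
# cannot be parsed as a number instead of raising an error, which is what
# SAP Data Services' to_decimal would do.
def to_decimal_pwc_style(value, default=Decimal(0)):
    try:
        return Decimal(value)
    except (InvalidOperation, TypeError):
        return default

print(to_decimal_pwc_style("12.3"))        # prints 12.3
print(to_decimal_pwc_style("A12.3Grove"))  # prints 0
```

In SAP Data Services itself, the analogous guard would be a validation rule or a conditional expression checking the input before calling to_decimal.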
The most challenging thing, however, was accessing values from previous record(s) via variables, which was used very often in the existing PwC mappings.
In PwC you can use variables in the ports (columns) of a transformation. The value of a variable depends on the order in which the ports are specified in the transformation: if a variable is used in an expression before it is populated for the current record, it still contains the value from the previous record (with the whole mapping logic already applied to it for that previous record).
| Port | Expression | Comment |
| --- | --- | --- |
| in_sal | N/A | salary is only used as input for the expression |
| v_previous_sal | v_current_sal | since v_current_sal has not been populated yet for the current record, it still contains the value from the previous record (null for the very first record) |
| v_current_sal | in_sal | now v_current_sal is populated with the salary (in_sal) of the current record |
| out_sal | v_previous_sal | finally, v_previous_sal is assigned to the output port out_sal |
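Simulated in plain Python (again, just an illustration of the semantics, not PwC code), the evaluation order in that table behaves like this:

```python
# Sketch of the PwC variable-port semantics from the table above:
# v_previous_sal is evaluated before v_current_sal is reassigned,
# so it still holds the previous record's value.
def previous_salaries(salaries):
    out = []
    v_current_sal = None  # variables keep their value across records
    for in_sal in salaries:
        v_previous_sal = v_current_sal  # evaluated first -> previous record
        v_current_sal = in_sal          # now overwritten for the current record
        out.append(v_previous_sal)      # assigned to the output port out_sal
    return out

print(previous_salaries([1000, 1200, 1500]))  # [None, 1000, 1200]
```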
In the PwC Designer it looks like this:
SAP Data Services does offer functions like previous_row_value(), is_group_changed() and gen_row_num_by_group(), but they cannot cover cases like retaining a value over multiple rows (e.g. if a column contains a null value, it should receive the value of the last preceding row without a null value). To solve this, you would need access to the output column’s value, and that is not directly supported by SAP Data Services.
Nevertheless, there is a solution. To be honest, we didn’t invent it; it is described on the SAP Community Network at https://wiki.scn.sap.com/wiki/display/EIM/Previous+row+processing+via+custom+function. Many thanks to the author.
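To clarify what “retaining a value over multiple rows” means, here is the requirement expressed in plain Python (the function name is mine; in Data Services this is exactly the part that needs the custom-function workaround, since the transform cannot read its own output column):

```python
# Carry the last non-null value forward: whenever a row's value is null,
# reuse the most recent non-null value seen so far.
def fill_forward(values):
    last_non_null = None
    out = []
    for v in values:
        if v is not None:
            last_non_null = v  # remember the latest real value
        out.append(last_non_null)
    return out

print(fill_forward([10, None, None, 20, None]))  # [10, 10, 10, 20, 20]
```

Note that rows before the first non-null value stay null, matching the "null in case of the very first record" behavior of the PwC variable pattern.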
The project was a complete success: we went live on time and on budget, we had a smooth go-live and a relaxed hypercare phase, and the customer was very satisfied.
Nevertheless, we gained a few insights along the way that I don’t want to withhold from you:
- Even during the implementation phase you need a PwC expert for technical details and for debugging the existing processes, in order to gain deeper insight and examine in detail what is going on (e.g. for very complex mappings we needed intermediate data to understand exactly what a particular transformation was intended to do).
- Track the SAP Data Services implementation and map the names of the SAP Data Services objects (jobs, workflows, dataflows, transforms) to the PwC mappings, ideally down to each single column. This helps a lot during the test phase and when analyzing errors.
- If possible, try to obtain at least read-only access to the PwC tools, especially the PwC Designer. A picture is worth a thousand words and makes it easier to understand the processes. But don’t skip the technical documentation.
- At least in larger projects, you should start thinking about the test phase as early as possible, and try to establish an automated test procedure. Our test framework was worth its weight in gold and enabled us to test efficiently. Without it, we would not have been able to carry out safe, reliable tests and to ensure identical results.