
Are we data deaf?

Just in case you haven't heard this the other 500 times I mentioned it – I used to be an ABAP programmer. Looking back at the programs I wrote, most of my lines of code went into conversion programs and interfaces.


The story goes like this – if you have seen the movie, you can sing along.


A bunch of functional consultants sit with the client business team and figure out the best way to configure the system. Then, to make sure things work the way they are supposed to, they create materials and customers called TEST1, TEST2 and so on, followed by a bunch of sales orders that confirm all is well. Everyone is all smiles, and high fives are aplenty.


Enter the ABAP gang. They get an SHDB recording of the perfect transaction flow and are pointed to the TEST sales order, with customer TEST1 and material TEST2 as the example. Hours or days later, the ABAPer emerges with a program that reads data from a file and commits a transaction exactly like the TEST sales order. More high fives, and everyone on the SAP team goes into hibernation while another team extracts data into a file the ABAP program can read.


The first time the program reads the output from the external system, sales order creation fails with an error: order type GR is not defined. Analysis shows the order type is not really GR – the program is reading the header text and putting its first two characters into the order type field, because the legacy team arranged the fields in a different order. Ah – no big deal, why bother changing the file – we will just tweak the ABAP program. A few minutes later all is well, and we decide we are ready for the next file with more records. That goes fine too, until we find that not all records in the new file have the customer number filled in – some only have the customer name. Big deal – we will code a reverse look-up. You get the drift – the smart ABAPer saves the world, 10 lines of code at a time.
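Both of the surprises above – a shifted column landing in the order type field, and records missing a customer number – are the kind of thing a trivial pre-load check would catch before anyone runs a BDC session. Here is a minimal sketch in Python (not ABAP, purely for illustration); the pipe-delimited layout, the field names and the set of valid order types are all hypothetical stand-ins for whatever your mapping spec says:

```python
# Hypothetical extract layout: order_type|customer_number|customer_name|material|quantity
# Valid order types below are assumptions, not real configuration.
VALID_ORDER_TYPES = {"OR", "RE", "CR"}
EXPECTED_FIELD_COUNT = 5

def validate_record(line_no, line):
    """Return a list of problems found in one pipe-delimited record."""
    problems = []
    fields = line.rstrip("\n").split("|")
    if len(fields) != EXPECTED_FIELD_COUNT:
        problems.append(f"line {line_no}: expected {EXPECTED_FIELD_COUNT} "
                        f"fields, got {len(fields)}")
        return problems
    order_type, customer_number, customer_name, material, quantity = fields
    if order_type not in VALID_ORDER_TYPES:
        # Catches the "order type GR" symptom: a shifted or mis-mapped column
        problems.append(f"line {line_no}: unknown order type '{order_type}'")
    if not customer_number.strip() and not customer_name.strip():
        problems.append(f"line {line_no}: no customer number or name")
    return problems

def validate_file(lines):
    """Validate every record and return all problems as one report."""
    issues = []
    for n, line in enumerate(lines, start=1):
        issues.extend(validate_record(n, line))
    return issues
```

Run over the whole extract, this produces a complete error report in one pass, instead of surfacing mapping problems one failed transaction at a time during load.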


The amount of time and money we spend on this is extremely high – and a lot of project failures can be traced back to this issue.


ABAPers are smart, and many have built utility programs that ease the pain of maintenance. However, ABAP's world is limited to making things right within SAP. The missing part is the source system. If we profiled the source system data and mapped it to the target structures up front, almost all the problems we face during development could be found earlier.


It is not as if we don't have tools for this job – SAP, IBM and many other vendors have excellent products that do it. However, when a project gets budgeted, data is one of the first aspects to be de-prioritized; the solution to every data problem becomes throwing more programmers at it. Very few projects make use of information management tools. Granted, this needs some capital investment in licensing and hardware – but would you rather skip it and take the risk that your project fails due to data issues?


Tools are just one part of the issue – we should also consider the people and processes around data. Most SAP projects skimp on project-team roles dedicated to data, and "governance" is treated as a strange thing that lives only in PowerPoint presentations. It is also not commonplace in most companies to have "owners" for data.


At a minimum, even if you want to solve all your data problems in ABAP, please move the data profiling to an earlier phase of your project, such as blueprinting. Do not wait for the development or testing phase to find out you have a problem on your hands.
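Profiling during blueprinting does not have to mean a full tool rollout on day one. Even a fill-rate and top-values summary per column will flag a half-empty customer number field months before an ABAPer hits it. A rough sketch of what such a profile computes, in Python with records represented as dicts (the field names are illustrative):

```python
from collections import Counter

def profile(records):
    """Compute fill rate and most common values per field.

    `records` is a list of dicts (one per extracted row). A quick profile
    like this, run during blueprinting, surfaces sparsely filled columns
    before development starts.
    """
    stats = {}
    total = len(records)
    fields = records[0].keys() if records else []
    for field in fields:
        values = [str(r.get(field, "")).strip() for r in records]
        filled = sum(1 for v in values if v)
        stats[field] = {
            "fill_rate": filled / total if total else 0.0,
            "top_values": Counter(v for v in values if v).most_common(3),
        }
    return stats
```

A fill rate well below 1.0 on a mandatory field (like customer number) is exactly the kind of finding that should trigger a conversation with the legacy team during blueprinting, not a reverse look-up hack during testing.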


Companies spend a lot of time and money on fancy BI solutions, and a lot of those initiatives fail to deliver business value. Almost always, the failure can be traced back to bad data. At a fraction of the cost, that data could be fixed early in its lineage, and these companies could reap great benefits. But then, we are used to being data deaf – the question is, are we going to do something about it?

  • Vijay,

    I'm looking forward to meeting you in Las Vegas. Are we data deaf (and data blind)? Yes, without a doubt. Recently I observed a million records being loaded into an ODS (or DSO) from a flat file. This was a new project and I was surprised by the amount of data, so I got curious. The flat file was on a Unix server, so I used vi and awk to check the quality of the data. What I found was just amazing: each record was 500+ bytes long and contained 15+ amounts. Initially, viewing the fixed-length file with vi, I thought it contained the same record a million-plus times. Using awk, however, I found that only 4 characters out of the 500+ bytes differed: a serial number – the first record had 0001, the second 0002, and so on. What is the purpose of loading over a million records with the same values except for 4 bytes? I don't know.
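    The check described above – masking out the serial-number slice and counting what remains – takes only a few lines in any language. A sketch in Python (the original used vi and awk; the offset and length of the serial field are assumptions you would read off the record layout):

```python
def distinct_payloads(lines, serial_start, serial_len):
    """Count distinct records after stripping a fixed-position serial slice.

    If this returns 1, the file is the same record repeated over and over,
    with only the serial number changing.
    """
    payloads = {line[:serial_start] + line[serial_start + serial_len:]
                for line in lines}
    return len(payloads)
```

    Running this before the load would have shown in seconds that a million-record file carried exactly one record's worth of information.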


    • Looking forward to meeting you too, Bala.
      Some BI shops do crazy things with their data - I know several that hold more than 10 redundant copies of data with minor enrichments.
      Some BI shops do crazy things with their data - I know several that hold more than 10 redundant copies of data with minor enrichments.

      It beats me - the SAP ecosystem is filled with a lot of smart people, but the awful way we handle data has not improved much in the last twenty years.

      Data has quantity and quality problems. Archiving is something people love to move to next year's budget. A lot of maintenance trouble can be avoided just by moving old data out to the garage. And needless to say - it significantly increases the ability to enhance and innovate on the system.

      I feel product vendors do not push the EIM/ILM parts of their portfolio on par with BI. Vendors, SIs, consultants and customers all have to work this out together - and that ain't easy.

      • Vijay,

        EIM/ILM: I've seen consultants and customers who are proud of having multi-TB systems. Part of the problem, IMO, is that the SAP ecosystem is filled with a lot of smart people. Most of them, instead of doing the right thing for the customer, do what is right for themselves: keep it complex (not KISS). I've seen 10 redundant copies with minor enrichments (just duplication in some cases). Why?
           1) It increases complexity, thereby improving job security. This may backfire in some cases, but someone smart can always come up with a vague explanation that, on the face of it, justifies their case. It would take a lot of analysis, patience and documentation to show they're wrong - and who has the time, resources and money?
           2) It increases the size of the database. Resumes with TB experience look better (perception) than ones with a 300 GB DB.

        See you in LV.


  • Hi Vijay,

    thank you for this post. You're absolutely right. Most companies do not have a Master Data Manager. Perhaps they have a Master Data Management tool, but a tool without proper guidance will fail.

    At Siteco we're going to roll out SAP ERP and CRM to subsidiaries currently running on different legacy systems. I'm preaching that the data quality should be checked now, and the data corrected in the legacy systems, before importing into SAP. Let's hope that gets done.

    Best regards

  • Hello Vijay,

    I am glad someone shares the same feelings that I do. I am of the opinion that ABAP shouldn't be used to plug the errors in the source data. During my 5 years as an ABAPer I have been flooded with requests to modify the ABAP because the data in the input file is not correct:
    the file structure is different, field lengths don't match, and so on.

    I tried hard to convince my seniors that we should simply ask the data source team to provide data in the agreed-upon format, but it is always the ABAPer's job to clean up someone else's mess.

    I'm not against checking data sanity - of course it's a must to check the organisational elements before trying to insert the data into the DB. But ABAP shouldn't be used as a cleaning tool.


  • I couldn't agree with you more. This data deafness has the potential to snowball into something really ugly. You also end up with lots of code that is not optimised for performance, and the Basis guy scratching his head because he can see so much swapping. Maybe we can throw a bit more money at some additional memory to enhance performance?

    It's a vicious circle, starting with data deafness.