
Top 10 things you need to know from the customer to estimate effort for Data Enrichment and Classification

A common question I have seen doing the rounds when it comes to estimating effort for any kind of data services work: what are the critical inputs I should ask the customer for to arrive at a correct (or almost correct) effort estimate? This applies especially to supplier data enrichment and classification. When a customer ventures out on a Spend Performance engagement, the first question they ask each vendor is: how much time do you need, and at what cost?


For a product implementation, that is a relatively easy question to answer, because you know the technology, the platform and the specifications around it. You are probably aware of the technical hurdles or business issues in implementing (read: installing, configuring, setting up) the product. So the estimate can be near perfect once you have done a couple of such implementations.


Data management, or data standardization of the supplier base and classification of transactions against a chosen taxonomy, is a process. It comes with its own unpredictability about how it will work and what the issues will be in a particular scenario. So if you have not done a good job of listing all the “predictable” scenarios and cross-checking them with your customer, your effort estimate will be only somewhat correct.

The next question I usually get is: is there a list of such parameters that need to be cross-checked? I haven't come across one, so I tried putting one together based on my experience.


Scenario: a multinational, multi-system large enterprise wants to implement a Spend Analytics solution, with the Data Enrichment and Classification option, to get clean data and good analysis. What would be my high-level parameters for estimating the data services effort?


  1. How many source systems are there, and what is the mix (SAP, Oracle, Ariba, legacy)? E.g. if SAP holds 90% of the data, chances are high that data quality and integrity are good, so less effort. The more legacy systems, the more effort.
  2. What are the data structures in each of these systems? The more they vary, the longer your ETL takes.
  3. How many countries? It may be all SAP, but from different countries.
  4. What is the estimated direct and indirect spend? Direct spend usually has a smaller vendor base and indirect spend a larger one. Direct data is usually good; indirect data may not be.
  5. How many suppliers? This is the raw supplier base (as counted from each source system). You need to estimate the shrinkage on this base to estimate your effort. E.g. travel data will have high shrinkage, because the same airline or hotel vendors are used repeatedly, whereas data from SAP will probably have low shrinkage because more checks and balances are in place.
  6. How many transactions? More transactions means more effort as a rule of thumb, but if your vendor base is small and the transaction count is high, the material or service is highly repeatable. That lowers your classification effort, because a bunch of transactions will be classified under the same code.
  7. What is the quality and integrity of the data? Base this on both the customer's perception and a sample set.
  8. How many languages? Does the data contain any Asian character sets, such as Chinese or Japanese?
  9. Which industry domain? E.g. an oil company will have mostly MRO spend, but a financial organization will have mostly services spend. Depending on your earlier experience, this will have its own impact on effort.
  10. Which classification taxonomy? One taxonomy is good for services, another for materials. A custom taxonomy is also an option.
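
To make points 5 and 6 concrete, here is a minimal sketch of the arithmetic in Python. It is an illustration only, not a real estimation model. It assumes "shrinkage" means the fraction of raw supplier records that turn out to be duplicates once the data is cleaned. The class name, the function names and every number in it are hypothetical, made up for the example; you would replace the rates with what you learn from a sample of the customer's data.

```python
from dataclasses import dataclass


@dataclass
class SourceSystem:
    """One source system as described by the customer (all values are illustrative)."""
    name: str
    raw_suppliers: int      # supplier count as reported by this system (point 5)
    shrinkage_rate: float   # ASSUMPTION: fraction of raw records that are duplicates, 0..1
    transactions: int       # transaction count from this system (point 6)


def unique_suppliers(source: SourceSystem) -> int:
    """Estimate the supplier base left after shrinkage within one system."""
    if not 0.0 <= source.shrinkage_rate <= 1.0:
        raise ValueError("shrinkage_rate must be between 0 and 1")
    return round(source.raw_suppliers * (1 - source.shrinkage_rate))


def transactions_per_supplier(source: SourceSystem) -> float:
    """Higher values hint at repeatable spend, which is cheaper to classify."""
    suppliers = unique_suppliers(source)
    return source.transactions / suppliers if suppliers else 0.0


# Hypothetical inputs: low shrinkage for SAP, high shrinkage for travel data.
sources = [
    SourceSystem("SAP", raw_suppliers=10_000, shrinkage_rate=0.10, transactions=200_000),
    SourceSystem("Travel", raw_suppliers=5_000, shrinkage_rate=0.60, transactions=100_000),
]

for s in sources:
    print(
        f"{s.name}: {unique_suppliers(s)} suppliers after shrinkage, "
        f"{transactions_per_supplier(s):.1f} transactions per supplier"
    )
```

The sketch treats each system on its own. It does not model suppliers that appear in more than one system, which would shrink the combined base further.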



I could list many more, but these are the top 10. The key question now is whether the customer will be ready to provide this kind of input upfront, at the start of the sales cycle. I don't think so. And even if they do provide answers, those are highly likely to be tentative or wrong. The customer's perception of data quality is always a point of debate. Your subject matter expertise and experience are the only keys to getting as much critical input from the customer as possible, and so getting to the right number.



Prashant Mendki

Linkedin –

Twitter – @pmendki

1 Comment
  • Hi Prashant,

    Would you suggest any other data enrichment services besides Ariba or Oracle?

    Which is the best and most affordable solution?

      Thanks in advance.