But I know what you are saying now... “How does this have anything to do with MDM?” or “Is SAP Education paying you to market this class for them?” (haha). Well, it was during our discussions of the various repositories where this unstructured information may live, indexes, configuring crawlers, etc. that I had a slight “eureka” moment... the light bulb went on over my head... dimly... but it was on at least.
Having previous experience with MDM at a ramp-up customer, many things I was seeing started to sound familiar... abstractly... but familiar nonetheless. For the project I was on, MDM is/was still about 2 versions away from meeting our need (full business process management). MDM is still a product in its infancy (as of this weblog), but even now it does several things well... like identifying duplicates or possible duplicates based on a set of rules.
Ok... so I finally mentioned MDM... so now, “how does this tie into KM?” Well, one thing I noticed right off with KM is that, yes, I can set up indexes and set up crawlers to go out into the wild and retrieve all my documents and other unstructured resources. Top this off with good ol’ TREX to search, text mine, etc., and I should be able to retrieve any of those resources stored away in the recesses of my many repositories. But here’s the problem... how do I keep from overwhelming someone with too many resources? Say I am looking for my company’s “Mission Statement” document. I am in EP and, via the connection to KM, I do a search for “Mission Statement” and suddenly I have 10 pages of 50 listings a page, each pointing me to “MissionStmt.doc”. How do I know which is which? How do I know which ones are just a copy or version saved to some local file system and which is the main one to use? Now possibly we didn’t cover it in the class, but I am unaware of any feature in KM that will do that for you. That’s when the light bulb dimly began to glow. “Wouldn’t it be nice if...” I suddenly had an idea... not much more than an idea, but isn’t that how all great things begin? (haha) Why should unstructured data be treated any differently? It could be considered “master data” in a manner of speaking, no?
So here was my idea... as of now, MDM is good at handling the two most needed, low-hanging-fruit business objects... customers and vendors... business partners in the general term. With rules you set up in CI (or at least as I remember it), MDM will identify duplicates based on these rules and present a “hit list” of sorts that gives the percentage of “closeness”, if you will... exact matches are “100%”, and maybe two customer objects whose names match but whose addresses differ get an “80%”. From this hit list, you can pick which ones are actually duplicates and set the main object to reference. From that point, if you are reporting in BW for example, all the data for the duplicates can be synced up under the one “main” reference... 5 vendor numbers and their data become 1, if you will. Ok... so that’s a very basic description, but I hope you get the gist of it. Now, there is still some work to be done if you want to extend the schemas or define other objects, but I think it will come (sooner rather than later?) in coming MDM releases.
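To make the “hit list” idea a bit more concrete, here is a rough Python sketch of rule-based matching. It is only an illustration and not how MDM or CI actually works inside: the `Partner` class, the `WEIGHTS` table (80 for name, 20 for address, picked so that a name-only match lands on the “80%” from my example), and the `threshold` parameter are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partner:
    """Hypothetical business partner record (customer or vendor)."""
    number: str
    name: str
    address: str


# Hypothetical rule weights in percent; they sum to 100.
WEIGHTS = {"name": 80, "address": 20}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def closeness(a: Partner, b: Partner) -> int:
    """Add up the weight of every field that matches after normalization."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if normalize(getattr(a, field)) == normalize(getattr(b, field))
    )


def hit_list(candidate: Partner, others: list, threshold: int = 50) -> list:
    """Return (score, partner) pairs at or above the threshold, best first."""
    hits = [(closeness(candidate, other), other) for other in others]
    hits = [hit for hit in hits if hit[0] >= threshold]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

Someone would still have to walk the returned list and decide which hits really are duplicates and which record becomes the “main” one; the score only ranks the candidates.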
Using the information above, how much of a stretch would it be to consider CM resources as just another business object? You could define a simple “resource” object that would basically refer to a document, link, etc., which is unstructured data. In “some other component”, much like CI, you could set up rules as to what defines a duplicate... maybe file name, file size, and content could all be taken into account in the rules (for example, two resource objects that match in name, size and content are probably going to get a “100%” ranking on the hit list, while two files that match in name but differ in size and content could denote some versioning difference). This “other CI-like component” could utilize the same existing crawler and TREX technology/models to resolve these rules and carry that workload. Just like now, a “hit list” could be generated, with someone marking which items are duplicates and which are not. Maybe even use some of KM’s example-based taxonomy technology to “train” MDM on how to mark further matches? OK... maybe that’s more than a few versions off. (haha)
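Here is a small Python sketch of what such duplicate rules for a “resource” object might look like. Everything in it is hypothetical: the `Resource` class, the `RULE_WEIGHTS` values, and the “possible version” label are my own stand-ins, and content is compared with a plain SHA-256 hash rather than anything TREX would do.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical 'resource' object pointing at a document."""
    path: str
    name: str
    content: bytes

    @property
    def size(self) -> int:
        return len(self.content)

    @property
    def digest(self) -> str:
        # Content fingerprint; identical bytes give an identical digest.
        return hashlib.sha256(self.content).hexdigest()


# Hypothetical rule weights in percent; they sum to 100.
RULE_WEIGHTS = {"name": 40, "size": 10, "content": 50}


def compare(a: Resource, b: Resource) -> tuple:
    """Return (score, note) for a pair of resources."""
    matches = {
        "name": a.name.lower() == b.name.lower(),
        "size": a.size == b.size,
        "content": a.digest == b.digest,
    }
    score = sum(RULE_WEIGHTS[rule] for rule, hit in matches.items() if hit)
    if score == 100:
        note = "duplicate"
    elif matches["name"] and not matches["content"]:
        note = "possible version"
    else:
        note = "review"
    return score, note
```

A pair that matches on name, size and content scores 100, while a pair that shares only a name scores 40 and gets flagged as a possible version for a person to review.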
Now back to the previous example: once our new KM-friendly MDM exists, a user searching for “Mission Statement” would get only the main reference for the identified duplicate resources out there. No more 10 pages of 50 listings per page... maybe just 1 page of 5? Too good to be true? I don’t know, but it sounds good to me.
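The search side of this could be as simple as the following Python sketch. It assumes someone has already marked duplicates, so that a lookup table (here a plain dict I call `main_of`, a made-up name) maps each duplicate path to its main reference; the function and the example paths are illustrative only.

```python
def collapse(results: list, main_of: dict) -> list:
    """Replace each hit by its main reference and drop repeats, keeping rank order."""
    seen = set()
    collapsed = []
    for path in results:
        # Paths not marked as duplicates stand for themselves.
        main = main_of.get(path, path)
        if main not in seen:
            seen.add(main)
            collapsed.append(main)
    return collapsed


# Hypothetical search hits and duplicate markings.
results = [
    "/local/a/MissionStmt.doc",
    "/hq/MissionStmt.doc",
    "/local/b/MissionStmt.doc",
    "/hq/Vision.doc",
]
main_of = {
    "/local/a/MissionStmt.doc": "/hq/MissionStmt.doc",
    "/local/b/MissionStmt.doc": "/hq/MissionStmt.doc",
}
print(collapse(results, main_of))
```

With these made-up paths, the three “MissionStmt.doc” hits shrink to the single main copy, and the unrelated document passes through untouched.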
If you took the time to read this, thank you. If it makes you think, thank me. If my idea here wasn’t out in left field, thank God. Just trying to share some ideas here…