     During the past two weeks (Oct 18-29), I was in Dallas attending the TEP10 (Enterprise Portal) class that leads up to certification. My experience with EP was always on the architecture, content, development, etc side more from the “application” and “process” perspective. This class really filled in the depth I needed on the sizing, installation, configuration, administration, etc side that is more “systems” oriented. Anyways, during this 2 week class, we had in-depth coverage of the KM component/service/layer/stack/(insert your own term here). I had little to no previous exposure to KM other than knowing “it manages all your documents and stuff”. I knew where it fit into the EP puzzle but not exactly all that it involved or was capable of, and this class filled in those blanks nicely.

But I know what you are saying now….“how does this have anything to do with MDM?” or “is SAP Education paying you to market this class for them?” (haha). Well, it was during our discussions on all the various repositories where this unstructured information may live, indexes, configuring crawlers, etc. that I had a slight “eureka” moment….the light bulb went on over my head…dimly…but it was on at least.

Having previous experience with MDM with a ramp-up customer, many things I was seeing started to sound familiar….abstractly…but familiar nonetheless. For the project I was on, MDM is/was still about 2 versions out from meeting our need (full business process mgmt). MDM is still a product in its infancy (as of this weblog), but even now, it still does several things well…such as identifying duplicates or possible duplicates based on a set of rules.

Ok….so I finally mentioned MDM….so now “how does this tie into KM?” Well, one thing I noticed right off with KM is that, yes, I can set up indexes and set up crawlers to go out into the wild and retrieve all my documents and other unstructured resources. Top this off with good ol’ TREX to search, text mine, etc, and I should be able to retrieve any of those resources stored away in the recesses of my many repositories. But here’s the problem…how do I keep from overwhelming someone with too many resources? Say I am looking for my company’s “Mission Statement” document. I am in EP and via the connection to KM, I do a search for “Mission Statement” and suddenly I have 10 pages of 50 listings a page, each pointing me to “MissionStmt.doc”. How do I know which is which? How do I know which ones are just some copied version on some local file system and which is the main one to use? Now possibly we didn’t cover it in the class, but I am unaware of any feature in KM that will do that for you. That’s when the light bulb dimly began to glow. “Wouldn’t it be nice if…..” I suddenly had an idea…not much more than an idea, but isn’t that how all great things begin? (haha) Why should unstructured data be treated any differently? It could be considered “master data” in a manner of speaking, no?

So here was my idea…as of now, MDM is good at handling the two most needed, low-hanging-fruit business objects….customers and vendors…business partners in the general term. With rules you set up in CI (or at least as I remember it), MDM will identify duplicates based on these rules and present a “hit list” of sorts that will give the percentage of “closeness”, if you will…such as exact matches are “100%” and maybe two customer objects’ names match but their addresses differ so they get an “80%”. From this hit list, you can pick which ones are actually duplicates and set the main object from which to reference. From that point, if you are reporting in BW for example, all the data for the duplicates can be synched up under the one “main” reference…5 vendor numbers and their data become 1, if you will. Ok…so that’s a very basic description, but I hope you get the gist of it. Now, there is still some work to be done if you want to extend the schemas or define other objects, but I think it will come (sooner rather than later?) in coming MDM releases.

Using the information above, how much of a stretch would it be to consider KM resources as just another business object? You could define a simple “resource” object that would basically refer to a document, link, etc. which is unstructured data. In “some other component”, much like CI, you could set up rules as to what defines a duplicate….maybe file name, file size, and content could all be taken into account in the rules (for example, two resource objects that match in name, size and content are probably going to get a “100%” ranking on the hit list while two files that match in name but differ in size and content could denote some versioning difference). This “other CI-like component” could utilize the same existing crawler and TREX technology/models to resolve these rules and carry that work load. Just like now, a “hit list” could be generated with someone marking which items are duplicates and which are not. Maybe even use some of KM’s example-based taxonomy technology to “train” MDM on how to mark further matches? OK…maybe that’s more than a few versions off. (haha)
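Just to make the rule idea a bit more concrete, here is a rough sketch of how such a “hit list” could be scored. To be clear, none of this is MDM, CI, KM or TREX code. The `Resource` class, the `RULE_WEIGHTS` values (40/20/40), and the `match_score` and `hit_list` functions are all made up for illustration, and comparing content by an exact hash is only a stand-in for whatever a real engine would do.

```python
import hashlib
import posixpath
from dataclasses import dataclass

# Hypothetical rule weights (percent). They are assumptions picked so that a
# match on name, size and content adds up to a "100%" ranking.
RULE_WEIGHTS = {"name": 40, "size": 20, "content": 40}


@dataclass(frozen=True)
class Resource:
    """A made-up 'resource' business object: a repository path plus its bytes."""
    path: str
    content: bytes

    @property
    def name(self) -> str:
        return posixpath.basename(self.path)

    @property
    def size(self) -> int:
        return len(self.content)

    @property
    def digest(self) -> str:
        # Exact-content fingerprint; a real engine would likely be fuzzier.
        return hashlib.sha256(self.content).hexdigest()


def match_score(a: Resource, b: Resource) -> int:
    """Return a 0-100 'closeness' percentage for two resources."""
    score = 0
    if a.name.lower() == b.name.lower():
        score += RULE_WEIGHTS["name"]
    if a.size == b.size:
        score += RULE_WEIGHTS["size"]
    if a.digest == b.digest:
        score += RULE_WEIGHTS["content"]
    return score


def hit_list(target: Resource, candidates: list[Resource], min_score: int = 1):
    """Rank candidates against the target, best match first."""
    hits = [(match_score(target, c), c.path) for c in candidates]
    hits = [h for h in hits if h[0] >= min_score]
    return sorted(hits, key=lambda h: (-h[0], h[1]))


if __name__ == "__main__":
    main_doc = Resource("/hr/MissionStmt.doc", b"Our mission is X.")
    others = [
        Resource("/local/tmp/MissionStmt.doc", b"Our mission is X."),
        Resource("/drafts/MissionStmt.doc", b"Our mission was Y, longer draft."),
        Resource("/misc/Budget.xls", b"numbers"),
    ]
    for score, path in hit_list(main_doc, others):
        print(f"{score:3d}%  {path}")
```

With these assumed weights, the exact copy lands at the top of the list and the same-named draft shows up lower, which is the kind of “probably a version difference” hint someone could then review and mark by hand.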

Now back to the previous example, once our new KM-friendly MDM exists, a user searching for “Mission Statement” would get only the main reference for the identified duplicate resources out there. No more 10 pages of 50 listings per page….maybe just 1 page of 5? Too good to be true? I don’t know, but it sounds good to me.
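As a small sketch of that last step, assume someone has already marked the duplicates and we have a simple lookup from each duplicate to its main reference. The `collapse_results` function and the `main_of` mapping below are hypothetical names of my own, not anything KM or MDM provides today.

```python
def collapse_results(results: list[str], main_of: dict[str, str]) -> list[str]:
    """Replace each search hit with its main reference and drop repeats.

    `main_of` maps a duplicate's path to the path of its main reference;
    hits without an entry are assumed to be their own main reference.
    The order of first appearance is preserved.
    """
    seen = set()
    collapsed = []
    for path in results:
        main = main_of.get(path, path)
        if main not in seen:
            seen.add(main)
            collapsed.append(main)
    return collapsed


if __name__ == "__main__":
    search_hits = [
        "/local/tmp/MissionStmt.doc",
        "/hr/MissionStmt.doc",
        "/shares/old/MissionStmt.doc",
        "/marketing/MissionPoster.ppt",
    ]
    marked_duplicates = {
        "/local/tmp/MissionStmt.doc": "/hr/MissionStmt.doc",
        "/shares/old/MissionStmt.doc": "/hr/MissionStmt.doc",
    }
    for path in collapse_results(search_hits, marked_duplicates):
        print(path)
```

Here the three “MissionStmt.doc” hits collapse into the one main reference, and the unrelated poster stays as its own entry.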

If you took the time to read this, thank you. If it makes you think, thank me. If my idea here wasn’t out in left field, thank God. Just trying to share some ideas here…


  1. Joerg Wolf
    Hi Christopher,
    actually, our search engine TREX has functions to compare documents and do a ‘similarity ranking’ based on the full text of the document. These algorithms are also used for the example-based taxonomy you mentioned below.
    So, what you could do is indeed a search (this requires some project work as of today, though), that goes over your document repositories and finds all the similar documents in there (and even could do some clustering).
    It is not in the standard product as of today but as you mentioned: there is always a release n+1 (or n+2, etc.) where this could go in.
    Did you hear me saying that we WILL have something like this in the next release ???
    No ? OK, good. Just wanted to make sure 🙂 .
    1. Christopher Solomon Post author
         ….and from what I hear….TREX won’t be “TREX” much longer….thanks SAP marketing! haha (Please God, do not let the same geniuses behind “The City of E” rename/rebrand this one!!!!!haha) But not to derail your post….yes, I asked all about that kinda functionality while in class and was told pretty much what you said. Sooooo like many other SAP solutions, why not make use of existing components and make it happen!! =)


  2. Former Member
    Hello Christopher,

    I have read your blog with interest, because we are just evaluating some possibilities of MDM and EP-KM.

    Do you have any experience if MDM can be used as a central metadata storage/layer/persistence for KM-Properties?

    Our intention is to use it as a property persistence for the portal and for other Java components/applications.

    I would appreciate any comment on this 🙂


    1. Former Member
      With great interest I read this article!
      Also the post on using MDM as central Metadata repository for KM. Great to see people around the world think alike. Together we can make it a better (IT) world 🙂

      Did you move forward on using MDM in such a scenario? We are thinking of starting up a project in that direction and any comment/feedback would be great.


