Catalog Publishing Made Faster

In my previous blog we discussed the dual-repository architecture and how it could help us handle constant changes to the catalog data model. This time I would like to discuss how the same architecture can make publishing (or syndication, whichever you prefer to call it) easier and more efficient.

First I would like to clarify my point of view on the publishing/syndication process to make sure we’re all on the same page.

The publishing process (in the “product catalog” context) is all about sending out product data in a specific format and structure to a specific recipient. In greater detail, we could break down the publishing process (let’s leave out the “/syndication” for a while) into the following steps:

1. First we maintain the data that is required for all recipients in our product repository. Note that while some of the details are relevant for all recipients, others are relevant for only a few. For example, consider a company which uses Amazon for re-selling its products. The “Amazon ID” attribute of a product is relevant for syndication to Amazon but probably not for other re-sellers.

2. In parallel we maintain the “schemas” that are relevant for each recipient. By “schemas” I mean the list of attributes and the format which should be sent out to each recipient. Note that while in some cases you can maintain a single schema and require the downstream recipients to work with it (as is the case if you’re a large manufacturer working with small distributors or re-sellers), you may also be restricted to a recipient-specific schema (as is the case if you work with big companies downstream, such as Amazon or Wal-Mart).

Up to this point you could call it the “Design Time” of the syndication process, as the data (step no. 1) and the meta-data (step no. 2) are maintained here. Let’s continue on to the “Run Time” bit.

3. Either on demand or as a scheduled process, data is transferred to the recipient. That includes preparing the data to adhere to the recipient’s schema and also handling all the transport layer issues (protocols, queues, security etc.).

From my (humble) experience, most companies tend to execute step no. 3 in “one go”, so preparing the data (extraction and formatting) and sending the data are coupled. However, as I see it, this coupling might cause a scalability problem. Let me explain why.

When we export a catalog we first extract all of the products which are related to that catalog. We might also add some additional selection criteria, such as validating that the products are “released for marketing” or that the products have all the required translations.

While some of these selection criteria might be achieved through the filtering mechanisms of MDM (e.g. the free form search capability), other criteria might involve querying other systems or performing some complex manipulations that require coding. That might not be much of a problem when executed on a single product, but when done “on the fly” on thousands of products, it might turn out to be the bottleneck in the publishing process.

Let’s recall the dual repository model for a minute. This model consists of two repositories:

o  A normalized repository which maintains a detailed data model of the products (with all the lookups, qualified tables and the whole shebang).

o  A de-normalized repository, which contains more or less a flat table of product XMLs and some identifiers like product ID, catalog ID and language.

My suggestion is to use the de-normalized repository as a pre-publishing repository in the following manner:

o  Product changes are created and approved in the normalized repository.

o  Once in a while (using the scheduling mechanism in the MDM syndication server, for example) the changed products are exported as XMLs.

o  These XMLs are validated and manipulated in a way that makes them suitable for publishing. Note that if the recipients’ schemas are very different from each other, you might consider creating an XML per recipient already at this stage.

o  The XMLs are then loaded into the de-normalized repository.

o  Whenever a catalog needs to be published, it is a simple matter of querying the de-normalized repository according to the catalog ID.

This way the “deadline-sensitive” step in the publishing process, the actual export, is reduced to a simple query, because the expensive validation and formatting work has already been done. That removes the main obstacle to scaling up.
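
To make the flow above a bit more concrete, here is a rough sketch of the pre-publishing idea. It is not MDM code: an in-memory SQLite table stands in for the de-normalized repository, and the table name, the attribute names, the two recipient schemas and the “released for marketing” flag are all assumptions made up for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical recipient schemas: the attributes each recipient receives.
RECIPIENT_SCHEMAS = {
    "amazon": ["product_id", "name", "amazon_id"],
    "generic_reseller": ["product_id", "name"],
}


def to_recipient_xml(product, recipient):
    """Render one product as XML holding only the recipient's attributes."""
    root = ET.Element("product")
    for attr in RECIPIENT_SCHEMAS[recipient]:
        ET.SubElement(root, attr).text = str(product.get(attr, ""))
    return ET.tostring(root, encoding="unicode")


def is_publishable(product):
    """Stand-in for the expensive selection criteria (assumed flag name)."""
    return bool(product.get("released_for_marketing", False))


def create_store():
    """Create the flat table that plays the de-normalized repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE product_xml (
               product_id TEXT, catalog_id TEXT, language TEXT,
               recipient TEXT, xml TEXT,
               PRIMARY KEY (product_id, catalog_id, language, recipient))"""
    )
    return conn


def pre_publish(conn, changed_products):
    """Scheduled step: validate changed products, store one XML per recipient."""
    for product in changed_products:
        if not is_publishable(product):
            # Skipped here; a fuller version would also retract earlier rows.
            continue
        for recipient in RECIPIENT_SCHEMAS:
            conn.execute(
                "INSERT OR REPLACE INTO product_xml VALUES (?, ?, ?, ?, ?)",
                (product["product_id"], product["catalog_id"],
                 product["language"], recipient,
                 to_recipient_xml(product, recipient)),
            )
    conn.commit()


def publish(conn, catalog_id, recipient):
    """Deadline-sensitive step: a plain lookup by catalog ID and recipient."""
    rows = conn.execute(
        "SELECT xml FROM product_xml "
        "WHERE catalog_id = ? AND recipient = ? ORDER BY product_id",
        (catalog_id, recipient),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = create_store()
    pre_publish(store, [
        {"product_id": "P1", "catalog_id": "C1", "language": "EN",
         "name": "Boot", "amazon_id": "A-1", "released_for_marketing": True},
        {"product_id": "P2", "catalog_id": "C1", "language": "EN",
         "name": "Shoe", "amazon_id": "A-2", "released_for_marketing": False},
    ])
    for xml in publish(store, "C1", "amazon"):
        print(xml)
```

The point of the split is that pre_publish can take as long as it needs on a schedule, while publish does no validation or formatting at all. Storing one row per recipient corresponds to the “XML per recipient” option mentioned above; with a single shared schema the recipient column could simply be dropped.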
  • Amit,

    I am a bit surprised to see this article, as it tells me that there is another person thinking along the same lines. As part of a major implementation at a Footwear & Apparel Manufacturing customer (went live in July 2006 – MDM 5.5 SP1), we followed a similar approach (not only for “Publishing” but also for “Importing”).

    Publishing: We built a dual repository mechanism exactly like the one you explained, with the added provision of storing the “acknowledgements” received from the receiving systems. Each “publishing” is given a unique ID in order to facilitate “re-processing”.

    Import: On the import side, we worked on a “proto-type” with a dual-repository mechanism in order to facilitate a custom “detailed data cleansing” tool that was to be built on top of the “Import” repository.

    However, I cannot give more details than this for various reasons…:)

    It is always nice to know that there are other people working with similar ideas.

    All the Best!


    Rajani Khambhampati

    • Hi Rajani,

      Even while writing this blog I was still unsure how valid this solution was. When constructing it for a specific customer (no details!), I was frantically looking for a reference customer who had used the same approach.

      I was not able to find such a reference customer when looking, but now I’m retroactively re-assured by you that such a customer exists.

      Thanks for that!

  • Hi Amit and Rajani,
    I think the idea that Amit described is definitely very interesting. Now I was wondering what the maintenance processes for the data would look like on the customer site and how the responsibilities for the data are assigned. Do you have any experience of how this approach influences the business processes?

    Best regards,

    • Hi Frauke,

      Indeed there’s a little bit of a “catch” here from a business process perspective. The thing is that there’s a time gap between the repository the end-users are working on (the normalized repository) and the repository that is being published (the de-normalized one).

      Therefore, you should make sure the end-users (who ONLY work in the normalized repository) can still be aware of the content in the de-normalized repository.

      Probably the way to go about it is to add some reporting functionality on the status of the catalog data and also have some publishing preview functionality.

      Best regards,