In my previous blog we discussed the dual-repository architecture and how it can help us handle constant changes to the catalog data model. This time I would like to discuss how the same architecture can make publishing (or syndication, whichever term you prefer) easier and more efficient.

First, let me clarify my view of the publishing/syndication process, to make sure we're all on the same page. In the "product catalog" context, publishing is all about sending out product data, in a specific format and structure, to a specific recipient. In greater detail, we can break the publishing process (let's leave out the "/syndication" for a while) down into the following steps:

1. First, we maintain the data that is required for all recipients in our product repository. Note that while some of the details are relevant for all recipients, others are only relevant for a few. For example, consider a company that uses Amazon for re-selling its products. The "Amazon ID" attribute of a product is relevant for syndication to Amazon, but probably not for other re-sellers.

2. In parallel, we maintain the "schemas" that are relevant for each recipient. By 'schemas' I mean the list of attributes and the format that should be sent out to each recipient. Note that while in some cases you can maintain a single schema and require the downstream recipients to work with it (as is the case if you're a large manufacturer working with small distributors or re-sellers), it may also be that you are restricted to a recipient-specific schema (such is the case if you work with big companies downstream, such as Amazon or Wal-Mart).

Up to this point you could call it the "Design Time" of the syndication process, as we maintain the data (step no. 1) and the meta-data (step no. 2). Let's continue on to the "Run Time" bit.

3. Either on demand or as a scheduled process, data is transferred to the recipient.
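To make steps 1 and 2 concrete, here is a minimal Python sketch of per-recipient schemas applied to a shared product record. All recipient names, attribute names and the `project` helper are hypothetical, invented purely for illustration:

```python
# Master product record as maintained in the product repository.
# It carries the superset of attributes needed by all recipients.
product = {
    "product_id": "P-1001",
    "name": "Cordless Drill",
    "price": 129.99,
    "amazon_id": "B000EXAMPLE",   # only relevant when syndicating to Amazon
}

# Recipient-specific "schemas": which attributes to send, and under which names.
schemas = {
    "amazon": {"amazon_id": "ASIN", "name": "title", "price": "price"},
    "generic_reseller": {"product_id": "sku", "name": "name", "price": "price"},
}

def project(product, schema):
    """Keep only the attributes a recipient cares about, renamed to its schema."""
    return {target: product[source]
            for source, target in schema.items() if source in product}

amazon_view = project(product, schemas["amazon"])
reseller_view = project(product, schemas["generic_reseller"])
```

The point is that the master record carries the superset of attributes, while each recipient only ever sees its own projection; the "Amazon ID" never leaks to the generic re-seller.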
Transferring the data includes preparing it to adhere to the recipient's schema, and also handling all the transport-layer issues (protocols, queues, security, etc.).

From my (humble) experience, most companies tend to execute step no. 3 in "one go", so preparing the data (extraction and formatting) and sending the data are coupled. As I see it, however, this coupling can cause a scalability problem; let me explain why.

When we export a catalog, we first extract all of the products related to that catalog. We might also apply additional selection criteria, such as validating that the products are 'released for marketing' or that they have all the required translations. While some of these criteria can be met through the filtering mechanisms of MDM (e.g. the free-form search capability), others might involve querying other systems or performing complex manipulations that require coding. That may not be much of a problem when executed on a single product, but when done "on the fly" on thousands of products, it can turn out to be the bottleneck of the publishing process.

Let's recall the dual-repository model for a minute. This model consists of two repositories:

- A normalized repository, which maintains a detailed data model of the products (with all the lookups, qualified tables and the whole shebang).
- A de-normalized repository, which contains more or less a flat table of product XMLs plus some identifiers, such as product ID, catalog ID and language.

My suggestion is to use the de-normalized repository as a pre-publishing repository, in the following manner:

- Product changes are created and approved in the normalized repository.
- Once in a while (using the scheduling mechanism of the MDM syndication server, for example), the changed products are exported as XMLs.
- These XMLs are validated and manipulated in a way that makes them suitable for publishing.
  Note that if the recipients' schemas are very different from each other, you might consider creating an XML per recipient already at this stage.
- The XMLs are then loaded into the de-normalized repository.
- Whenever a catalog needs to be published, it is a simple matter of querying the de-normalized repository by catalog ID.

This way, the "deadline-sensitive" step in the publishing process, the actual export, is reduced to a minimum, and your road to scaling up is ensured.
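The whole suggestion can be sketched end to end in Python, using an in-memory SQLite table as a stand-in for the de-normalized repository. All table, field and status names here are my own illustrations, not actual MDM structures:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for the de-normalized repository: a flat table of product XMLs
# keyed by product ID, catalog ID and language.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE pre_publish (
    product_id TEXT, catalog_id TEXT, language TEXT, product_xml TEXT,
    PRIMARY KEY (product_id, catalog_id, language))""")

def to_xml(product):
    """Render a product record as a flat XML string."""
    root = ET.Element("product", id=product["product_id"])
    ET.SubElement(root, "name").text = product["name"]
    return ET.tostring(root, encoding="unicode")

def is_publishable(product):
    """Example selection criterion: only products released for marketing.
    In practice this is where the expensive checks (cross-system lookups,
    translation completeness, etc.) would run, ahead of publish time."""
    return product["status"] == "released"

# Pre-publishing: changed products are exported, validated, and loaded
# into the flat table, on a schedule rather than at export time.
changed_products = [
    {"product_id": "P-1", "catalog_id": "C-1", "language": "EN", "name": "Drill", "status": "released"},
    {"product_id": "P-2", "catalog_id": "C-1", "language": "EN", "name": "Saw",   "status": "released"},
    {"product_id": "P-3", "catalog_id": "C-2", "language": "EN", "name": "Lathe", "status": "draft"},
]
for p in changed_products:
    if is_publishable(p):
        db.execute("INSERT OR REPLACE INTO pre_publish VALUES (?, ?, ?, ?)",
                   (p["product_id"], p["catalog_id"], p["language"], to_xml(p)))

def export_catalog(db, catalog_id, language):
    """The deadline-sensitive step: publishing a catalog is now just a query."""
    cur = db.execute(
        "SELECT product_xml FROM pre_publish WHERE catalog_id = ? AND language = ?",
        (catalog_id, language))
    return [xml for (xml,) in cur]

payload = export_catalog(db, "C-1", "EN")
```

Note how the expensive validation runs once, at load time, while `export_catalog` is only an indexed lookup; that is exactly what keeps the deadline-sensitive export cheap as the product count grows.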