
For quite some time, I have been playing with the idea of comparing data with a raw material (like timber) and information with a product made from that raw material (like furniture). This comparison suggests (a) that data and information are related but not the same thing and (b) that there is processing in between, similar to converting wood into furniture. With this blog, I would like to throw this comparison out into the public, hopefully to trigger a good discussion that either identifies weaknesses of the comparison or develops the idea even further.

So let’s picture the process by which a tree becomes a cupboard, a bookshelf or a table:

  1. Trees grow in the forest.
  2. A tree is cut and the log is transported to a factory for further processing.
  3. At the factory, it is stored in some place.
  4. It is subsequently processed into boards. Various tools like saws, presses etc. are used in this context.
  5. The boards are frequently taken to yet another factory that applies various processing steps to create the furniture. Depending on the type of furniture (table, chair, cupboard, …), a large or a small number of steps, some more complex than others, is necessary. Additional materials like glass, screws, nails, handles, metal joins, paint, … are added. Typical processing steps are cutting, pressing, painting, drilling, …

Now when you consider what happens to data before it becomes useful information displayed in a pivot table or a chart then you can identify similar steps:

  1. Data gets created by some business process, e.g. a customer orders some product.
  2. For analysis, the data is brought to some central place for further processing in a calculation engine. This place can be part of a data warehouse or of an on-the-fly infrastructure, e.g. via a federated approach that retrieves the data only when it is needed.
  3. At this central place, it is stored, e.g. in persistent DB tables or a cache, where it can potentially “meet” data from other sources.
  4. Data is reformatted, harmonised, cleansed, … using data quality tools, data transformation tools or plain SQL. Simply consider the various formats for a date like 4/5/2011, 5 Apr 2011, 5.4.11, 20110405, …
  5. Data is enriched and combined with data from other sources, e.g. the click stream of your web server combined with the user master data table. Only in this combination can you, for instance, tell how many young, middle aged or old people look at your web site. In the end, data has become useful information.
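Steps 4 and 5 can be made a little more tangible with a minimal Python sketch. Everything in it is an assumption for illustration only: the function names, the sample records, the age boundaries (30 and 60) and the reading of 4/5/2011 as month/day are not taken from any real system. It first harmonises the four date spellings from step 4 and then combines a click stream with a user master data table as in step 5.

```python
from datetime import date, datetime

# Candidate formats for the date spellings mentioned in step 4.
# Assumption: "4/5/2011" is month/day/year, so all four spellings mean 5 April 2011.
# Note: "%b" relies on English month abbreviations (the default C locale).
DATE_FORMATS = ("%m/%d/%Y", "%d %b %Y", "%d.%m.%y", "%Y%m%d")


def normalise_date(raw: str) -> date:
    """Step 4: try each known format until one parses the raw string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")


def age_group(birth_year: int, current_year: int) -> str:
    """Hypothetical age buckets; the boundaries are arbitrary."""
    age = current_year - birth_year
    if age < 30:
        return "young"
    if age < 60:
        return "middle aged"
    return "old"


def visits_by_age_group(clicks, users, current_year):
    """Step 5: enrich each click with master data and count clicks per age group."""
    counts = {}
    for click in clicks:
        user = users.get(click["user_id"])
        if user is None:
            continue  # a click without master data cannot be classified
        group = age_group(user["birth_year"], current_year)
        counts[group] = counts.get(group, 0) + 1
    return counts


# Made-up click stream (raw data) and user master data.
CLICKS = [
    {"user_id": "u1", "date": "4/5/2011"},
    {"user_id": "u2", "date": "5 Apr 2011"},
    {"user_id": "u1", "date": "5.4.11"},
    {"user_id": "u3", "date": "20110405"},
]
USERS = {
    "u1": {"birth_year": 1990},
    "u2": {"birth_year": 1970},
    "u3": {"birth_year": 1940},
}

if __name__ == "__main__":
    print({normalise_date(c["date"]) for c in CLICKS})
    print(visits_by_age_group(CLICKS, USERS, current_year=2011))
```

Neither the click stream nor the master data alone can answer the age question; only the combination does, which is the point of step 5.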

Hopefully, the similarity between the two processing environments has become apparent. In the end, steps 1. – 5. describe a layered scalable architecture (LSA). In one or the other situation, steps will be much simpler or can even be omitted, depending on what type of furniture you want to produce: a bookshelf needs less processing than a sophisticated cupboard. I guess that one can now start to play with the analogies. Take operational reporting, i.e. reporting on data from one source, e.g. one single process (“Which orders have been submitted today?”). This is probably tantamount to producing boards or shelves. The log gets cut by a saw, maybe pressed, and that’s almost it. One can imagine that this is even done in real time, i.e. directly after the tree has been cut. In contrast, producing a sophisticated analysis (i.e. a cupboard) takes more processing steps and, specifically, involves more data from outside (i.e. materials like screws, paint, joins, glass, …). Similarly, one could spin the idea further to find analogies for data warehouses, data mining, dashboards, data marts etc. I’ll leave that for a hopefully vivid discussion.


10 Comments


  1. Gregory Misiorek
    Hi Thomas,

    great analogy, and a very tangible and intuitive one, especially for those who have dabbled in cabinet making or carpentry.

    since i’m not really one of them i go by something more intangible: DIKUW. the farthest i could trace it is in Ackoff, R. L., “From Data to Wisdom”, Journal of Applied Systems Analysis, Volume 16, 1989, pp. 3-9, but i haven’t been able to find it anywhere online.

    some might even find the source in the bible, but that’s quite far from cabinet making (or maybe not).

    @greg_not_so’s tweet #56482044544417793

  2. Vijay Vijayasankar
    Analogy falls apart on one aspect I think.

    Data in its raw form can lead to several insights.
    If you don’t care for what you see, you can make many other attempts – and always go back to raw data.

    Whereas timber, once made into a chair, cannot then be made into a table easily.

    But – this was a good analogy, and it did resonate.

    1. Witalij Rudnicki
      There are a lot of analogies we are already using. Think of Data “Warehousing”, Data “Mining”. I am a big fan of using analogy for explanation as well (see my today’s HANA …………), but I am also aware of the oversimplification that usually comes with analogy.
      Separate thought: our kids will be using the data warehousing example to explain to their kids how a furniture factory once worked 😉
  3. Alistair McEwan
    Hi Thomas,
    as Vijay already indicated: the analogy is great and has inspired me to think about it. But obviously the difference between hard- (timber) and software (data) still applies, meaning raw data is more flexible. Also – and I guess that this is the argument of the “in-memory army” – processing steps 1.-5. can ideally be so fast that you get real-time … now you can argue when “ideally” applies (probably less frequently than one would think).
    But that doesn’t harm the general idea. As you see: people play with it.
    Thanks!
    Alistair
  4. Bala Prabahar
    Thomas,
    I guess the timber and data comparison helps to better illustrate the status of SAP-BW. SAP-BW is similar to a timber warehouse (processing facility). A processing facility stores semi-processed timber whereas SAP-BW is supposed to contain (semi-)processed data.
    A processing facility stores only one “image” of the same log whereas SAP-BW may contain multiple “images” of the same data stored in PSA, DSO, cubes etc. with a 1:1 relationship. In SAP-BW, normally all data models follow the same path (from OLTP -> DSO -> cube) whereas a timber log may follow a different processing path depending on business requirements (whether it is used for a chair, for building homes etc.). (In my experience, BW data models follow the same path in almost all situations. I’ve also seen a 1:1 relationship between DSO and cube data volumes.)
    In SAP-BW, we create multiple images of the same data and then develop compression techniques to brag about the advances in compression. In the timber industry, they work on removing unwanted stuff like limbs in one image. They always store only one image of the same log.
    Adam Smith developed the concept of “Division of Labor” (DOL) around the time when timber/wood started becoming popular. That concept was/is very popular and works. Whereas DOL breaks down large complex jobs into many tiny components, we can break down very large cubes/DSOs into tiny tables using DB partitioning techniques. However, DB partitioning is not widely used in SAP-BW. All we can do is partition the E table (in addition to the implicit F fact and PSA table partitioning) using one column, fiscal variant in SAP-BW. This one is a great feature; however, not all features of partitioning are used.
    Bottom line: in the timber industry, they use what we already have; in SAP-BW, we don’t use what we already have. This is in my opinion a very big difference between data and timber.

    Thanks,
    Bala

    1. Thomas Zurek Post author
      Hi Bala,
      those 1:1 relationship examples are not typical data warehousing scenarios. They are a clear example of logs being cut to create boards, if that is all you need. A chain of objects as you mention is overkill for that. I guess that this is typical for operational reporting scenarios. HANA 1.0 provides an alternative for that. Others are features within BW 7.30 like the hybrid provider, which omits the cube in your example and still provides a delta feed into BWA. There is even more in the works, like transient infoproviders that are derived at query run time on certain reporting objects (e.g. in Business by Design). Data warehousing is about merging, harmonising, transforming, combining etc. data from multiple sources.
      Regarding DB partitioning in BW: this is not only possible for infocubes but for DSOs as well. BW 7.0 provided MDC on DB2 for DSOs. Also, PSA tables are (DB) partitioned along the request. It is standard practice to employ DB partitioning for 2 purposes:
      (a) mass deletion: DELETE is the slowest SQL command. This can be worked around by making sure that the data that gets deleted sits in one or more whole partitions that can then be dropped. This is common, not only in BW but in many, many DW designs.
      (b) partition pruning: in a query, you might want to discard the non-relevant rows of a table as early as possible, e.g. by discarding non-relevant partitions of a table. This can be done if there is a filter on the partition key. Ideally, a table is then partitioned along a column on which filters are frequent. Most analytic queries rely on a time reference (e.g. compare 2009 vs 2010). This is why time related columns are a good choice for partitioning if you want to leverage such a pruning effect. This is also standard practice.
      Now, BW is a best-practice implementation of a DW and intends to relieve a customer from turning every knob on the DB level in order to get good performance. This leads to standard schemes for indexing and partitioning tables, like the 2 described above. These standard schemes work well for most but clearly not every case.
      Best
      Thomas
      1. Bala Prabahar
        Hi Thomas,

        I would like to say a few things about this blog and BW:
        1) As Vijay mentioned, this is a great analogy. I like it.
        2) BW is a great product. I am probably restating the obvious. It is, however, a double-edged sword in my opinion. If used correctly (aka as designed) after learning DW or extended star schema principles, BW would benefit the customers astronomically. At the same time, it can also be used by anyone with no or very limited DW knowledge. This, however, would lead to, as you stated, a situation of “cutting logs to create boards of little or no use”.
        I agree that general purpose s/w (by general purpose, I mean it is not home-grown but used by several customers across the globe on several different platforms) such as BW can’t support all features of all layers used by it. It is nearly impossible. However, I find it very difficult to communicate this point to customers. They, probably due to a lack of understanding, want to either avoid or minimize implementing anything outside SAP. As a result, there are customers who read time-based partitions containing several million records more than what they’re interested in. Adding one or more non-time related columns (such as region, account number, state etc.) to the partitioning key would create more partitions of smaller size, which would further increase the benefits of partition pruning. However, this is not widely understood.

        Finally thanks for this blog and a great response. I’m going to think more on data-timber analogy and share my thoughts either as comments or blog.

        Regards,
        Bala

    2. Tim Carnes
      What stops you from applying your home-grown partitioning ideas on the DB level, even on BW? This is the same tedious thing as if you did this on a home-grown DW on Oracle or Teradata! I don’t get your argument.
      Tim
      1. Bala Prabahar
        Tim,

        Great question. Customer’s understanding is what stops me from applying home-grown ideas. And I agree 100% that this is the same tedious thing in any other platform. Customers are normally reluctant to implement anything outside SAP.

        Thanks,
        Bala

  5. Ina Felsheim
    Hey, Thomas. I like the analogy, but the upfront definition is missing. With information, we spend a lot of time modeling the information and defining key characteristics. However, with timber, all of those decisions are wrapped up into the kind of seed you grow. This distinction becomes important, because it illustrates how LONG you need to keep history and apply information governance to truly understand your end product.
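The partitioning points raised in the thread above (mass deletion by dropping partitions, partition pruning on a filtered key, and the suggestion of adding a non-time column such as region to the partitioning key) can be illustrated with a toy in-memory sketch in Python. This is only an illustration under stated assumptions: the class `PartitionedTable`, its methods and the sample rows are all hypothetical, and the sketch says nothing about how BW or any database actually implements partitioning.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table whose rows are grouped into partitions by one or more key columns."""

    def __init__(self, *key_columns):
        self.key_columns = key_columns
        self.partitions = defaultdict(list)

    def _key(self, row):
        # The partition key is the tuple of the row's values in the key columns.
        return tuple(row[col] for col in self.key_columns)

    def insert(self, row):
        self.partitions[self._key(row)].append(row)

    def drop_partition(self, *key):
        """Mass deletion: discard a whole partition instead of deleting row by row.

        Returns the number of rows that were dropped."""
        return len(self.partitions.pop(key, []))

    def scan(self, keys, predicate=lambda row: True):
        """Partition pruning: touch only the partitions named in `keys`.

        Returns the matching rows and the number of rows that had to be read."""
        rows_read = 0
        matches = []
        for key in keys:
            rows = self.partitions.get(key, [])
            rows_read += len(rows)
            matches.extend(row for row in rows if predicate(row))
        return matches, rows_read


# Made-up sales rows: three years, two regions.
ROWS = [
    {"year": 2009, "region": "EMEA", "amount": 10},
    {"year": 2009, "region": "APJ", "amount": 20},
    {"year": 2010, "region": "EMEA", "amount": 30},
    {"year": 2010, "region": "APJ", "amount": 40},
    {"year": 2011, "region": "EMEA", "amount": 50},
    {"year": 2011, "region": "APJ", "amount": 60},
]


def build(*key_columns):
    """Load the sample rows into a table partitioned by the given columns."""
    table = PartitionedTable(*key_columns)
    for row in ROWS:
        table.insert(row)
    return table


if __name__ == "__main__":
    by_year = build("year")
    by_year_region = build("year", "region")

    # Same question ("EMEA sales in 2010"), different amounts of data read.
    print(by_year.scan([(2010,)], lambda r: r["region"] == "EMEA"))
    print(by_year_region.scan([(2010, "EMEA")]))

    # Dropping the 2009 partition removes all of its rows in one step.
    print(by_year.drop_partition(2009))
```

With the time-only key, the query reads both 2010 rows and filters one out; with the combined key it reads only the single row it needs. The price is a larger number of smaller partitions, which is the trade-off discussed in the comments.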
