
Star Schemas – Miles to the gallon… How far can you go?

SAP BI has been implemented at a lot of customer sites, and many of them have mature data models and carefully built data warehouses which slowly but steadily inch towards the multi-terabyte mark and fill up with data… valid data that:

1. Can be used for data mining
2. Is actively used by a lot of users for day-to-day reporting
3. Supports business-critical functions through the reports served out of SAP BI

Now it is a picture of everyone happy… but then problems start cropping up in the backend:

1. Reports slow down because of too much data
2. Data models cannot be changed overnight, leading to ever more complicated MultiProviders and queries just to serve complex reports
3. User requirements become more demanding, and simple row-and-column reports are no longer enough
4. Reports end up with more restricted key figures (RKFs) than direct key figures

Some of these problems are innate to the star schema itself. Among them:

Brittleness – Changing a data model is something like changing a glass mould: you basically have to redo it, melting and recasting the glass to reshape it some other way. There is no simple incremental way out. Even with tools like remodeling, the process is still the same; it just happens out of sight.

A change of data model would mean:

  • Lot of synchronized transports
  • Loads of testing
  • Downtime for reports

Models are not fully extensible – MultiProviders, InfoSets, navigation attributes and the like can take you only so far. Using them is more like buying time, knowing full well that they will not suffice in the long term.

Changes in view are not permitted – The models are built with something in mind: you set out to build a cathedral, and two years hence you are asked to look at it as a chapel. In many cases the person who envisioned the architecture in the first place is no longer around after two years, and the people who come next might view it as a series of mistakes rather than a vision that was never completed, leading to deviations and changes with mixed results. It is also not possible for one person to view the EDW exactly as the person before did, and the EDW cannot satisfy everyone.

Multiple views of the same data – One cube for Query A, one cube for Query B, and so on: you end up with a lot of cubes and a lot of redundant data. Often there are many navigational attributes for a single query, and resources like disk space, CPU and even application servers (yes, it is possible) are dedicated to one particular set of reports. And then there is the issue of data reconciliation, and the disk space consumed by the cubes, aggregates and so on.
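To make the reconciliation problem concrete, here is a toy Python sketch (hypothetical cube names and figures, not actual BI objects): two query-specific copies of the same sales facts drift apart when a late delta load reaches only one of them, and a reconciliation check is needed to find the disagreement.

```python
# Two query-specific "cubes" holding overlapping copies of the same sales facts.
# Names and figures are purely illustrative.
cube_for_query_a = {"2024-01": 350.0, "2024-02": 175.0}
cube_for_query_b = {"2024-01": 350.0, "2024-02": 160.0}  # a late delta missed this copy

def reconcile(a, b):
    """Return the periods where the two redundant copies disagree."""
    return {period for period in a.keys() | b.keys() if a.get(period) != b.get(period)}

mismatches = reconcile(cube_for_query_a, cube_for_query_b)
print(mismatches)  # the period(s) needing reconciliation
```

With a single non-redundant store there would be nothing to reconcile; every extra copy adds a check like this to the operations workload.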

Changing granularities – Cross-functional reports mean having data at the same granularity. This leads to creating more summarized cubes purely for the sake of reporting, with different cubes at different granularities. MultiProviders can do the trick, but then performance is something to worry about, which in turn leads to building more relational data models.
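The alternative to maintaining a separate cube per granularity is to keep the facts at the lowest grain and roll them up on demand. A minimal Python sketch, assuming illustrative line-item records (the field names are invented, not SAP BI objects):

```python
from collections import defaultdict

# Hypothetical facts at the lowest granularity: one row per document line.
line_items = [
    {"month": "2024-01", "material": "M1", "revenue": 100.0},
    {"month": "2024-01", "material": "M2", "revenue": 250.0},
    {"month": "2024-02", "material": "M1", "revenue": 175.0},
]

def roll_up(rows, keys, measure):
    """Summarize detail rows to the coarser granularity defined by `keys`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)

# Monthly grain for one report, material grain for another -
# both derived from the same detail rows, with no redundant copies.
monthly = roll_up(line_items, ["month"], "revenue")
by_material = roll_up(line_items, ["material"], "revenue")
```

Rolling up from detail is always possible; drilling down from a summarized cube is not, which is why grain mismatches force new cubes.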

Star schemas are not very good in the long run for a data warehouse. When there is a lot of data and a lot of users, with a lot of diversity, the only way to make the star schema approach work is to keep the data non-redundant and model it at the lowest level of granularity.

Even then, when the requirements change, the star schema may still have to be revised or replaced. However, in an environment in which there is little diversity in the way users perceive data, if the requirements do not change over time, and if there aren’t many users, then it is possible that a star schema might serve as a basis for a data warehouse.

These are some things I have observed across implementations, and again they need not be the order of the day. I just wanted to highlight some issues I felt would affect EDWs over the long range of time.

I came across some of these points in the book DW 2.0 – The Architecture for the Next Generation of Data Warehousing by William Inmon, Derek Strauss and Genia Neushloss, and looked at how the points raised against the star schema stacked up against the extended star schema of SAP BI.


This is more of an attempt to draw parallels between traditional data warehousing and SAP BI.

  • Nice articulation Arun. The topic is concise and clear.

    My question: are there any other ways available to avoid or reduce the risk of redesigning the entire data model?
    Could you please elaborate a bit more if possible?


    • I would not call it the risk of redesigning the entire data model… but look at things like:
      1. If you have 3 years of data, you could separate one year's data into its own cube and use a MultiProvider
      2. Archive your data regularly depending on its usage in queries; if you have YTD figures precalculated and report only on monthly figures, examine whether you still need the older data in your cube

      Try to reduce the amount of data per cube to manageable chunks (e.g. one year, depending on data volumes), and start looking at how much data you retain in your system for active reporting.
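The partition-and-union idea in point 1 can be sketched in plain Python. This is only an illustration of the principle, with invented cube names; in SAP BI the union and pruning happen inside the MultiProvider, not in user code:

```python
# Year-partitioned "cubes" (hypothetical data), queried together
# the way a MultiProvider unions its participating InfoProviders.
cube_2022 = [{"year": 2022, "revenue": 500.0}]
cube_2023 = [{"year": 2023, "revenue": 620.0}]
cube_current = [{"year": 2024, "revenue": 410.0}]

def union_query(cubes, year_from, year_to):
    """Union the partitions, keeping only rows inside the year filter."""
    result = []
    for cube in cubes:
        result.extend(r for r in cube if year_from <= r["year"] <= year_to)
    return result

# A report on recent years touches only the rows it needs; the older
# partition can be archived or moved to cheaper storage independently.
recent = union_query([cube_2022, cube_2023, cube_current], 2023, 2024)
```

Each small partition stays a manageable chunk, and the archiving decision for old years no longer disturbs the cube serving active reporting.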