
Re-posted from my original blog post in 2009 from Geek2Live.net

I recently had to provide a list of best practices in universe design, partly compiled from the SAP BusinessObjects training material and partly from my own experience.  I believe that the universe is the foundation of the BusinessObjects system.  With poor universe design, building the required reports is often difficult, and in some cases impossible.

Poor universe design can also lead to incorrect data, sub-optimal SQL generation, incompatible objects and Cartesian products (which return very wrong data).

I’ve categorised the best practices into two groups: End-User / Cosmetic methods, and Functional / Optimisation methods.

End-User / Cosmetic

  • The universe should be created from the end-user perspective, not from a developer’s perspective. This should be kept in mind whether designing new universes, or modifying existing universes.
  • All metadata (class names, object names and descriptions) should use business terminology. This allows the end users to easily use the universe without understanding the underlying database structure. It also speeds up development time, as header cells do not need to be edited. Users can then also search the universe when building a report to quickly find an object. For example, instead of calling a dimension object ‘Commit Cuml’, it should be called ‘Cumulative Commitment’.
  • All development objects should be removed before promoting to Production. Universes should not contain old, legacy or development objects (and classes) – these should be removed before promoting to Production. If absolutely necessary, these should be hidden and placed in a ‘Legacy Objects’ class so no mistake can be made regarding their status.
  • Objects should not be both dimensions and measures unless absolutely necessary. This is confusing to the user, and unnecessary. Providing row-level data is possible by introducing a unique object, like the ID.
  • Object formatting should always be specified, especially with measures. Formatting such as dates, currency and so forth is important, so that users are provided with objects they don’t need to format themselves.
  • Date fields should be expanded using SQL functions, and split into separate objects like Calendar Year, Calendar Month, Calendar Month Name etc. A library of date functions for both T-SQL and PL/SQL should be built up and re-used where possible.
  • Classes and objects should be based on user requirements, not table-centric. Classes might contain objects from many different tables, and should not be based on the columns within a table.
  • Custom hierarchies should always be defined and done according to the end-user’s requirements. This prevents users from drilling using default hierarchies, and allows control over what users can drill into.
  • Pre-defined conditions should be used to accelerate report development and encode difficult query filters into an object that the user can use easily.
  • Object names should be unique, and never repeated, regardless of the object classification.
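The date-splitting practice above can be sketched as follows. This is only an illustration: the table and column names are made up, and SQLite’s `strftime` (via Python’s `sqlite3` module) stands in for the T-SQL (`YEAR`, `MONTH`, `DATENAME`) or PL/SQL (`TO_CHAR`) functions you would actually use in the object definitions.

```python
import sqlite3

# Hypothetical schema; SQLite is used only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT)")
conn.execute("INSERT INTO orders VALUES ('2009-07-22')")

# One date column expanded into separate Calendar Year / Month / Month Name
# objects, as the bullet above suggests.
row = conn.execute("""
    SELECT strftime('%Y', order_date)      AS calendar_year,
           strftime('%m', order_date)      AS calendar_month,
           CASE strftime('%m', order_date)
                WHEN '07' THEN 'July'      -- month-name lookup, abbreviated here
           END                             AS calendar_month_name
    FROM orders
""").fetchone()
print(row)  # ('2009', '07', 'July')
```

In T-SQL the equivalent object SELECTs would be `YEAR(order_date)`, `MONTH(order_date)` and `DATENAME(month, order_date)`; in PL/SQL, `TO_CHAR(order_date, 'YYYY')` and friends.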

Functional / Optimisation

  • SQL aggregation functions should always be present in measures. All measures should have both a SQL aggregation function (i.e. SUM, COUNT and AVG), then a projection aggregation function assigned (respectively SUM, SUM and NONE). This ensures that GROUP BYs are used and the processing power of the database is leveraged, returning fewer rows more quickly. This also has additional performance benefits at the data provider level: less RAM is used on the server, and less data travels over the network.
  • Tables should not be unconnected. This will generate a Cartesian product for the end-user, and leads to very confusing universes. If tables have no way of being joined at all, they should be placed in separate universes.
  • Left outer joins should be avoided where possible. This is sometimes impossible with transactional schemas, but left outer joins lead to NULLs and slower SQL queries. If left outer joins are necessary, universe objects should contain ISNULL functions or similar so that users aren’t presented with NULL values when using objects.
  • Cardinalities on joins should always be set to allow for loop and SQL trap detection; this includes self-joins, which should be marked as 1-1. Always set cardinalities manually, not automatically.
  • Contexts should be used, and if too complex, the universe should be split up. This allows for path navigation and allows a universe to have more functionality.
  • Loops and SQL traps should always be resolved to avoid Cartesian products and incorrect data. This is a very important component of universe design, especially with transactional schemas.
  • Integrity checks and refreshes of the database structure should be done regularly to ensure the universe has the up-to-date view of the database.
  • Views should be used in preference to derived tables, and kept to a minimum. This ensures optimal SQL generation.
  • When aliasing tables, the original should be placed in the top left of the schema window and noted with ‘Aliased Table – Do Not Remove’. This will ensure that an original table isn’t removed by mistake, which will also remove all aliases of the table as well.
  • The query limits should always be set, to ensure that runaway queries or Cartesian product queries don’t bring down the database. Both length of query time and number of rows should be set. The limits will differ per database.
  • Multiple SQL statements for each measure should be turned off, to generate optimum SQL.
  • Index awareness should be used where possible, to generate optimal SQL.
  • List of values should be disabled for all dimension and detail objects that don’t require LOVs, to avoid users doing queries like all IDs for a given table.
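The first point above (why measures need a SQL aggregate) can be demonstrated with a small sketch. The schema is hypothetical, and SQLite via Python’s `sqlite3` module stands in for the reporting database:

```python
import sqlite3

# With SUM(amount) and GROUP BY, the database returns one row per dimension
# value instead of every detail row, so less data crosses the network.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 10), ("North", 15), ("South", 7)])

detail = conn.execute("SELECT region, amount FROM sales").fetchall()
aggregated = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(len(detail))  # 3 rows travel to the client without aggregation
print(aggregated)   # [('North', 25), ('South', 7)] -- only 2 rows with GROUP BY
```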
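The outer-join advice above can be sketched in the same way: wrap the outer-joined column so users are never presented with NULLs. SQLite’s `IFNULL` stands in for T-SQL’s `ISNULL` or Oracle’s `NVL`, and the schema is made up:

```python
import sqlite3

# A left outer join produces NULL for unmatched rows; wrapping the column
# in IFNULL substitutes a readable default in the universe object.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, None)])
conn.execute("INSERT INTO customers VALUES (10, 'Acme')")

rows = conn.execute("""
    SELECT o.id, IFNULL(c.name, 'Unknown') AS customer_name
    FROM orders o LEFT OUTER JOIN customers c ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()
print(rows)  # [(1, 'Acme'), (2, 'Unknown')]
```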

Update

I received quite a few comments on additional best practices, so I’ve included them here.

  • Lock the universe when editing to prevent other users from editing the same universe.  This can be controlled with the security model, so only the user who locked the universe can unlock it.  However, beware that using the Import Wizard still allows you to overwrite a universe that is locked (Jansi).
  • Use universe parameters to control how the SQL is generated.  You can control SQL generation so it is ANSI-92 compliant, as well as use JOIN_BY_SQL to allow separate sub-queries to be combined at the query level instead of in the report (Andreas).
  • All ratios should be set as database delegated/smart measures.  This ensures that the ratio isn’t calculated incorrectly (i.e. as an average of an average) and is calculated back at the database (Andreas).
  • Generic time objects for Today, Yesterday, This Month, Last Month etc. should be provided (as dimension objects as well as filters).  These dimension objects can be used in filtering objects (Some Date Object = Today Object).  Using filters may depend on the number of dates present in your data, however you could still create filters for the most common date objects (Andreas).
  • As a rule of thumb do not use the WHERE clause in any measure, dimension, or detail object. Use CASE WHEN … SQL syntax instead.  Combining two objects that have WHERE clauses filtering the same column(s) will return zero results, which the user won’t expect (Andreas).
  • I also recently stumbled across the same kind of post over at the Business Intelligence blog.  There are some differences in our lists, so you may wish to refer to that post as well.
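Andreas’s WHERE vs. CASE WHEN point can be demonstrated with a minimal sketch (hypothetical table, SQLite via Python’s `sqlite3` standing in for the reporting database): two objects whose WHERE clauses filter the same column combine into a contradictory predicate and return nothing, while CASE WHEN measures coexist in a single result row.

```python
import sqlite3

# Two objects filtering the same column (year = 2008 AND year = 2009) can
# never both be true, so the combined query returns no rows -- the result
# the user won't expect. Moving the filter into CASE WHEN keeps each
# measure independent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(2008, 100), (2009, 120)])

where_style = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE year = 2008 AND year = 2009"
).fetchone()
case_style = conn.execute("""
    SELECT SUM(CASE WHEN year = 2008 THEN amount END) AS sales_2008,
           SUM(CASE WHEN year = 2009 THEN amount END) AS sales_2009
    FROM sales
""").fetchone()

print(where_style)  # (None,) -- no rows matched the contradictory filter
print(case_style)   # (100, 120)
```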

Updated 22/07/2009.

– Josh


17 Comments


    1. Erdem Pekcan

      Hi Andreas,

      I’ve tested the WHERE clause vs. CASE WHEN by selecting “Multiple SQL statements for each measure” and setting “JOIN_BY_SQL” to “Yes”.

      I’ve included MTD and YTD calculations together and as a result WHERE clause performed better. The database I’ve used was Teradata.

      Does your warning have a dependency on the database?

      What should I pay attention to when using the WHERE clause?

      Thanks!

  1. Zahid Yener

    Hello Joshua,

    It’s a very well-written and helpful blog.

    I have one question. I have a customer who wants to build only one universe for the whole company and all reporting needs. They just don’t want to split it. I told them it is a lot better to have multiple universes than a single one. They said that a single universe will be a lot easier to maintain, and that end users will want to find everything in one place rather than searching through universes to work out which one is used for what.

    The only downside I can think of for a single universe is that there will be some unrelated tables for sure, because, let’s say, finance data and HR data may not be related. This may cause Cartesian products if the end user chooses objects from both of these parts. If we are talking about end users, this might cause big problems.

    What might be the advantages and the disadvantages of having a single universe versus multiple universes?

    Thanks a lot. Have good one.

    1. Andreas J.A. Schneider

      For starters:
      Why not put all LOBs into one, merging HR with AP, AR, Sales etc., and have only one manager in the whole company? Would save money and everything is accessible from ONE place… Sounds awesome to me 😉
      *** Well, I am just kidding obviously, but the same applies to universes.

      Universes should be built around subject areas, mimicking your business organization. Is HR data of any relevance to AP/AR for example? Is HR data not very sensitive data, which must be secured?

      1. How complex would a huge universe be to maintain?
        Consider changing/adding universe contexts, changing objects and impact analysis, maintaining global filters/forced joins/stub-joins, etc.
      2. Is a huge universe good performance-wise?
        File size might become an issue; a UNV/UNX has to be transferred from the FRS for usage, though caching universes might help after first access.
      3. What about securing data (row and column level security)?
        This will be different from subject area to subject area. I do recommend Virtual Private Database (VPD) as available by Oracle anytime over row-level security implemented in the universe by the way.
      4. What about creating invalid queries by mixing universe objects in your reports that do not belong together such as HR data with Sales/Marketing data?
        The business views available with UNX, which create subsets of the universe, might be of help here to limit users to a subset of objects in a big universe.
      5. Do you really have ALL data in ONE DWH?
        Most often not, at least not in my experience. And a multi-source UNX is just a virtual datamart with all the cons and pros of virtualization.

      I dearly suggest: do create separate universes for the separate subject areas.

      And depending on how your organization is structured, let BI IT for HR maintain the HR universe, BI IT for Finance maintain the Finance universe(s), etc.

      1. Mister Makmerphy

        In general I agree with you, but what if we’re dealing with a banking system where one set of data is a subset of another and you cannot just remove the set from the logic by creating a new universe? There might be cases where huge object sets are better than a few universes.

        The general rule here is to spend many days/months and design a maximally de-normalised dimensional model; by doing this, at least the performance part will be solved efficiently 🙂.

        Also, to Joshua…

        What approach do you choose when you see incompatible objects in a universe? I know that the best bet is to create the DWH without them, but there might be cases where we cannot avoid them.

        1. Joshua Fletcher Post author

          Incompatible objects (so separate subject areas that aren’t able to be joined) can be managed by using contexts and aggregate navigation.

          The end result is that the user gets an ‘incompatible objects’ message when they try to create a query which crosses the multiple areas.

          This would need to be enhanced with user training about how the universe works specifically.

          Josh

    2. Joshua Fletcher Post author

      Hi Zahid,

      Thanks for your question, and also thanks to Andreas J A Schneider for commenting – as usual, I agree with everything he has said and wouldn’t be able to add much to his comment.

      Multiple universes are definitely the way to go, just be careful not to overdo it and have multiple universes for each LoB that are all slightly different but mostly the same – or even a universe per report – this is a maintenance nightmare.

      But one universe overall is not realistically achievable without causing a lot of end-user difficulties.

      Kind regards,

      Josh

      1. Zahid Yener

        Thanks Joshua for your comments.

        I agree with you on multiple universes. We will have multiple data marts in our environment, all of which will be in one place. Before I tell the customer about the pros and cons of a single universe, I wanted to get some opinions on which method would be best. Of course, we are not going to be building so many universes in the system. We will probably be creating 4-5 universes at most.

        Thanks again.

  2. Rodrigo Caparroz

    Hi Joshua, great post!

    Do you see any recommendation or limit on the maximum number of tables defined in a data foundation in IDT?

    A customer is maintaining only one data foundation for many business layers, and with that has one data foundation with more than 800 tables… Do you know of, or have you experienced, any recommendation on a maximum number of tables per data foundation?

    I’m going to recommend splitting it into at least one universe per subject area (not per fact table) for better maintenance, but I would like more guidance on what issues they may encounter in the future.

    Best Regards,

    Rodrigo.

