Re-posted from my original blog post in 2009 from Geek2Live.net
I recently had to provide a list of best practices in Universe Design – partly compiled from the SAP BusinessObjects training material, as well as my own experiences. I believe that the universe is the foundation of the BusinessObjects system. With poor universe design, required reports are often difficult or even impossible in some cases.
Poor universe design can also lead to wrong data, non-optimal SQL generation, incompatible objects and/or Cartesian products (or very wrong data).
I’ve categorised the best practices into two groups, End-User and Cosmetic methods, as well as Functional or Optimisation methods.
End-User / Cosmetic
- The universe should be created from the end-user perspective, not from a developer’s perspective. This should be kept in mind whether designing new universes, or modifying existing universes.
- All metadata (class names, object names and descriptions) should use business terminology. This allows the end users to easily use the universe without understanding the underlying database structure. It also speeds up development time, as header cells do not need to be edited. Users can then also search the universe when building a report to quickly find an object. For example, instead of calling a dimension object ‘Commit Cuml’, it should be called ‘Cumulative Commitment’.
- All development objects should be removed before promoting to Production. Universes should not contain old, legacy or development objects (and classes) – these should be removed before promoting to Production. If absolutely necessary, these should be hidden and placed in a ‘Legacy Objects’ class so no mistake can be made regarding their status.
- Objects should not be both dimensions and measures unless absolutely necessary. This is confusing to the user, and unnecessary. Providing row-level data is possible by introducing a unique object, like the ID.
- Object formatting should always be specified, especially with measures. Formatting like dates, currency and so forth is important to provide objects that the user doesn’t need to format themselves.
- Date fields should be expanded using SQL functions, and split into separate objects like Calendar Year, Calendar Month, Calendar Month Name etc. A library of date functions for both T-SQL and PL-SQL should be built up and re-used where possible.
- Classes and objects should be based on user requirements, not table-centric. Classes might contain objects from many different tables, and should not be based on the columns within a table.
- Custom hierarchies should always be defined and done according to the end-user’s requirements. This prevents users from drilling using default hierarchies, and allows control over what users can drill into.
- Pre-defined conditions should be used to accelerate report development and encode difficult query filters into an object that the user can use easily.
- Object names should be unique, and never repeated, regardless of the object classification.
Functional / Optimisation
- SQL aggregation functions should always be present in measures. All measures should have both a SQL aggregation function (ie SUM, COUNT and AVERAGE), then a projection aggregation function assigned (respectively SUM, SUM and NONE). This ensures that GROUP BYs are used and the processing power of the database is leveraged, returning fewer rows quicker. This also has additional performance benefits at the data provider level, where less RAM will be used on the server, and lower network traffic of data.
- Tables should not be unconnected. This will generate a Cartesian product for the end-user, and leads to very confusing universes. If tables have no way of being joined at all, they should be placed in separate universes.
- Left outer joins should be avoided where possible. This is sometimes impossible with transactional schemas, but left outer joins lead to NULLs and slower SQL queries. If left outer joins are necessary, universe objects should contain ISNULL functions or similar so that users aren’t presented with NULL values when using objects.
- Cardinalities on joins should always be set to allow for loop and SQL trap detection, this includes self-joins that should be marked as 1-1. Always set cardinalities manually, not automatically.
- Contexts should be used, and if too complex, the universe should be split up. This allows for path navigation and allows a universe to have more functionality.
- Loops and SQL traps should always be resolved to avoid Cartesian products and incorrect data. This is a very important component of universe design, especially with transactional schemas.
- Integrity checks and refreshes of the database structure should be done regularly to ensure the universe has the up-to-date view of the database.
- Views should be used in preference to derived tables, and kept to a minimum. This ensures optimal SQL generation.
- When aliasing tables, the original should be placed in the top left of the schema window and noted with ‘Aliased Table – Do Not Remove’. This will ensure that an original table isn’t removed by mistake, which will also remove all aliases of the table as well.
- The query limits should always be set, to ensure that runaway queries or Cartesian product queries don’t bring down the database. Both length of query time and number of rows should be set. The limits will differ per database.
- Multiple SQL statements for each measure should be turned off, to generate optimum SQL.
- Index awareness should be used where possible, to generate optimal SQL.
- List of values should be disabled for all dimension and detail objects that don’t require LOVs, to avoid users doing queries like all IDs for a given table.
I received quite a few comments on additional best practices, so I’ve included them here.
- Lock the universe when editing to prevent other users from editing the same universe. This can be controlled with the security model, so only the user who locked the universe can unlock it. However, beware that using the Import Wizard still allows you to overwrite a universe that is locked (Jansi).
- Use universe parameters to control how the SQL is generated. You can control SQL generation so it is ANSI-92 compliant, as well as use JOIN_BY_SQL to allow separate sub-queries to be combined at the query level instead of in the report (Andreas).
- All ratios should be set as database delegated/smart measures. This ensures that the ratio isn’t calculated incorrectly (ie average of an average) and is calculated back at the database (Andreas).
- Generic time objects for Today, Yesterday, This Month, Last Month etc. should be provided (as dimension objects as well as filters). These dimension objects can be used in filtering objects (Some Date Object = Today Object). Using filters may depend on the number of dates present in your data, however you could still create filters for the most common date objects (Andreas).
- As a rule of thumb do not use the WHERE clause in any measure, dimension, or detail object. Use CASE WHEN … SQL syntax instead. Combining two objects that have WHERE clauses filtering the same column(s) will return zero results, which the user won’t expect (Andreas).
- I also recently stumbled across the same kind of post over at the Business Intelligence blog. There are some differences in our lists, so you may wish to refer to that post as well.