Upon easing into my morning routine (Americano; cold milk; one sugar) I was fat slapped by the date. We are in the last quarter of 2013, fast approaching year end and the holiday season. Almost time for the ‘year in review’ series of articles which will receive front page glory.
Another task that never goes amiss and is somewhat habitual for us all, is my loyal affair with software. Starting up, logging in and after some work behind the wheel…logging off. That software is called BPC.
This has got me thinking, we spend so much time with the product, how many of us actually know its’ origins and the history behind the technology? To better understand the application software, let’s start with BPC’s story in commerce:
Corporation was a Stamford, Connecticut based software company. Along with
OutlookSoft, other vendors in the performance management space include SAS,
Cognos and Business Objects. In 2006, OutlookSoft was named in the top 25% in Deloitte’s
Technology Fast 500 and in the same year, was sued by competitor, Hyperion,
over software patents, winning a dismissal in October 2006 as the jury found no
patent infringement and subsequently ruled the patents ineligible for claim. In
June 2007, SAP AG announced its proposed acquisition of OutlookSoft as a part
of its strategy to challenge Oracle Corporation OutlookSoft’s largest product
was revamped as SAP BPC (Business Planning and Consolidation) after acquisition
by SAP AG.”
Thanks to the always accurate http://en.wikipedia.org/wiki/OutlookSoft and too, by the looks of some of the latest BPC sales wins, Hyperion certainly saw the writing on the wall.
So then, BPC was Outlooksoft, which really was an application that was developed to leverage Microsoft’s BI stack. This is where the history lesson begins. Business Intelligence in the Microsoft world can attribute its success to the concept of dimensional modeling. This is where we find ideas such as cubes, measures, dimensions and tables modeled to present end user’s with optimal performance when querying data. An acronym which embodies most of these elements is referred to as OLAP, and often used in tech jargon.
A pioneer in this field is Ralph Kimball, referred to the architect of data warehousing http://en.wikipedia.org/wiki/Ralph_Kimball . It is the Kimball approach to modeling which the broader audience uses in solution design for BPC. Below, we find the 10 Essential Rules of Dimensional Modeling according to Kimball University (http://www.kimballgroup.com/ ):
Rule #1: Load detailed atomic data into
models should be populated with bedrock atomic details to support the
unpredictable filtering and grouping required by business user queries. Users
typically don’t need to see a single record at a time, but you can’t predict
the somewhat arbitrary ways they’ll want to screen and roll up the details. If
only summarized data is available, then you’ve already made assumptions about
data usage patterns that will cause users to run into a brick wall when they
want to dig deeper into the details. Of course, atomic details can be
complemented by summary dimensional models that provide performance advantages
for common queries of aggregated data, but business users cannot live on
summary data alone; they need the gory details to answer their ever-changing
Rule #2: Structure dimensional models around
processes are the activities performed by your organization; they represent
measurement events, like taking an order or billing a customer. Business processes
typically capture or generate unique performance metrics associated with each
event. These metrics translate into facts, with each business process
represented by a single atomic fact table. In addition to single process fact
tables, consolidated fact tables are sometimes created that combine metrics
from multiple processes into one fact table at a common level of detail. Again,
consolidated fact tables are a complement to the detailed single-process fact
tables, not a substitute for them.
Rule #3: Ensure that every fact table has an
associated date dimension table.
measurement events described in Rule #2 always have a date stamp of some
variety associated with them, whether it’s a monthly balance snapshot or a
monetary transfer captured to the hundredth of a second. Every fact table
should have at least one foreign key to an associated date dimension table,
whose grain is a single day, with calendar attributes and nonstandard
characteristics about the measurement event date, such as the fiscal month and
corporate holiday indicator. Sometimes multiple date foreign keys are
represented in a fact table.
Rule #4: Ensure that all facts in a single fact
table are at the same grain or level of detail.
There are three fundamental grains to
categorize all fact tables: transactional, periodic snapshot, or accumulating
snapshot. Regardless of its grain type, every measurement within a fact table
must be at the exact same level of detail. When you mix facts representing
multiple levels of granularity in the same fact table, you are setting yourself
up for business user confusion and making the BI applications vulnerable to
overstated or otherwise erroneous results.
Resolve many-to-many relationships in fact tables.
fact table stores the results of a business process event, there’s inherently a
many-to-many (M:M) relationship between its foreign keys, such as multiple
products being sold in multiple stores on multiple days. These foreign key
fields should never be null. Sometimes dimensions can take on multiple values
for a single measurement event, such as the multiple diagnoses associated with
a health care encounter or multiple customers with a bank account. In these
cases, it’s unreasonable to resolve the many-valued dimensions directly in the fact
table, as this would violate the natural grain of the measurement event. Thus,
we use a many-to-many, dual-keyed bridge table in conjunction with the fact
#6: Resolve many-to-one relationships in dimension tables.
fixed-depth many-to-one (M:1) relationships between attributes are typically
denormalized or collapsed into a flattened dimension table. If you’ve spent
most of your career designing entity-relationship models for transaction
processing systems, you’ll need to resist your instinctive tendency to
normalize or snowflake a M:1 relationship into smaller subdimensions; dimension
denormalization is the name of the game in dimensional modeling.
It is relatively common to have multiple M:1
relationships represented in a single dimension table. One-to-one
relationships, like a unique product description associated with a product
code, are also handled in a dimension table. Occasionally many-to-one
relationships are resolved in the fact table, such as the case when the detailed
dimension table has millions of rows and its roll-up attributes are frequently
changing. However, using the fact table to resolve M:1 relationships should be
#7: Store report labels and filter domain values in dimension tables.
and, more importantly, associated decodes and descriptors used for labeling and
query filtering should be captured in dimension tables. Avoid storing cryptic
code fields or bulky descriptive fields in the fact table itself; likewise,
don’t just store the code in the dimension table and assume that users don’t
need descriptive decodes or that they’ll be handled in the BI application. If
it’s a row/column label or pull-down menu filter, then it should be handled as
a dimension attribute.
Though we stated in Rule #5 that fact table
foreign keys should never be null, it’s also advisable to avoid nulls in the
dimension tables’ attribute fields by replacing the null value with
“NA” (not applicable) or another default value, determined by the data
steward, to reduce user confusion if possible.
#8: Make certain that dimension tables use a surrogate key.
sequentially assigned surrogate keys (except for the date dimension, where
chronologically assigned and even more meaningful keys are acceptable) deliver
a number of operational benefits, including smaller keys which mean smaller
fact tables, smaller indexes, and improved performance. Surrogate keys are
absolutely required if you’re tracking dimension attribute changes with a new
dimension record for each profile change. Even if your business users don’t
initially visualize the value of tracking attribute changes, using surrogates
will make a downstream policy change less onerous. The surrogates also allow
you to map multiple operational keys to a common profile, plus buffer you from
unexpected operational activities, like the recycling of an obsolete product
number or acquisition of another company with its own coding schemes.
#9: Create conformed dimensions to integrate data across the enterprise.
dimensions (otherwise known as common, master, standard or reference
dimensions) are essential for enterprise data warehousing. Managed once in the
ETL system and then reused across multiple fact tables, conformed dimensions
deliver consistent descriptive attributes across dimensional models and support
the ability to drill across and integrate data from multiple business
processes. The Enterprise Data Warehouse Bus Matrix is the key architecture
blueprint for representing the organization’s core business processes and
associated dimensionality. Reusing conformed dimensions ultimately shortens the
time-to-market by eliminating redundant design and development efforts; however,
conformed dimensions require a commitment and investment in data stewardship
and governance, even if you don’t need everyone to agree on every dimension
attribute to leverage conformity.
#10: Continuously balance requirements and realities to deliver a DW/BI
solution that’s accepted by business users and that supports their
Dimensional modelers must constantly straddle
business user requirements along with the underlying realities of the
associated source data to deliver a design that can be implemented and that,
more importantly, stands a reasonable chance of business adoption. The
requirements-versus-realities balancing act is a fact of life for DW/BI
practitioners, whether you’re focused on the dimensional model, project strategy,
technical/ETL/BI architectures or deployment/maintenance plan