The purpose for this post is to familiarize myself more with the terms and concepts around multidimensional databases and data warehouses, this post is part of a series that will talk more about business intelligence and will be an introduction for upcoming posts about SAP technologies in general and the SAP Business Intelligence technology stack.
The multidimensional data model emerged as it was found out that the previously wide spread model (the relational model) was efficient is performing "online" transactions but did not perform well when the objective was to analyze data (especially on large scale). So, the multidimensional model was designed specifically to express support for data analysis and is the core component in the Business Intelligence (BI) industry.
The multidimensional model presents data as facts with associated numerical values called measures or dimensions that characterize the facts and they are "mostly" textual. Queries on the model usually aggregate measure values over a range of dimension values to produce results such as sale per month and city. Multidimensional models have three important application areas, they are:
Dimensions are used for three purposes; the labeling, selection(filtering) and the grouping of data at a desired level of detail (they are often textual and considered as labels for data). A dimension is organized into a containment-like hierarchy composed of number of levels, each of which represents a specific level of details. The instances of the dimensions are typically called dimension values or members; each value or member belongs to a particular level. for example, in the figure above T,State and City are dimension levels; each level may have a number of properties; each property is associated with a simple non-hierarchical information i.e. City -> population. Boston, Florida ... etc. are all dimension values or members.The dimension hierarchy is defined in the meta-data of the cube (the data structure in multidimensional models that we will discuss later); by doing that we will avoid duplicated definitions of the hierarchy afterwards. In some cases we might have defined multiple hierarchy definitions for example for Fiscal and Calendar year. Most models require that dimensions form a balanced tree so that it will have uniform weight everywhere however this is not the case always.A conformed dimension is a set of data attributes that have been physically implemented in multiple database tables using the same structure, attributes, domain values, definitions and concepts in each implementation. A conformed dimension cuts across many facts. Dimensions are conformed when they are either exactly the same (including keys) or one is a perfect subset of the other
A measure has two components, a numerical property and a formula (usually an aggregation function such as sum or average). Measures generally represent the properties of a chosen fact. There are three classes of measures:
Facts are the objects that present the subject of the analysis. They are mostly defined by their combination of dimension values. a fact has a certain granularity which is determined by the levels from which its dimension values are drawn. There are mainly two types of facts:
Event facts are made to represent events that can happen unpredictably in time; however snapshot facts are typically made to represent the states at predefined times.
A cube is a multidimensional data structure for capturing and analyzing data. It generalizes the tabular spreadsheet such as there can be any number of dimensions (in contrast to only two in the tabular spreadsheets). The terms hypercube and cube are used interchangeably as sometimes the term cube can infer that it hold only three dimensions. A collection of related cubes is commonly referred to as a multidimensional database or multidimensional data warehouse.A cube consists of a set of cells at each of the dimensions intersections; a non empty cell is a fact. A fact has a number of measures associated with it. A sparse cube is one with high percentage of empty cells; a dense cube is one with high percentage of facts. Cubes tend to become increasingly sparse with increasing dimensionality and with increasingly finer granularities of the dimension value.
A pivot table is a two dimensional table of data with associated subtotals and totals. It may also allow the user to use hierarchies to drill down or roll up. It can be also nested into several dimensions on one axis. It can be also pivoted such as the dimensions can be rotated (swapping x and y).
Dedicated querying languages that offers a number of ranking and sorting function on multidimensional data
There are two principal ways of representing dimensions:
A star schema has one dimension table for each dimension, this table has a key column and one column for each level of the dimension (except T). Furthermore, a star schema has a Fact Table that hold a row for each multidimensional fact, it has a column for each dimension. They primary key in the dimension tables is typically a surrogate key (ID), this will result in better storage, prevention of key-reuse problems and more efficient query processing. However, we notice that there is redundancy in higher level data, however it is not a problem space-wise and it does not cause any update problems thus it is often good idea to use such redundant dimensions tables to optimize performance.
It is much similar to the star schema, however it contains several dimension tables for each dimension; this will result in removing the redundancy found in star schemas. The result of this is that querying the schema is now harder since several joins must be applied; it is also more time consuming to compute the results.
The support for analysis and querying on cubes is done using these operations: