Dataset linking is a feature that was introduced in Lumira 1.27. It allows the analyst to quickly correlate data from multiple datasets without predetermining how and what is required to establish a link. In fact, the workflow in Lumira is designed to allow the user to first think about what common attributes by which to aggregate the measure before deciding what dimension to link on. For instance, you want to see if there is any correlation between regional revenue and census data, such as average household income and other demographic attributes. You can easily discover these insights once the data is broken down by region in a visualization in Lumira.
The flexibility to link datasets within Lumira is especially powerful with the new enhancements that have been added to 1.28. The following scenarios are my attempt to showcase these new capabilities and to explore cases that require adjustments to the links between the datasets to derive the intended result.
Let’s start by summarizing what is possible with Dataset Linking in 1.27 and what new enhancements are now available in the 1.28 release:
- Link two datasets
- Link acquired datasets (not available for Hana Online)
- Define links once per story
- Create links that are left outer join
- Detect duplicate values as a result of partial link
- Consume or modify dataset links in Lumira server for BIP and Team Server
- Linked dimension must be used in the visualization
- Select from left outer, full outer, exception or inner join when linking datasets
- Link three or more datasets
- Add secondary dimensions in the visualization
- Available for HANA Online connection
In the scenarios below, I used the following three datasets. You can copy and paste this content into Lumira to follow along or use any supported data source, including your BW queries and HANA views:
|Order Date||Product||Retailer||Number of orders|
Scenario 1: Full Outer Join
Goal: compare my regional sales figures with the global sales figure
Start by importing Dataset 1 into Lumira (You can copy each dataset from this blog to your clipboard and import it to Lumira using the Copy from Clipboard connection option. Choose delimited by “Tab” for the content to be identified correctly).
Select crosstab as the visualization and place Region in Rows feeder and the Sales measure in the Measures feeder
Import Dataset 2 and add the Global Sales measure into the Measures feeder
You are prompted to link the datasets. Link the Dataset 1.Year dimension with the Dataset 2.Year dimension.
Lumira prompts you to add the linked dimension to the visualization. Click OK to allow the tool to add Year to the Rows feeder.
Now the crosstab displays a breakdown of Sales and Global sales by Region and Year.
Notice how the values of Global Sales are repeating for 2013 and 2014. This is because this measure can not be broken down in a unique manner by Region.
You can link additional common dimensions to achieve the desired granularity, but since this is not possible with the sample datasets that I have provided, I will remove Region from the visualization to remove the duplication.
To include 2015 Sales figures from Dataset 2 into the visualization, expand the Datasets in Use drop down and change the join type from Left outer to Full outer join.
You are now able to correlate regional sales with the Global sales side by side.
Scenario 2: Linking three datasets and Inner Join
Goal: determine the Number of orders vs. the regional and Global Sales
Add Dataset 3.
Place the Number of orders measure in the Measure feeder.
Once again, you are prompted to link the datasets. In the Dataset Linking dialog, link Dataset 1.Product dimension with Dataset 3.Product dimension.
Again , you are prompted to include Product dimension in your visualization. Click OK to continue.
Change Dataset 3 link to Inner join to remove duplicating Global Sales values.
You are now able to display measures from three different datasets in a single visualization.
Scenario 3: Secondary dimensions
Goal: calculate the Year over Year running sum of Orders by Retailer
Place the Retailer dimension from Dataset 3 in the Rows feeder of the crosstab.
Since the Retailer dimension is not from the primary dataset (Dataset 1) and is not a linked dimension, SAP Lumira rolls up its members to avoid showing duplicate values for Sales and Global Sales.
To see the Retailer breakdown, select “Show Multiple Matching Members” from the Setting cog menu
You can see that only certain Retailers have more than one order per Product.
Since I’m interested in the Year over Year Number of orders, we need to adjust the links with Dataset 3 to include the Order date to better improve the granularity of the link.
First, we will enrich the Order date to generate a date hierarchy. This process generates a year dimension in Dataset 3 that we can use to link on.
Adjust the link between Dataset 1 and Dataset 3 to include Year (From the Data menu, select Linking Datasets, then in the Linking Datasets dialog, insure that Dataset 3 is selected in the right panel)
Notice how the inner join on Year and Product adjusted the result to include only 3 Retailers per Year and Product rather than 4 and 5 respectively.
Create a running total of Number of orders per Year by opening the Running calculations dialogue box from the Calculator icon.
In the running Calculation dialog, select Number of orders as the measure, Sum as the Running Kind, and Year as the dimension to reset at.
Notice how the running sum accumulates and resets per Year.
If on the other hand, if you want to see the running total of all the orders in Dataset 3, then change the link of Dataset 3 from Inner Join to Full outer join.
Now the Number of orders displays as 8 for 2013 and as 6 for 2014.
In the image above I deselecting the Show Multiple Matching members from the Setting Cog for a more concise result; however, this is optional and the result of the running sum will still be the same per Year.
Keep an eye out for additional blogs on this topic as I will shortly share additional enhancements to Dataset linking that are planned for the 1.29 release.