Dataset Linking in SAP Lumira
Dataset linking is a feature that was introduced in Lumira 1.27. It allows the analyst to quickly correlate data from multiple datasets without predetermining how and what is required to establish a link. In fact, the workflow in Lumira is designed to allow the user to first think about what common attributes by which to aggregate the measure before deciding what dimension to link on. For instance, you want to see if there is any correlation between regional revenue and census data, such as average household income and other demographic attributes. You can easily discover these insights once the data is broken down by region in a visualization in Lumira.
The flexibility to link datasets within Lumira is especially powerful with the new enhancements that have been added to 1.28. The following scenarios are my attempt to showcase these new capabilities and to explore cases that require adjustments to the links between the datasets to derive the intended result.
Let’s start by summarizing what is possible with Dataset Linking in 1.27 and what new enhancements are now available in the 1.28 release:
Lumira 1.27
- Link two datasets
- Link acquired datasets (not available for Hana Online)
- Define links once per story
- Create links that are left outer join
- Detect duplicate values as a result of partial link
- Consume or modify dataset links in Lumira server for BIP and Team Server
- Linked dimension must be used in the visualization
Lumira 1.28
- Select from left outer, full outer, exception or inner join when linking datasets
- Link three or more datasets
- Add secondary dimensions in the visualization
- Available for HANA Online connection
In the scenarios below, I used the following three datasets. You can copy and paste this content into Lumira to follow along or use any supported data source, including your BW queries and HANA views:
Dataset 1:
Region | Product | Year | Sales |
Canada | A | 2013 | 70 |
Canada | B | 2014 | 90 |
USA | A | 2013 | 100 |
USA | B | 2014 | 50 |
USA | C | 2014 | 60 |
Dataset 2:
Year | Quarter | Global Sales |
2013 | Q1 | 250 |
2013 | Q2 | 0 |
2013 | Q3 | 300 |
2013 | Q4 | 350 |
2014 | Q1 | 500 |
2014 | Q2 | 0 |
2014 | Q3 | 400 |
2014 | Q4 | 700 |
2015 | Q1 | 400 |
2015 | Q2 | 700 |
Dataset 3:
Order Date | Product | Retailer | Number of orders |
6/5/2013 | B | Shop8 | 1 |
6/5/2013 | A | Shop8 | 1 |
6/5/2013 | B | Shop4 | 1 |
6/5/2013 | A | Shop6 | 1 |
6/5/2013 | B | Shop5 | 1 |
6/6/2013 | A | Shop3 | 1 |
6/7/2013 | B | Shop6 | 1 |
6/7/2013 | B | Shop5 | 1 |
1/1/2014 | B | Shop2 | 1 |
1/1/2014 | B | Shop4 | 1 |
1/1/2014 | A | Shop6 | 1 |
1/3/2014 | A | Shop5 | 1 |
1/2/2014 | B | Shop5 | 1 |
1/2/2014 | A | Shop5 | 1 |
Scenario 1: Full Outer Join
Goal: compare my regional sales figures with the global sales figure
Start by importing Dataset 1 into Lumira (You can copy each dataset from this blog to your clipboard and import it to Lumira using the Copy from Clipboard connection option. Choose delimited by “Tab” for the content to be identified correctly).
Select crosstab as the visualization and place Region in Rows feeder and the Sales measure in the Measures feeder
Import Dataset 2 and add the Global Sales measure into the Measures feeder
You are prompted to link the datasets. Link the Dataset 1.Year dimension with the Dataset 2.Year dimension.
Lumira prompts you to add the linked dimension to the visualization. Click OK to allow the tool to add Year to the Rows feeder.
Now the crosstab displays a breakdown of Sales and Global sales by Region and Year.
Notice how the values of Global Sales are repeating for 2013 and 2014. This is because this measure can not be broken down in a unique manner by Region.
You can link additional common dimensions to achieve the desired granularity, but since this is not possible with the sample datasets that I have provided, I will remove Region from the visualization to remove the duplication.
To include 2015 Sales figures from Dataset 2 into the visualization, expand the Datasets in Use drop down and change the join type from Left outer to Full outer join.
You are now able to correlate regional sales with the Global sales side by side.
Scenario 2: Linking three datasets and Inner Join
Goal: determine the Number of orders vs. the regional and Global Sales
Add Dataset 3.
Place the Number of orders measure in the Measure feeder.
Once again, you are prompted to link the datasets. In the Dataset Linking dialog, link Dataset 1.Product dimension with Dataset 3.Product dimension.
Again , you are prompted to include Product dimension in your visualization. Click OK to continue.
Change Dataset 3 link to Inner join to remove duplicating Global Sales values.
You are now able to display measures from three different datasets in a single visualization.
Scenario 3: Secondary dimensions
Goal: calculate the Year over Year running sum of Orders by Retailer
Place the Retailer dimension from Dataset 3 in the Rows feeder of the crosstab.
Since the Retailer dimension is not from the primary dataset (Dataset 1) and is not a linked dimension, SAP Lumira rolls up its members to avoid showing duplicate values for Sales and Global Sales.
To see the Retailer breakdown, select “Show Multiple Matching Members” from the Setting cog menu
You can see that only certain Retailers have more than one order per Product.
Since I’m interested in the Year over Year Number of orders, we need to adjust the links with Dataset 3 to include the Order date to better improve the granularity of the link.
First, we will enrich the Order date to generate a date hierarchy. This process generates a year dimension in Dataset 3 that we can use to link on.
Adjust the link between Dataset 1 and Dataset 3 to include Year (From the Data menu, select Linking Datasets, then in the Linking Datasets dialog, insure that Dataset 3 is selected in the right panel)
Notice how the inner join on Year and Product adjusted the result to include only 3 Retailers per Year and Product rather than 4 and 5 respectively.
Create a running total of Number of orders per Year by opening the Running calculations dialogue box from the Calculator icon.
In the running Calculation dialog, select Number of orders as the measure, Sum as the Running Kind, and Year as the dimension to reset at.
Notice how the running sum accumulates and resets per Year.
If on the other hand, if you want to see the running total of all the orders in Dataset 3, then change the link of Dataset 3 from Inner Join to Full outer join.
Now the Number of orders displays as 8 for 2013 and as 6 for 2014.
In the image above I deselecting the Show Multiple Matching members from the Setting Cog for a more concise result; however, this is optional and the result of the running sum will still be the same per Year.
Keep an eye out for additional blogs on this topic as I will shortly share additional enhancements to Dataset linking that are planned for the 1.29 release.
are there any plans to add data model viewer to easily visualize and change linking between different data sets?
The linked dimensions that you insert into a visualization dictate which of the links are utilized by Lumira. This means that you have some limited control of what links are active between the datasets that are used in your visualization. This is particularly helpful if you have several common dimensions between datasets.
Also in the Dataset Linking Dialog, it is possible to display how the secondary datasets link with the primary dataset using the drop down list on the right panel. You can cycle through all available datasets to see how they are linked.
In 1.29, we plan to allow the user full control over defining the active linked dimensions across the datasets. The Dataset in use feature in the Chart feeder is where Lumira displays the linking context per visualization
Thanks Mike . I thinking cycling through all datasets in dataset linking dialog to check how they are linked is not an easy process. It's nice to have one snapshot through data model window that shows all of the links/joins between datasets. I hope it comes in future releases.
Really cool Mike. Thank you for sharing this. I was playing with Dataset Linking on Lumira Server for BI Platform with data coming from a Universe as well as acquired from HANA...all from within the browser. Fun stuff!
Fantastic Blending capability especially for HANA ONLINE. Thanks Mike for detailing this out as it is not so obvious how powerful our latest offering of Blending is.
Thanks to our customers who helped us to bang our head against the wall with your interesting workflows. Yes, you know who you are 😆 . It was good fun!!
Hi, Mike,
Your post about Blending features in SAP Lumira is just outstanding. Will share it with others.
Thank you and keep us posting.
Best regards,
Donatas Budrys
Mike,
Excellent demo.
Refreshing to see a demo that you actually recreate 100% yourself.
Tony
Nice, looking forward to this.
How will this dataset linking functionality be supported with the BI Platform version of Lumira? Or how about the Lumira Cloud?
/Karri
Thanks for the post, Mike... very helpful. However, can you please clarify or elaborate on the assertion that Dataset Linking can be performed with data retrieved through a Hana Online connection?
I'm on Lumira 1.28.4. If I have pulled data via Hana Online connection, I cannot add any other datasets that are not also from that same Hana Online connection. The online documentation clearly confirms this limitation as well. Did I miss a caveat in your post or am I simply misunderstanding something?
Thanks,
Josh
Hi Josh,
Dataset Linking for HANA online connection is possible between views that are coming from the same HANA system.
In order to link HANA views with content outside of the HANA system, you will need to either acquire the data from the HANA system into Lumira then blend it with other content such as a spreadsheet. Alternatively and this is more scalable, you can publish from the Share room the content acquired and wrangled through Lumira into HANA. Now all the content are available on the same HANA system and can be blended in Lumira.
thanks,
Mike
Hi,
If you are interested in what you can do in Lumira 1.29 regarding Dataset Linking and Blending, please check out part 2 of this blog:
Dataset Linking in SAP Lumira - Part 2
thanks
Mike