Skip to Content

Dataset linking is a feature that was introduced in Lumira 1.27.  It allows the analyst to quickly correlate data from multiple datasets without predetermining how and what is required to establish a link. In fact, the workflow in Lumira is designed to allow the user to first think about what common attributes by which to aggregate the measure before deciding what dimension to link on. For instance, you want to see if there is any correlation between regional revenue and census data, such as average household income and other demographic attributes.  You can easily discover these insights once the data is broken down by region in a visualization in Lumira.

The flexibility to link datasets within Lumira is especially powerful with the new enhancements that have been added to 1.28.   The following scenarios are my attempt to showcase these new capabilities and to explore cases that require adjustments to the links between the datasets to derive the intended result.


Let’s start by summarizing what is possible with Dataset Linking in 1.27 and what new enhancements are now available in the 1.28 release:

Lumira 1.27
  • Link two datasets
  • Link acquired datasets (not available for Hana Online)
  • Define links once per story
  • Create links that are left outer join
  • Detect duplicate values as a result of partial link
  • Consume or modify dataset links in Lumira server for BIP and Team Server
  • Linked dimension must be used in the visualization
Lumira 1.28
  • Select from left outer, full outer, exception or inner join when linking datasets
  • Link three or more datasets
  • Add secondary dimensions in the visualization
  • Available for HANA Online connection

In the scenarios below, I used the following three datasets.  You can copy and paste this content into Lumira to follow along or use any supported data source, including your BW queries and HANA views:

Dataset 1:

Region Product Year Sales
Canada A 2013 70
Canada B 2014 90
USA A 2013 100
USA B 2014 50
USA C 2014 60

Dataset 2:


Year Quarter Global Sales
2013 Q1 250
2013 Q2 0
2013 Q3 300
2013 Q4 350
2014 Q1 500
2014 Q2 0
2014 Q3 400
2014 Q4 700
2015 Q1 400
2015 Q2 700

Dataset 3:

Order Date Product Retailer Number of orders
6/5/2013 B Shop8 1
6/5/2013 A Shop8 1
6/5/2013 B Shop4 1
6/5/2013 A Shop6 1
6/5/2013 B Shop5 1
6/6/2013 A Shop3 1
6/7/2013 B Shop6 1
6/7/2013 B Shop5 1
1/1/2014 B Shop2 1
1/1/2014 B Shop4 1
1/1/2014 A Shop6 1
1/3/2014 A Shop5 1
1/2/2014 B Shop5 1
1/2/2014 A Shop5 1
Scenario 1: Full Outer Join
Goal: compare my regional sales figures with the global sales figure

Start by importing Dataset 1 into Lumira (You can copy each dataset from this blog to your clipboard and import it to Lumira using the Copy from Clipboard connection option.   Choose delimited by “Tab” for the content to be identified correctly).

dataset 1.JPG

Select crosstab as the visualization and place Region in Rows feeder and the Sales measure in the Measures feeder

Sales by region.JPG

Import Dataset 2 and add the Global Sales measure into the Measures feeder

dataset 2 global sales.jpg

You are prompted to link the datasets.  Link the Dataset 1.Year dimension with the Dataset 2.Year dimension.

linking ds1 and ds2.jpg

Lumira prompts you to add the linked dimension to the visualization.  Click OK to allow the tool to add Year to the Rows feeder.

linking prompt to add year.jpg

Now the crosstab displays a breakdown of Sales and Global sales by Region and Year.

sales by region and year.jpg

Notice how the values of Global Sales are repeating for 2013 and 2014.  This is because this measure can not be broken down in a unique manner by Region.

You can link additional common dimensions to achieve the desired granularity, but since this is not possible with the sample datasets that I have provided, I will remove Region from the visualization to remove the duplication.

sales by year.jpg

To include 2015 Sales figures from Dataset 2 into the visualization, expand the Datasets in Use drop down and change the join type from Left outer to Full outer join.

selecting fullouter join.jpg

You are now able to correlate regional sales with the Global sales side by side.

including 2015.jpg

Scenario 2: Linking three datasets and Inner Join
Goal: determine the Number of orders vs. the regional and Global Sales

Add Dataset 3.

add dataset 3 a.jpg

Place the Number of orders measure in the Measure feeder.

add number of orders.jpg

Once again, you are prompted to link the datasets.  In the Dataset Linking dialog, link Dataset 1.Product dimension with Dataset 3.Product dimension.

linking dataset 3.jpg

Again , you are prompted to include Product dimension in your visualization.  Click OK to continue.

Change Dataset 3 link to Inner join to remove duplicating Global Sales values.

inner join on dataset 3.jpg

You are now able to display measures from three different datasets in a single visualization.

final step scenario 2.jpg

Scenario 3: Secondary dimensions
Goal: calculate the Year over Year running sum of Orders by Retailer

Place the Retailer dimension from Dataset 3 in the Rows feeder of the crosstab.

select Retailer.jpg

Since the Retailer dimension is not from the primary dataset (Dataset 1) and is not a linked dimension, SAP Lumira rolls up its members to avoid showing duplicate values for Sales and Global Sales.

multi matching 1.jpg

To see the Retailer breakdown, select “Show Multiple Matching Members” from the Setting cog menu

select show multi members.jpg

You can see that only certain Retailers have more than one order per Product.

show multiple retailer results.jpg

Since I’m interested in the Year over Year Number of orders, we need to adjust the links with Dataset 3 to include the Order date to better improve the granularity of the link.

First, we will enrich the Order date to generate a date hierarchy.  This process generates a year dimension in Dataset 3 that we can use to link on.

create date hierarchy.jpg

Adjust the link between Dataset 1 and Dataset 3 to include Year (From the Data menu, select Linking Datasets, then in the Linking Datasets dialog, insure that Dataset 3 is selected in the right panel)

dataset 3 link product year.jpg

Notice how the inner join on Year and Product adjusted the result to include only 3 Retailers per Year and Product rather than 4 and 5 respectively.

result after linking on product year dataset3.jpg

Create a running total of Number of orders per Year by opening the Running calculations dialogue box from the Calculator icon.

Running calc.jpg

In the running Calculation dialog, select Number of orders as the measure, Sum as the Running Kind, and Year as the dimension to reset at.

running dialog.jpg

Notice how the running sum accumulates and resets per Year.

running result.jpg

If on the other hand, if you want to see the running total of all the orders in Dataset 3, then change the link of  Dataset 3 from Inner Join to Full outer join.

all the orders are accounted for.jpg

Now the Number of orders displays as 8 for 2013 and as 6 for 2014.

In the image above I deselecting the Show Multiple Matching members from the Setting Cog for a more concise result; however, this is optional and the result of the running sum will still be the same per Year.

Keep an eye out for additional blogs on this topic as I will shortly share additional enhancements to Dataset linking that are planned for the 1.29 release.

To report this post you need to login first.

11 Comments

You must be Logged on to comment or reply to a post.

    1. Mike Seblani Post author

      The linked dimensions that you insert into a visualization dictate which of the links are utilized by Lumira.  This means that you have some limited control of what links are active between the datasets that are used in your visualization.  This is particularly helpful if you have several common dimensions between datasets.

      Also in the Dataset Linking Dialog, it is possible to display how the secondary datasets link with the primary dataset using the drop down list on the right panel.  You can cycle through all available datasets to see how they are linked.

      In 1.29, we plan to allow the user full control over defining the active linked dimensions across the datasets.  The Dataset in use feature in the Chart feeder is where Lumira displays the linking context per visualization

      (0) 
      1. Naveen Kumar Ketha

        Thanks Mike . I thinking cycling through all datasets in dataset linking dialog to check how they are linked is not an easy process. It’s nice to have one snapshot through  data model window that shows all of the links/joins between datasets. I hope it comes in future releases.

        (0) 
  1. Ty Miller

    Really cool Mike. Thank you for sharing this. I was playing with Dataset Linking on Lumira Server for BI Platform with data coming from a Universe as well as acquired from HANA…all from within the browser. Fun stuff!

    (0) 
  2. Sharon Om

    Fantastic Blending capability especially for HANA ONLINE.  Thanks Mike for detailing this out as it is not so obvious how powerful our latest offering of Blending is.

    Thanks to our customers who helped us to bang our head against the wall with your interesting workflows. Yes, you know who you are 😆 .  It was good fun!!

    (0) 
  3. Donatas Budrys

    Hi, Mike,

    Your post about Blending features in SAP Lumira is just outstanding. Will share it with others.

    Thank you and keep us posting.

    Best regards,

    Donatas Budrys

    (0) 
  4. Karri Linnoinen

    Nice, looking forward to this.

    How will this dataset linking functionality be supported with the BI Platform version of Lumira? Or how about the Lumira Cloud?

    /Karri

    (0) 
  5. Josh Crawford

    Thanks for the post, Mike… very helpful.  However, can you please clarify or elaborate on the assertion that Dataset Linking can be performed with data retrieved through a Hana Online connection?

    I’m on Lumira 1.28.4.  If I have pulled data via Hana Online connection, I cannot add any other datasets that are not also from that same Hana Online connection.  The online documentation clearly confirms this limitation as well.  Did I miss a caveat in your post or am I simply misunderstanding something?

    Thanks,

    Josh

    (0) 
    1. Mike Seblani Post author

      Hi Josh,

      Dataset Linking for HANA online connection is possible between views that are coming from the same HANA system. 

      In order to link HANA views with content outside of the HANA system, you will need to either acquire the data from the HANA system into Lumira then blend it with other content such as a spreadsheet.  Alternatively and this is more scalable, you can publish from the Share room the content acquired and wrangled through Lumira into HANA.  Now all the content are available on the same HANA system and can be blended in Lumira.

      thanks,

      Mike

      (0) 

Leave a Reply