Skip to Content

Now that SAP Lumira 1.29 is available to our customers, I would like to share with you additional uses cases that are now possible using the power of Dataset Linking (blending).


If this is your first time looking into Dataset linking, I recommend that you have a quick read through my original blog on this topic where I covered the functionality introduced in Lumira 1.27 and 1.28.  For the exercises below, you can use the sample dataset that ships with Lumira desktop called BestRunCorp_Retail.  In addition I’ve attached a spreadsheet to this blog that needs to be renamed to CA Sales.xlsx.  This should give us the baseline to mimic the blending scenarios in Lumira.


First let’s start with a summary of enhancements to Dataset Linking in the 1.29 release:

  • Suggest possible dimensions to link and blend on
  • Browse through a sample of values in Dataset Linking dialog
  • Set the linking context with “Active Linked Dimensions”
  • Use “Calculate Result As” to control the roll up of measure values
  • Display the union list of values when filtering linked datasets

Scenario 1: Linking the common dimensions and Exclude Active Linked Dimensions
Goal: Compare my Canada’s sales figures with the United States sales figure

Start by opening the sample dataset BestRunCorp_Retial.xlsx by clicking on the “Try with Samples” button in the Home page of Lumira (Hint: you will need to close any open document to get to the Home page).

try sample1.jpg

In the Add New Dataset dialog, click on Create to import the dataset and start a new document

import bestruncorp.jpg

In the Visualize room, click on crosstab to start a crosstab visualization.  Add the Sales Revenue to the Measures and City to the Rows of the chart feeder.

Next, click on the Add Filter button in the filter bar and select Country.  Include USA and click Apply.

include USA in Viz.jpg

To compare Canadian cities values with USA values, we need to blend with the dataset attached to this article.  If you haven’t done so already, download the attachment CA Sales.txt and rename it to CA Sales.xlsx.

Acquire the data of CA Sales.xlsx

inmport ca sales.jpg

Once you are back in the Visualize room, CA Sales dimensions and measures should be available in the Object Picker on the left panel to select from.  Now we are able to drag the CA Revenue measure into the existing crosstab’s Measure feeder to display side by side with the USA Sales Revenue.

However at this stage, we are prompted to define a link between these datasets.  In Lumira 1.29, you can now hover over the different dimensions from the left or the right dataset to browse through sample values to help identify common dimensions with similar data.  I do notice that Category and City appear to be in both datasets but we will only select Category for the time being since we know that there are no common cities between these datasets as Canadian data is not present in BestRunCorpRetail dataset.

dataset linking sample values.jpg

Once Category is selected, a light bulb appear on the bottom right field.  When we click on the light bulb, 2 dimensions are suggested as potential common dimensions.  We will select Category with 74% match between these datasets rather than the 3% with the Line dimension.dataset linking suggested common dimensions.jpg

Click OK to continue

link on category.jpg

Now the CA – Revenue values are shown side by side with Sales Revenue of the US cities.

Notice how Lumira adds a warning beside CA – Revenue.  If you hover over this warning, a message is given letting the user know that the measure values are duplicated across the city rows. In the example below, I’m sorting the results on Sales Revenue.

Now that Lumira 1.29 allows users to link on dimensions that are not included in the visualization, the tool will need to roll up the values based on the context of the visualization which in this case is grouped by City and linked using Left outer join on Category.

Since we filtered the visualization originally to only include the country USA, the tool is rolling up the CA – Revenue values for the common categories that are sold to all the Canadian cities and displaying the value 17,695… for every US city.

duplicate CA sales values.jpg

We need to link on additional dimensions to achieve the right granular level in order to display the City Revenue for Canada and US side by side.

From the Dataset selector, click on Link Datasets to display the Dataset Linking dialog.

link dataset button.jpg

Link on City even though there are no common values between them

link on city.jpg

Insure that the City is selected within the Datasets In Use, Active Linked Dimensions feeder.  You may need to expand the Dataset In Use feeder and click on the + button within the Active Linked Dimensions to add City to the visualization linking context.  The purpose of the Active Linked Dimensions feeder is to allow us to adjust the linking context for this visualization to reach the proper level of aggregation across the datasets.  Note: this is typically not an issue in the case of Merge since Merge forces the user to select a unique key.

city in active link dimension.jpg

Since we know that the cities are not common, change the link on the CA Sales dataset from Left outer to Full Outer join.  This is need to display the cities available in both datasets.

change the link to full outer.jpg

Notice how we are still only seeing US cities.  This is due to the fact that our filter is only including US cities.  Once a visualization includes a blend of datasets, the filters impact the result set of the visualization.  In this particular scenario, this is not what we want.  We could filter on City and select every Canadian and US city to achieve the proper selection but this prone to user error.

Ideally we can simply select USA and Canada but since the list of values of the Country dimension in BestRunCorpRetail does not include Canada,  we will exclude all the countries except for US.  This would return US and Canadian cities from the blended datasets.

To do so, click on the existing Country filter token and remove the USA selection.  Next change the operator to Not in List and Shift select all the values within the Country dimension.  Finally scroll to the bottom of the list of values and unselect USA to insure that it is not excluded.

exclude all except USA.jpg

Now the crosstab allows us to display the results of both countries side by side.  Removing the sort on Sales Revenue will display the Cities of both countries in alphabetical order.

USA and CA cities.jpg

Scenario 2: Filter using the union list of values
Goal: Filter the visualization to Toronto and New York city

I’m interested in digging little deeper into the revenue values of Toronto and New York city.  From the Add Filter button in the filter toolbar, select City and include Toronto and New York.  When filtering, it doesn’t matter which dataset you select City from since linked dimensions by default are combined into one list of values.  In the case where the join type is Full outer join, a union of the list of values is done by default.  This explains why Lumira provided us with Canadian and US cities in one list.

filter toronto and new york.jpg

If you are only interested in selecting values from the primary dataset, you can disable the Show Values From Linked Datasets within the cog menu of the Filter dialog.  This toggle is only relevant for linked dimensions that are set to full outer join

show values from linked datasets.jpg

The following visualization answers the business question we are after.

toronto new your viz.jpg

Alternatively , we can swap the Category dimension with the City dimension in the rows chart feeder to analyze by category the revenue figures of Toronto and New York.

toronto new your by category.jpg

Scenario 3: Calculate Results As
Goal: Manipulate the rollup operation

Remove the Category dimension from the rows of the chart feeder and replace back with City.


The totals shown for Revenue represent the rollup of all the categories of New York and Toronto since Category is part of the linking context.  This rollup operation is done at the resultset level and not at the fact level/dataset level.  Therefore it is an aggregation on top of the original (Sum) aggregation of Revenue defined at the dataset level. 


It is possible in Lumira 1.29 to manipulate this rollup operation where it makes sense.  This can be done by selecting the Calculate Total As options from the measure cog within the Measure chart feeder.  Also this option is only visible when Lumira is forced to do a rollup as a result of dataset linking.


Select the Calculate Total As, Count(Distinct)  for CA – Revenue.

select cal total as.jpg

The number 25 represents the 25 categories sold in Toronto.

count distinct.jpg

Select the Calculate Total As, Average  for CA – Revenue.

rollup as average.jpg

Now the value shown for Toronto is the average Revenue of all the categories sold in Toronto which translates to << Sum{Revenue}/25>>.  This average is not the same as the average Revenue per City which equates to << Sum {Revenue} /  # of records where {City} = “Toronto”>>. 

average toronto new york.jpg

Note: please be careful how and when average is used as it is processed at the result set in the example above and not at the fact level.


I know that the examples above are not straight forward but I wanted to show how powerful Dataset Linking could be especially when the datasets being blended do not align very well.  Also I hope the point is made that Dataset linking and blending whether it is in Lumira or other tools is a feature and a concept that can get complex very quickly and not many end users can grasp it easily.  This where as an analyst and content creator, you would want to use the Merge feature of Lumira where possible to hide some of these complexity from your end users. 

To report this post you need to login first.

3 Comments

You must be Logged on to comment or reply to a post.

  1. Chuck Vigeant

    While I appreciate the ability to do some complex things, I am somewhat frustrated about doing a simple thing.

    We have a star schema in a Sybase SQLAnywhere database, with subjects containing a singular FACT table and multiple DIMENSION tables, using inner joins.   This is traditional data warehousing 101  (The product is called QQube from CLEARIFY®)

    I tried to link A FACT table with a single DIMENSION table, and the data will not appear correctly on the canvas.

    Using a single measure – Sales Amount, with a DIMENSION field of Item Type: it shows ONE column of total sales, with the correct information of 8 Members (e.g. item types).  However it SHOULD be showing 8 different columns (sales per item type).  Is this a bug? or just not able to handle this simple type of join.

    (0) 
  2. Chandrashekhara Kotrappa

    Hi Mike,

    I have a follow up question on this Data set linking option. If I wanted to compare YOY sales data I can get them side by side in a chart with month/quarter axis. Do you how can I create a calculated measure involving difference of sales amount from current year data set to previous year data set. Is it possible with Lumira, I know we can do it easily in webi report. TIA for any input you could provide.

    Thanks,
    CK

    (0) 
  3. Anastasia Gladenko

    Hi Mike,

    Have you ever had a problem with datasets linking in Lumira? I have a situation where records are duplicated in one dataset because the values in the linking dataset changed (values were renamed in data file and then uploaded into SAP Lumira). I cannot remove link and cannot remove duplicates.

    Thanks

    Anastasia

    (0) 

Leave a Reply