Using SAP Lumira Extensions to Make a Free World
The SAP Analytics team once again set up a cool Viz-a-Thon event on the sidelines of SAP TechEd 2015 in Las Vegas. It was a great opportunity to learn or apply Lumira skills for a good cause. This year, Made in a Free World was the non-profit organization willing to open its data books and let the 50-some volunteers visualize the risk score in two datasets: jeans and tablet computers. Within the (very short) 90-minute timeframe, our “Alexa’s Super Awesome Team” was able to put together a compelling story (more about the name here).
We received lots of comments on our use of the Sankey Diagram custom extension. I’ll detail here how we built it in such a short timeframe, so that you, too, can leverage it within your organization.
For your convenience, you can find the source CSV data file, the codebook, the extension and the final Lumira file here.
Sankey Diagrams
First of all, let’s give credit where credit is due: Dong Pan was one of the Data Geniuses supporting the teams during the event. We reused his extension that is already well documented here: Lumira Visualization Extension – Sankey Diagram
According to Wikipedia, “Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity. Sankey diagrams are typically used to visualize energy or material or cost transfers between processes.”
Example from Ouseful.Info
Activate the Sankey Diagram Extension
This step used to be very complicated. From Lumira 1.28 on, extensions are handled much more smoothly. From GitHub or the link above, download the Sankey Diagram Extension for SAP Lumira (pan.viz.ext.sankey.zip).
Open SAP Lumira. Under File -> Extensions, click on Manual Installation. Locate your file. Restart Lumira. Et voilà!
Prepare: the Made in a Free World dataset
In order to reduce the risk of employing forced labor in an organization’s supply chain, Made in a Free World has developed a risk score.
Download the attached sample dataset for tablet computers. In Lumira, click on File -> New, select Text (csv) and locate your CSV file. Take a look at the data. You may also want to read the well-documented CodeBook. In particular, notice how the “path” column, combined with the “depth” column, actually describes a tree. The “score” column indicates the risk and is a number between 0 and 1.
For the Sankey Diagram, we need to decompose the path and only keep the last component as the “Target” and the second-to-last as the “Source”. Ideally, this would be available in separate columns, but it’s not the case in this particular dataset. Therefore, we’ll have to extract it.
To make our life easier, we’ll focus on levels 2 and 3 of the hierarchy. Still in the “prepare” tab, click on the gear next to the “depth” column and select Filter. Make sure only levels 2 and 3 are selected.
Select the “path” column. At the bottom right, click on “Split” and enter “:#:” as the separator. Notice that Lumira creates additional columns “path(2)” and “path(3)”. The path(2) column now contains the first level of components, while path(3) still contains multiple levels. Repeat the split on the remaining column as many times as needed.
Since the network doesn’t always have the same number of levels, you will end up with half-filled columns.
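To see why the columns end up half-filled, here is a minimal Python sketch of the same split performed outside Lumira. Only the “:#:” separator comes from the dataset; the sample paths and the function names are made up for illustration.

```python
from typing import List, Optional

SEPARATOR = ":#:"  # separator used in the "path" column


def split_path(path: str) -> List[str]:
    """Split one path string into its individual components."""
    return path.split(SEPARATOR)


def pad_rows(rows: List[List[str]]) -> List[List[Optional[str]]]:
    """Pad shorter rows with None so every row has as many columns as the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [row + [None] * (width - len(row)) for row in rows]


# Hypothetical sample paths, NOT taken from the real dataset
sample_paths = [
    "Tablet:#:Display",
    "Tablet:#:Motherboard:#:Capacitors",
]

columns = pad_rows([split_path(p) for p in sample_paths])
for row in columns:
    print(row)
```

The shorter path leaves its last column empty (None here), which is the situation the calculated dimensions in the next step have to deal with.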
To calculate the Source and Target, we only need to pick, for each row, the last filled column (Target) and the one just before it (Source). This is easily achieved with calculated dimensions. Click on the gear next to any column and select “Create Calculated Dimension”.
Give it a name (e.g. Source) and enter a formula like this one:
if IsNotNull({path (5)}) then {path (4)} else {path (3)}
Do the same for the Target calculated dimension:
if IsNotNull({path (5)}) then {path (5)} else {path (4)}
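If you prefer to prepare the file before loading it into Lumira, the same idea (last component as Target, second-to-last as Source) can be sketched in Python. This is an alternative under stated assumptions, not the Lumira formula language: the function name and the sample paths are hypothetical, and only the “:#:” separator comes from the dataset.

```python
from typing import Optional, Tuple

SEPARATOR = ":#:"  # separator used in the "path" column


def source_and_target(path: str) -> Optional[Tuple[str, str]]:
    """Return (source, target) = (second-to-last, last) component of a path.

    Returns None when the path has fewer than two components,
    because a single node has no incoming link to draw.
    """
    parts = path.split(SEPARATOR)
    if len(parts) < 2:
        return None
    return parts[-2], parts[-1]


# Hypothetical sample paths, NOT taken from the real dataset
print(source_and_target("Tablet:#:Motherboard:#:Capacitors"))
print(source_and_target("Tablet"))
```

Taking the components from the end of the list avoids having to know in advance how many levels a given row contains.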
Now that we’re done with the Prepare step, let’s Visualize!
Visualize the Sankey Diagram
In the Visualize window, click on the “Chart Extensions” button and choose “Sankey Diagram”.
In the “Measures” section, choose “Score” as a value. In the “Dimensions” section, select “Source” and “Target”.
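Conceptually, the chart draws one link per Source/Target pair, with a width driven by the aggregated measure. The sketch below assumes the score is summed per pair; check the aggregation actually applied to your measure in Lumira. The rows and the function name are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_links(rows: Iterable[Tuple[str, str, float]]) -> Dict[Tuple[str, str], float]:
    """Sum the score of every (source, target) pair (assumed aggregation: sum)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for source, target, score in rows:
        totals[(source, target)] += score
    return dict(totals)


# Hypothetical rows (source, target, score), NOT taken from the real dataset
rows = [
    ("Motherboard", "Capacitors", 0.25),
    ("Motherboard", "Capacitors", 0.5),
    ("Display", "Capacitors", 0.125),
]

links = build_links(rows)
for (source, target), width in links.items():
    print(source, "->", target, width)
```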
The result should look something like this:
Compose and Conclusion
In the Compose section, you can combine your visualization with images and text. Our result looked like this.
What is interesting in this particular dataset and visualization is that some sub-components are shared between the main components. This is the case for “Integrated Circuits”, “Resistors” and “Capacitors”. A more traditional drill-down or tree view may not have been able to show this particular relationship.
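Shared sub-components can also be spotted without a chart: they are the targets that appear under more than one source. Here is a minimal Python sketch of that check. The parent names “Motherboard” and “Display” are hypothetical, and the pairs are made up rather than read from the real dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shared_targets(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return every target that has more than one distinct source, with its sorted sources."""
    sources_by_target = defaultdict(set)
    for source, target in pairs:
        sources_by_target[target].add(source)
    return {
        target: sorted(sources)
        for target, sources in sources_by_target.items()
        if len(sources) > 1
    }


# Hypothetical (source, target) pairs, NOT taken from the real dataset
pairs = [
    ("Motherboard", "Capacitors"),
    ("Display", "Capacitors"),
    ("Motherboard", "Resistors"),
]

print(shared_targets(pairs))
```

In a strict tree, this dictionary would always be empty, which is why a drill-down view hides the relationship that the Sankey diagram makes visible.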
Your turn, now. Go ahead and give it a try. Use the comments section to share how you did with the provided dataset or with your own.
Great work but the following link doesn't work:
For your convenience, you can find the source CSV data file, the codebook, the extension and the final Lumira file here.
You're right, Robert, I'm sorry about it.
I have updated the link. Is it working for you now?
Perfect, thanks!
The link is not working for me anymore 🙁