Former Member

Combining Multiple Sources with SAP Lumira

I have seen quite a few blogs in this channel showing the capabilities of Lumira for discovering insights from a single dataset. Fantastic job by many of the members.

With this blog I want to show how SAP Lumira can combine multiple data sources and visualize them.

For the use case I chose the dataset from the GroupLens project (1 million rows, 12 columns).

There are three data files and a readme.txt

Picture1.png

The contents of the files, as described in “readme.txt”:

users.csv – UserID::Gender::Age::Occupation::Zip-code
ratings.csv – UserID::MovieID::Rating::Timestamp
movies.csv – MovieID::Title::Genres
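
To make the layout concrete outside of Lumira, here is a minimal Python sketch of reading one line in this format. It assumes the fields are separated by `::` exactly as the readme layout above shows; the function name `parse_line` and the sample line are made up for illustration.

```python
# Column names follow the readme layout quoted above; the files have no header row.
USERS_COLUMNS = ["UserID", "Gender", "Age", "Occupation", "Zip-code"]
RATINGS_COLUMNS = ["UserID", "MovieID", "Rating", "Timestamp"]
MOVIES_COLUMNS = ["MovieID", "Title", "Genres"]


def parse_line(line, columns, sep="::"):
    """Split one raw line on the separator and pair each value with its column name."""
    values = line.rstrip("\n").split(sep)
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return dict(zip(columns, values))


# Made-up sample line, for illustration only.
sample_user = parse_line("7::M::35::3::12345\n", USERS_COLUMNS)
print(sample_user)
```

All values stay as strings here; converting IDs and ratings to numbers is left out to keep the sketch short.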

Let's now go ahead and acquire these files in SAP Lumira.

After launching, I choose New Document.

Picture2.png

I first acquire the ratings.csv file and choose only the relevant columns.

Picture3.png

Then I rename the columns to appropriate names, as the original CSV files don’t have header information.

Picture4.png

Then I repeat the same acquisition and renaming steps for the remaining two files.

users.csv

Picture5.png

movies.csv

Picture6.png

Now I am done acquiring all three files (ratings, movies and users) into SAP Lumira.

Picture7.png

Now we will merge these files.

Rules for a merge, as per the documentation:

  • The merging dataset must have a key column.
  • Only columns with the same data type are considered.
  • The merge adds all columns.

So I go ahead and choose ratings.csv as my base file and first add the users data.

In the Prepare room I choose Combine –> Merge

Picture8.png

Then I choose users.csv and define the mapping on User ID.

Picture9.png

Now I have merged these two datasets.

Picture10.png

I now repeat the same steps for the movies.csv file, defining the mapping on Movie ID. The result is a single dataset created from multiple sources.
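
For readers who think in code, the two merges can be sketched in plain Python as key lookups. This is only an illustration under assumptions: the post does not say which join type Lumira applies, so the sketch keeps every row of the base dataset (left-join style) and requires unique keys in the dataset being merged in. The helper `merge_on_key` and the sample rows are hypothetical.

```python
def merge_on_key(base_rows, lookup_rows, key):
    """Add the columns of lookup_rows to base_rows where `key` matches.

    Assumption: every base row is kept; unmatched rows get None for the new columns.
    """
    lookup = {}
    for row in lookup_rows:
        if row[key] in lookup:
            raise ValueError(f"duplicate key {row[key]!r} in the merging dataset")
        lookup[row[key]] = row
    # Columns contributed by the merging dataset, excluding the key itself.
    extra_columns = [c for c in (lookup_rows[0] if lookup_rows else {}) if c != key]
    merged = []
    for row in base_rows:
        match = lookup.get(row[key], {})
        combined = dict(row)
        for column in extra_columns:
            combined[column] = match.get(column)
        merged.append(combined)
    return merged


# Made-up sample rows, for illustration only.
ratings = [
    {"UserID": "1", "MovieID": "10", "Rating": "5"},
    {"UserID": "2", "MovieID": "10", "Rating": "3"},
    {"UserID": "1", "MovieID": "20", "Rating": "4"},
]
users = [{"UserID": "1", "Gender": "F"}, {"UserID": "2", "Gender": "M"}]
movies = [
    {"MovieID": "10", "Title": "Movie A", "Genres": "Comedy|Drama"},
    {"MovieID": "20", "Title": "Movie B", "Genres": "Action"},
]

# ratings is the base; users and movies are merged in one after the other.
combined = merge_on_key(merge_on_key(ratings, users, "UserID"), movies, "MovieID")
```

The base dataset keeps its row count (one row per rating), while each row gains the user and movie columns.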

I will now create a measure on the ratings column and aggregate it with Count (All).

Picture13.png

Picture14.png
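
As a rough analogy, a Count (All) measure grouped by a dimension behaves like counting rows per distinct value. The sketch below shows that idea with Python's standard `collections.Counter`; the function name `count_by` and the sample rows are assumptions for illustration, not Lumira internals.

```python
from collections import Counter


def count_by(rows, dimension):
    """Count all rows per distinct value of `dimension` (duplicates included)."""
    return Counter(row[dimension] for row in rows)


# Made-up sample of merged rating rows, for illustration only.
rating_rows = [
    {"Gender": "F", "Age": "25", "Rating": "5"},
    {"Gender": "M", "Age": "35", "Rating": "3"},
    {"Gender": "F", "Age": "25", "Rating": "5"},
]

ratings_by_gender = count_by(rating_rows, "Gender")
ratings_by_age = count_by(rating_rows, "Age")
```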

In the next step you will see that the Genre column contains multiple genres separated by a pipe (|) symbol, so we will use SAP Lumira’s manipulation capabilities to create unique Genre values from it.

For this, we select the column and, on the right side, choose Split, enter “|” as the delimiter and execute it. You will now see two additional columns.

Picture15.png

Picture16.png
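
The effect of the split can be approximated in Python as follows. This sketch assumes the split happens at the first occurrence of the delimiter, producing one column with the text before it and one with the rest, which matches the two additional columns seen above; the generated column names (`Genres_1`, `Genres_2`) are hypothetical.

```python
def split_on_delimiter(rows, column, delimiter="|"):
    """Add two columns: the text before the first delimiter and the text after it."""
    result = []
    for row in rows:
        head, _, tail = row[column].partition(delimiter)
        new_row = dict(row)
        new_row[column + "_1"] = head
        new_row[column + "_2"] = tail  # empty string when there is no delimiter
        result.append(new_row)
    return result


# Made-up sample rows, for illustration only.
movie_rows = [
    {"Title": "Movie A", "Genres": "Comedy|Drama|Romance"},
    {"Title": "Movie B", "Genres": "Action"},
]
split_rows = split_on_delimiter(movie_rows, "Genres")
```

The first new column then holds a single genre per row, which is what makes it usable as a clean dimension.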

Now we get to the fun part: you can create different visualizations based on the combined dataset.

I created visualizations of the number of ratings by Age Group, by Gender and by Genre, plus the Top 25 movies with a filter on Genre.

Picture18.png

Picture19.png

Picture20.png

Picture21.png
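
The logic behind the Top 25 chart with a Genre filter can be sketched as: keep the rating rows for one genre, count ratings per title, and take the highest counts. The function `top_movies` and the sample rows below are made up for illustration, and the sketch assumes the genres are still stored as a pipe-separated string.

```python
from collections import Counter


def top_movies(rows, genre, n=25):
    """Return the n most-rated titles among rows whose genre list contains `genre`."""
    counts = Counter(
        row["Title"] for row in rows if genre in row["Genres"].split("|")
    )
    return counts.most_common(n)


# Made-up sample: one row per rating, for illustration only.
rated_rows = [
    {"Title": "Movie A", "Genres": "Comedy|Drama"},
    {"Title": "Movie A", "Genres": "Comedy|Drama"},
    {"Title": "Movie B", "Genres": "Action"},
    {"Title": "Movie C", "Genres": "Drama"},
]
top_drama = top_movies(rated_rows, "Drama")
```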

Then I created a story with the Top template and loaded these visualizations. I also added a storyboard filter on Genre.

Picture22.png

Then I published this to SAP Lumira Cloud.

Picture23.png

Picture24.png

To summarize, you can use SAP Lumira to acquire and combine multiple sources of data, manipulate or enrich the data, then create visualizations and stories and share them with your colleagues.

Please let us know if you have used this feature in SAP Lumira and what your feedback is.

Best Regards

      3 Comments
      Angus Menter

      Hi Viswanathan, I would like to demonstrate this to a prospect. Can you supply the source files, please?

      Regards

      Angus

      Angus Menter

      Hi Viswanathan, I've got the data from the GroupLens website. Thanks.

      Former Member

      Hi Viswanathan, it is great that we can load data from multiple sources. Can the source data be a mix of universes, SQL queries and CSV files?

      thanks,