Harnessing Multi-Model Capabilities with Spotify – Processing Semi-Structured Data with SAP HANA Cloud/SAP Datasphere – Part 1
This is going to be a series of blogs explaining the multi-model capabilities of SAP HANA Cloud /SAP Datasphere with one scenario. So grab a coffee before you go through this series of architecture, solutions, implementation with code repositories, and finally an extended version of the series as an SAP Discovery mission(to be published).
We have published discovery missions for SAP Business Technology Platform(SAP BTP) adoptions focusing on SAP HANA Cloud and SAP Datasphere based on customer use cases. When the use case is specific, it may or may not cover all the latest features that were released as part of our platform. So, we planned this time to cover holistically and use an interesting scenario that might spark your interest to explore all the multi-model capabilities [hopefully!]. Let’s not restrict ourselves to BTP components and also discuss deployment options using hyperscaler services/open source frameworks.
If you are interested in the blog for hands-on directly, you can check here
Spotify is one of the most successful audio streaming and media services providers that focuses on the real “Customer Experience” or rather say customer obsession. They make the customer experience as personal as possible(at least from my experience 😊 ). The important reasons are
- Spotify is a prime example of microservices-based architecture that supports and scales for more than 205 million premium users and 295 million active users.
- It has a vibrant developer community and provides interesting APIs and packages for different technological deployments (Web APIs /Spotipy / Spotify R)
- Their focus on research areas such as Audio Intelligence, Human-Computer Interaction,
Algorithmic Responsibility, User modeling, and corresponding publications tell us how serious
they are about customer experience as well as the music industry.
- Finally, we wanted to consider a scenario in which our SAP community with different expertise levels can relate to, learn, and adopt the concepts in their own area of expertise.
Now with this introduction on the Spotify Developer platform, let’s get into the scenario on how we plan to use SAP HANA Cloud / SAP Datasphere to consume their APIs and process the semi-structure data or JSON.
So.. what’s the Scenario
- Spotify releases Top charts for every country and it is available as a weekly or daily playlist. So, let’s consider the weekly playlist for say 10 countries.
- Using Spotify APIs, we will retrieve the Top chart weekly for 10 countries and store it as JSON in HANA Cloud. Using identifiers, we will extract attributes such as track name, artist name, album name, and popularity.
- For all these song details that was extracted in step 2, we will use Spotify audio features API to extract metrics such as danceability, speechiness, liveness . We will explain the metrics in the later steps. And all these metrics will be stored as JSON too.
- Create a SQL view merging both the JSON collections acquired in step 2 & 3 .
- Based on the SQL views, we will create Calculation views to understand which songs from different playlists have higher danceability and energy features . We will perform R computations to group by Country Playlist and understand the features. The metrics can now be consumed via SAP Analytics Cloud or other tools such as Microsoft Power BI.
We will explain additional scenarios in the architecture which will be covered as part of Discovery Missions.
Now the Architecture – SAP HANA Cloud
Consider this architecture for consuming Spotify APIs with SAP HANA Cloud & let me explain it in detail. Also, I will refer to the numbers mentioned in the architecture diagram while explaining the scenarios.
Spotify provides Web APIs to consume public playlists, tracks, artists, albums, and podcasts and extract audio features for all the tracks. In order to consume these APIs, I will use Python and the Spotipy package. They already have shared enough sample code snippets on how to use authentication, and call APIs for all scenarios. I just put the pieces together to have the valid JSON structure that we need to store as a collection in SAP HANA Cloud. And further added separate functions to loop around playlists, capture audio features of every track in every playlist and store them in a separate collection.
We can discuss the deployment options that we can consider while ingesting JSON documents in SAP HANA Cloud. Here are the options
Data Ingestion – Option 1
For validating if the ingestion works fine, you can execute the scripts[2.1] directly from Google Colab /Visual studio or any python environment of your choice. Also, the code can be executed from a VM/ any compute which could be used for data extraction/ingestion. Basically, you will be calling the APIs using Spotipy libraries and using hana_ml library to insert the captured JSON documents as collections .
Data Ingestion – Option 2
Data Ingestion – Option 3
And of course, you could think about different scenarios for data ingestion using Integration Suite or SAP Data Intelligence.
Once you have decided on the deployment options (2.1-2.3), all the responses from Spotify APIs will be stored as JSON documents in HANA Cloud. They will be stored as Collections under a schema. If you are scheduling it using SAP Kyma or GCP App Engine, you can organize the name of the JSON document as per date or week before ingesting using hana_ml package.
Now that we have the top songs and audio features from all play charts by country, we can create calculation views and consumes the data using SAC. We can include R computations with the server provided in SAC or by connecting the local R server. Or you could consume the data using Microsoft Power BI too . I am using it since there are some direct features to display images on the fly from a table.
There are other scenarios that we can consider after step 4, ingestion of API responses in HANA Cloud. Consider this architecture flow 4 -> 8 -> 9. Once we have the data ingested, we can compare the Playlist for Top Charts with your own private playlists using Graph Engine. We can compare collections of both and see what are the common artists, and songs you have in common and have the corresponding metrics displayed using based on SAP Cloud Application Programming or using SAP Build. Or we can expose the Graph-based computational data through python based web framework, Django.
What about the Architecture with SAP Datasphere?
The architecture flow is almost the same as SAP HANA Cloud except for some minor changes. I would just explain the differences and the steps that you need to do from SAP HANA Cloud. As you see in the below architecture, SAP Datasphere does not enable the Document Store capabilities (as of now) to store data as JSON Collection. Instead, we will be ingesting the data as a large object within the HANA Cloud tenant of SAP Datasphere. We do have the standard JSON to SQL function which we can apply on the large object and extract the response. And the result would be the same as how we extracted from the SAP HANA Cloud JSON collection. Or we can normalize the JSON response externally and ingest it in SAP Datasphere as relational(similar to standard/partner connectors).
Here are the steps to be followed for SAP Datasphere:
- Consume the Spotify APIs using python script directly using the hdbcli package , or normalize the response externally using the hana_ml package[2.2].
- Either options 2 or 2.2 could be containerized using SAP Kyma and scheduled for periodic ingestion[2.1].
- We will be using the python package hdcli to ingest the data as a large object NCLOB.
- Create an Open SQL schema access from an SAP Datasphere space .
- Ingest the JSON response as data type “NCLOB” in a table 
- In order to ingest the data, we can use SAP Kyma[2.1] or use python script  directly from any VM/Compute.
- For comparing Top Chart playlists with your private playlist, we can utilize the graph engine as we did for SAP HANA Cloud. For the detailed development process on how to build based on the underlying HC tenant, please refer to my blog.
- You can use the Data Builder feature to blend data from different playlists 
- You can either expose the data to SAP SAC, Non SAP reporting tools
Also, you can expose the same data for building apps using SAP CAP, SAP Build, or the external web-based framework such as Django
Once you go through one of the scenarios say reporting based on top tracks of different countries, you will be able to compare the Danceability metrics(R computation)and visualize the data in SAP SAC as seen below. The below visualization compares top 500 songs from 10 different countries based on Danceability metric – overall regularity of a song based on tempo, rhythm, stability and beat strength.. you feeling it ? 🙂
And here is another metric Speechiness that is compared across all tracks grouped by Country as below [R computations using Spotify R package]. All these code that I used for Spotify R are already mentioned in the github repositories by different open source developers. I just modified according to these specific scenarios and will add the references in my blogs focusing on hands-on.
And here is the visualization for the same in Power BI. Here we compare the same metrics danceability and energy for all tracks from 10 different countries. Also Power BI has a cool feature of displaying images on the fly and part of table . And yeah Taylor Swift’s new album is damn cool ! Still feeling it ?
What will you learn?
Based on the kind of deployment you test or try hands-on, you will learn some or all of the topics mentioned below 😊
- SAP HANA Cloud Basics
- HANA Multi-model capabilities [Document Store / Graph / Modeling / CAP]
- Development using Business Application Studio
- Python Basics & Development using Visual Studio
- Docker Set up and configuration and Docker hub basics
- SAP BTP Kyma Deployments
- GCP basics and App Engine Deployments
- SAP Datasphere basics and Integration with HDI
- SAP Build Integration with SAP HANA Cloud
- SAP Analytics Cloud Introduction and Basic Story Board Developments
- Programming based on Django Framework
We went through different architecture options using SAP HANA Cloud/ SAP Datasphere. Hope it provides you a high-level overview of how to process the semi-structured data. Now let’s focus on the implementation part. In the following blogs, I will be discussing consuming semi-structured data in SAP HANA Cloud / SAP Datasphere and utilizing it for reporting. We will also cover scenarios focusing on Graph Engine comparing public and private playlists and consuming them it in apps based on SAP CAP or SAP Build Apps. And finally embedding web players in Django with SAP HANA Cloud as the backend and extracting metrics based on songs played.
Please do let us know your feedback. If the architecture and deployment options interests you, kindly continue with part 2 where discuss the implementation based on SAP HANA Cloud . And if you are looking for access to SAP HANA Cloud, you can sign up individually using SAP free tier offerings. In order to enable document store in your SAP HC instance, you would need at least 3 vCPUs and there is a restriction of 2 vCPUs in free tier services. Kindly suggest you to compare the pricing model as mentioned in our Discovery center.
Happy Learning !