Federated Analytics with SAP Datasphere: A DECATHLON Story
Federated Analytics is an architecture pattern in which SAP Datasphere distributes queries directly to the source systems. It helps build richer live analytics in SAP Analytics Cloud that combine SAP and non-SAP data, without the need to replicate or duplicate data.
In this blog, we walk you through a recent validation of this architecture at Decathlon, an SAP strategic customer, for one of their business use cases.
Customer: Decathlon, a large France-based sporting goods retailer.
Business Use Case and Motivation:
Decathlon’s challenges with their current analytics solution included volume limits on data import and the inability to bring data from their hyperscaler sources live into charts. Decathlon has large amounts of actual and forecasted sales data, split across several tables stored in Apache Parquet format in their Amazon S3 data lakes. The expected outcome of the proposed solution was to eliminate the import challenges and bring the data in live, enabling better comparative analytics of forecasted and actual sales on a story dashboard.
The Solution Architecture:
This use case directly fits the Federated Analytics architecture, in which SAP Datasphere, SAP Analytics Cloud and Amazon Athena are integrated to create the analytics solution end to end.
Decathlon used Amazon Athena to expose their sports equipment forecasted sales data and historical weekly actual sales data as tables and views. These are queried in real time directly against several Parquet files in their Amazon S3 data lakes, roughly 40 million rows in all for this validation.
Amazon Athena is connected to SAP Datasphere, where remote tables are modeled to look up the Athena data.
Analytical models built on top of the remote tables transform, aggregate and project the data queried directly from Amazon Athena. The analytics story created in their external SAP Analytics Cloud tenant consumes these models to bring in the data for rich visualizations.
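To make the role of the analytical model more concrete, here is a minimal Python sketch of the kind of aggregation such a model might express. It uses an in-memory SQLite database purely as a stand-in for Amazon Athena. The table names `actual_sales` and `forecast_sales`, their columns, and the sample rows are hypothetical and not taken from Decathlon's data.

```python
import sqlite3


def weekly_forecast_vs_actual(conn):
    """Return (week, actual_qty, forecast_qty, gap) rows, one per week."""
    query = """
        SELECT a.week,
               a.actual_qty,
               f.forecast_qty,
               a.actual_qty - f.forecast_qty AS gap
        FROM (SELECT week, SUM(qty) AS actual_qty
              FROM actual_sales GROUP BY week) AS a
        JOIN (SELECT week, SUM(qty) AS forecast_qty
              FROM forecast_sales GROUP BY week) AS f
          ON a.week = f.week
        ORDER BY a.week
    """
    return conn.execute(query).fetchall()


# Hypothetical stand-in for the remote source: two small tables in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actual_sales (week TEXT, product TEXT, qty INTEGER)")
conn.execute("CREATE TABLE forecast_sales (week TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO actual_sales VALUES (?, ?, ?)",
    [("W01", "bike", 120), ("W01", "tent", 40), ("W02", "bike", 90)],
)
conn.executemany(
    "INSERT INTO forecast_sales VALUES (?, ?, ?)",
    [("W01", "bike", 100), ("W01", "tent", 50), ("W02", "bike", 95)],
)

rows = weekly_forecast_vs_actual(conn)
for row in rows:
    print(row)
```

In a federated setup the point is that only the aggregated result travels to the dashboard, while the detailed rows stay in the source.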
Solution diagram showing data federation architecture between SAP Datasphere and Amazon Athena
The validation was executed by Decathlon’s business users and analytics users in a new trial SAP Datasphere tenant. This was their first time working with SAP Datasphere.
With the help of our initial architectural guidance and support, and information from SAP blogs and missions, the analytics developers and business users at Decathlon were able to execute these phases in the PoC:
- Configuration on Amazon S3 and Amazon Athena to identify views/tables that needed to be queried
- Security policy configurations to allow integration from SAP Datasphere to query Athena
- Establishing trust in SAP Datasphere by configuring AWS CA certificates in SAP Datasphere
- Creating remote tables and analytical models in SAP Datasphere
- Configuring a live connection from SAP Analytics Cloud to SAP Datasphere
- Creating analytical dashboards in SAP Analytics Cloud
- Monitoring performance at SAP Datasphere and remote queries at Amazon Athena
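As an illustration of the security policy step in the list above, the following Python sketch builds an AWS IAM policy document that would let a technical user run Athena queries over data in S3. It is an assumption-laden example, not the policy Decathlon used: the function name and bucket names are hypothetical, and the exact set of permissions SAP Datasphere requires should be taken from the SAP and AWS documentation.

```python
import json


def build_athena_query_policy(data_bucket, results_bucket):
    """Build a hypothetical IAM policy for querying Athena over S3 data."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Start queries and fetch their status and results.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": "*",  # assumption: tighten to a workgroup ARN
            },
            {   # Read table and view metadata from the Glue Data Catalog.
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": "*",  # assumption: tighten to catalog ARNs
            },
            {   # Read-only access to the Parquet files in the data lake.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{data_bucket}",
                    f"arn:aws:s3:::{data_bucket}/*",
                ],
            },
            {   # Athena writes query results to a separate bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{results_bucket}",
                    f"arn:aws:s3:::{results_bucket}/*",
                ],
            },
        ],
    }


# Bucket names below are placeholders for illustration only.
policy = build_athena_query_policy("example-data-lake", "example-athena-results")
print(json.dumps(policy, indent=2))
```

Keeping the data lake bucket read-only and granting write access only on the results bucket is one way to limit what the integration user can change.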
Decathlon completed the entire end-to-end architecture validation, from data source integration planning to the finished analytics dashboard, in just 4 weeks, and then iterated over the next 2 weeks to fine-tune, monitor and make diagnostic observations.
The end-to-end SAP Analytics Cloud story shows several comparative sales analysis charts, all of them bringing in live data through SAP Datasphere’s analytical models, which federate queries directly to Amazon Athena in real time.
The performance of individual queries was diagnosed end to end, starting with SAP Datasphere’s remote query monitor and tracing each query through to Amazon Athena. This helped review data quality and apply optimizations to improve performance.
At the end of the 6-week PoC, here is a direct quote from the business users at Decathlon:
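On the Athena side of such a trace, one simple way to spot candidates for optimization is to rank queries by how much data they scanned. The Python sketch below assumes dictionaries shaped like the `QueryExecution` object returned by boto3's Athena `get_query_execution` call; the function name and the sample values are illustrative only and are not measurements from the PoC.

```python
def rank_by_data_scanned(executions):
    """Summarize query executions, largest data scan first."""
    summary = []
    for execution in executions:
        stats = execution.get("Statistics", {})
        summary.append({
            "id": execution["QueryExecutionId"],
            "seconds": stats.get("EngineExecutionTimeInMillis", 0) / 1000,
            "scanned_mb": stats.get("DataScannedInBytes", 0) / 1_000_000,
        })
    return sorted(summary, key=lambda item: item["scanned_mb"], reverse=True)


# Made-up sample records in the shape of Athena query execution metadata.
sample = [
    {
        "QueryExecutionId": "q-2",
        "Statistics": {
            "EngineExecutionTimeInMillis": 900,
            "DataScannedInBytes": 12_000_000,
        },
    },
    {
        "QueryExecutionId": "q-1",
        "Statistics": {
            "EngineExecutionTimeInMillis": 4200,
            "DataScannedInBytes": 850_000_000,
        },
    },
]

ranked = rank_by_data_scanned(sample)
for item in ranked:
    print(item["id"], item["seconds"], "s", item["scanned_mb"], "MB")
```

Queries at the top of this ranking are natural starting points when checking whether filters and aggregations are actually being applied at the source.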
SAP Datasphere enabled us to increase our time to market, from idea to final story, by removing time consuming steps in data preparation.
We were able to consume data where this data is located without duplicated the source of information.
The process was simple and straight forward without even having any training in DWC and with the help of few documents and wiki’s, we were able to create a robust pipeline of information that creates rapid value for our users.
We see in Datasphere an extension of our SAC initiative that goes beyond our expectation.
In the upcoming weeks, we are going to explore other data sources and increase our experience with Datasphere and SAC.
The customer has since planned to expand their validation exercises to other use cases that involve data from sources such as Amazon Redshift, SAP BW on HANA and Google BigQuery.
Art of the Possible:
The data federation architecture in SAP Datasphere can be leveraged to provide real-time access to external hyperscaler sources such as Amazon Athena, Amazon Redshift, Google BigQuery and Azure Data Explorer, and to combine that data with business-critical SAP data to deliver powerful insights without duplicating any data.
For step-by-step guidance on implementing such use cases, follow the SAP Discovery Mission.