Data Extraction from Data Lake & Amazon Redshift Using SAP Data Services
Introduction:
In today's information management landscape, it is increasingly important to have a standardized method of data integration and ingestion, as well as data extraction from disparate data sources.
SAP Data Services, with its various built-in adapters and connectivity options, is an ideal tool to achieve these outcomes.
This article outlines how to connect to a data lake in an AWS environment and extract data from it to an on-premise system.
Main Part:
Pre-Requisites:
- SAP Data Services 4.2
- Amazon Redshift ODBC Driver
How To Implement The Solution:
Step 1: Install the Amazon Redshift ODBC driver locally, configure an ODBC DSN with the AWS Redshift database details, and test connectivity.
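Before involving Data Services, it is worth verifying the driver and connection details with a small standalone script. The following is a minimal sketch using pyodbc; the driver name, cluster endpoint, port, database, and credentials are placeholder assumptions that must be replaced with your own values.

```python
import pyodbc

# Placeholder connection details: replace the endpoint, database,
# and credentials with your own. The driver name must match the one
# registered by the Amazon Redshift ODBC driver installation
# (see pyodbc.drivers() or the ODBC Data Source Administrator).
conn_str = (
    "Driver={Amazon Redshift (x64)};"
    "Server=examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com;"
    "Port=5439;"
    "Database=dev;"
    "UID=awsuser;"
    "PWD=example_password;"
)

# A successful SELECT 1 confirms driver registration, network
# reachability, and credentials in a single step.
with pyodbc.connect(conn_str) as conn:
    print("Connectivity OK:", conn.cursor().execute("SELECT 1").fetchone()[0])
```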
Step 2: Install the ODBC driver on the SAP Data Services Job Server host and configure a DSN there with the same name and credentials as in Step 1; the procedure is the same as Step 1, and the DSN can be verified as sketched below.
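To confirm that the Job Server host exposes the same DSN, the data sources known to its driver manager can be listed and a connection made through the DSN by name. This is a minimal sketch to run on the Job Server host; the DSN name RedshiftDSN and the credentials are assumptions and must match what was configured in Step 1.

```python
import pyodbc

DSN_NAME = "RedshiftDSN"  # assumed DSN name; must match Step 1

# pyodbc.dataSources() returns the DSNs the local driver manager
# knows about; a missing entry means the DSN was not created on
# this host (or was created for a different bitness).
sources = pyodbc.dataSources()
if DSN_NAME not in sources:
    raise SystemExit(f"DSN '{DSN_NAME}' not found; available: {sorted(sources)}")

# Connecting by DSN exercises the exact path Data Services will use.
with pyodbc.connect(f"DSN={DSN_NAME};UID=awsuser;PWD=example_password;") as conn:
    print("Job Server DSN OK:", conn.cursor().execute("SELECT 1").fetchone()[0])
```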
Step 3: Create a datastore of type "Database" with Database Type "ODBC". Open the datastore and browse "External Metadata" to confirm that the Redshift tables are visible. Ensure that the necessary permissions are granted at the user level in AWS/Redshift for the database schema to be used; a quick check is sketched below.
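If External Metadata comes up empty, the usual cause is missing schema-level grants. The following sketch lists the tables the Data Services user can actually see, which mirrors what External Metadata will display; the schema name analytics, the user ds_user in the commented grants, and the DSN/credentials are all illustrative assumptions.

```python
import pyodbc

SCHEMA = "analytics"  # assumed schema used by the datastore

# Typical grants an administrator might run in Redshift if the
# listing below comes back empty (names are illustrative):
#   GRANT USAGE ON SCHEMA analytics TO ds_user;
#   GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO ds_user;

with pyodbc.connect("DSN=RedshiftDSN;UID=awsuser;PWD=example_password;") as conn:
    cur = conn.cursor()
    # information_schema only returns objects the current user is
    # allowed to access, so this mirrors External Metadata.
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = ?",
        SCHEMA,
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
```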
Step 4: Create a batch job with the AWS/Redshift table as the source and a local table or file as the target, then execute it.
The run took around 6 minutes for 1 million+ records for a 1:1 extraction without any transformations, i.e. a throughput of roughly 2,800 records per second; a standalone equivalent of this extraction is sketched below for comparison.
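For a rough baseline, the 1:1 extraction the job performs is equivalent to a plain batched read of the source table into a local file. The sketch below does this outside Data Services; the table name analytics.sales, the output path, and the DSN/credentials are assumptions.

```python
import csv
import pyodbc

TABLE = "analytics.sales"     # assumed source table
OUTPUT = "sales_extract.csv"  # assumed local target file
BATCH = 10_000                # fetch size: trades memory for round trips

with pyodbc.connect("DSN=RedshiftDSN;UID=awsuser;PWD=example_password;") as conn, \
        open(OUTPUT, "w", newline="") as fh:
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {TABLE}")  # 1:1, no transformations
    writer = csv.writer(fh)
    writer.writerow(col[0] for col in cur.description)  # header row
    while True:
        rows = cur.fetchmany(BATCH)
        if not rows:
            break
        writer.writerows(rows)
```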
Conclusion:
Thus we can see that SAP Data Services can play a very important role in data integration with cloud solutions such as a data lake in AWS.
Comment from Amit k: Another way to do it is to use SDA (Smart Data Access) with AWS as the source, which exposes the data virtually; a flow graph can then be created on top of the virtual table to design the delta-load flow, and these flow graphs can later be scheduled using BODS.