Data Extraction from Data Lake & Amazon Redshift Using SAP Data Services
In today’s information management landscape, it is increasingly important to have a standardized method for data integration and ingestion, as well as for extracting data from disparate data sources.
SAP Data Services, with its various built-in adapters and connectivity options, is well suited to achieving these goals.
This article outlines how to connect to a data lake in an AWS environment and extract its data to an on-premise system.
Components Used:
- SAP Data Services 4.2
- Amazon Redshift ODBC Driver
How to Implement the Solution:
Step 1: Install the Amazon Redshift ODBC driver locally, configure an ODBC data source (DSN) with the Amazon Redshift database details, and test the connectivity.
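The connectivity test in Step 1 can also be scripted, which makes it easy to repeat later on the Job Server. The sketch below assumes Python with the third-party `pyodbc` package; the driver name `Amazon Redshift (x64)`, the host, database and credentials are placeholders, so use the driver name shown in your own ODBC Data Source Administrator. Port 5439 is the Amazon Redshift default.

```python
# Minimal sketch (assumption): script the Step 1 connectivity test with pyodbc.
# The driver name, host, database and credentials are placeholders.


def build_redshift_conn_str(
    server: str,
    database: str,
    user: str,
    password: str,
    port: int = 5439,  # Amazon Redshift's default port
    driver: str = "Amazon Redshift (x64)",  # assumed name; check your ODBC administrator
) -> str:
    """Return a DSN-less ODBC connection string for an Amazon Redshift database."""
    parts = {
        "Driver": "{" + driver + "}",  # braces allow spaces/parentheses in the name
        "Server": server,
        "Port": str(port),
        "Database": database,
        "UID": user,
        "PWD": password,  # values containing ';' would need {...} quoting
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


def check_connectivity(conn_str: str, timeout_seconds: int = 10) -> bool:
    """Connect, run SELECT 1 and report whether it succeeded."""
    import pyodbc  # third-party package: pip install pyodbc

    try:
        conn = pyodbc.connect(conn_str, timeout=timeout_seconds)  # login timeout
    except pyodbc.Error as exc:
        print(f"Connection failed: {exc}")
        return False
    try:
        row = conn.cursor().execute("SELECT 1").fetchone()
        return row is not None and row[0] == 1
    finally:
        conn.close()
```

For example, `check_connectivity(build_redshift_conn_str("<cluster-endpoint>", "dev", "etl_user", "<password>"))` returns `True` when the driver, network path and credentials are all in order. If you connect through the DSN instead, a string such as `"DSN=<your-dsn>;UID=<user>;PWD=<password>"` can be passed to `check_connectivity` directly.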
Step 2: Install the ODBC driver on the SAP Data Services Job Server and configure a DSN with the same name and credentials as in Step 1. The procedure is the same as in Step 1.
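Because the datastore refers to the DSN by name, a quick check on the Job Server host can confirm that a DSN with the expected name exists there and points to a Redshift driver. This is a minimal sketch assuming `pyodbc` is installed on that host; the DSN name `REDSHIFT_DL` is hypothetical.

```python
# Minimal sketch (assumption): run on the Job Server host to confirm that the
# DSN used by the datastore exists there and points to a Redshift driver.
from typing import Mapping


def check_dsn(expected_dsn: str, sources: Mapping[str, str], driver_hint: str = "redshift") -> str:
    """Return a status message for expected_dsn, given a {DSN name: driver} mapping."""
    if expected_dsn not in sources:
        return f"MISSING: DSN '{expected_dsn}' is not configured on this host"
    driver = sources[expected_dsn]
    if driver_hint.lower() not in driver.lower():
        return f"WRONG DRIVER: DSN '{expected_dsn}' uses '{driver}'"
    return f"OK: DSN '{expected_dsn}' uses '{driver}'"


def check_local_dsn(expected_dsn: str = "REDSHIFT_DL") -> str:  # hypothetical DSN name
    """Check the DSNs registered on this machine via pyodbc."""
    import pyodbc  # third-party package: pip install pyodbc

    # pyodbc.dataSources() returns a dict of DSN name -> driver description
    return check_dsn(expected_dsn, pyodbc.dataSources())
```

Keeping the name comparison exact is deliberate: the point of Step 2 is that the local machine and the Job Server use the same DSN name.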
Step 3: Create a datastore with Datastore Type “Database” and Database Type “ODBC”, and select the DSN configured above as the data source. Open the datastore and view its “External Metadata” to find and import the tables you need. Make sure the database user has the necessary permissions in AWS to access the schema to be used.
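On the permissions point, a Redshift administrator typically grants the Data Services user read access to the schema. The Python helper below only generates the SQL text for the administrator to run; the schema name `datalake` and user `sapds_user` are hypothetical, and the exact grants depend on your own security model.

```python
# Minimal sketch (assumption): generate read-only grants for the Data Services
# database user. Schema and user names are hypothetical.
from typing import List


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_grants(schema: str, user: str) -> List[str]:
    """Return GRANT statements giving `user` read access to `schema`."""
    s, u = quote_ident(schema), quote_ident(user)
    return [
        # USAGE is required before any object in the schema can be accessed
        f"GRANT USAGE ON SCHEMA {s} TO {u};",
        # SELECT on the tables that already exist in the schema
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {u};",
        # SELECT on tables created later by the user who runs this statement
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT SELECT ON TABLES TO {u};",
    ]


for statement in read_only_grants("datalake", "sapds_user"):
    print(statement)
```

Without the `USAGE` grant, the tables may not be accessible to the user even if `SELECT` has been granted on them.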
Step 4: Create a job with the imported Redshift table as the source, map it to the on-premise target, and execute it.
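For a 1:1 extraction, comparing row counts in the source and target after the run is a simple sanity check. This sketch is a separate check run outside Data Services, not part of the job itself; it assumes both systems are reachable through `pyodbc`, and the connection strings and table names you pass in are your own. Note that `row_count` interpolates the table name directly into the SQL, so pass only trusted names.

```python
# Minimal sketch (assumption): reconcile row counts after a 1:1 extraction.
# Connection strings and table names are supplied by the caller.


def reconcile(source_rows: int, target_rows: int) -> str:
    """Compare source and target row counts and describe the result."""
    diff = source_rows - target_rows
    if diff == 0:
        return f"OK: {source_rows} rows in source and target"
    return f"MISMATCH: source has {diff:+d} rows relative to target"


def row_count(conn_str: str, table: str) -> int:
    """Return SELECT COUNT(*) for a table; `table` must be a trusted name."""
    import pyodbc  # third-party package: pip install pyodbc

    conn = pyodbc.connect(conn_str)
    try:
        # The table name is interpolated directly, so never pass user input here
        return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()
```

A typical call would be `reconcile(row_count(source_conn_str, "<schema>.<table>"), row_count(target_conn_str, "<schema>.<table>"))`.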
In our run, the job took around 6 minutes to extract more than 1 million records in a 1:1 extraction without any transformations.
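These figures translate into a rough throughput estimate. Treating the run as about 1,000,000 rows in 6 minutes gives roughly 2,780 rows per second. Extrapolating linearly to larger volumes is an assumption, since network bandwidth, driver settings and the target system all affect the rate.

```python
# Back-of-the-envelope throughput from the run above (~1 million rows in ~6 minutes).
# Linear scaling to other volumes is an assumption, not a measured result.


def rows_per_second(rows: int, minutes: float) -> float:
    """Average extraction rate in rows per second."""
    return rows / (minutes * 60)


def estimated_minutes(rows: int, rate_rows_per_s: float) -> float:
    """Minutes needed for `rows` at a constant rate (assumes linear scaling)."""
    return rows / rate_rows_per_s / 60


rate = rows_per_second(1_000_000, 6)
print(f"{rate:.0f} rows/s")
print(f"{estimated_minutes(10_000_000, rate):.0f} min for 10 million rows")
```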
This shows that SAP Data Services can play an important role in integrating data from cloud solutions such as a data lake in AWS.