Integrating SAP HANA Data Lake to Google BigQuery – DL2BQ
Hi All,
Note: I have been exploring the different kinds of integration possibilities in the cloud and how we can provide maximum automation in this area. This is a small piece of work I am presenting now, and I will keep exploring. Thanks.
I am writing this blog post to share a newly developed Python library that helps us migrate data from SAP HANA Data Lake to BigQuery. I have been working on this problem for a long time – how to establish a really smooth integration between HANA Data Lake and BigQuery. The development is now done, the first release is ready for installation, and the source code is available in the Git repo.
Python Library hdltobq – https://pypi.org/project/hdltobq/
Source code – https://github.com/shivamshukla12/dl2bq
A Simple Architecture:
Pre-requisites: You must have your SAP BTP trial account up and running, your Data Lake instance running, and your credentials ready for open database connectivity.
You should also have your GCP trial account ready, and make sure you have downloaded the GCP credentials in JSON format locally on your system.
In short, both cloud accounts should be up and running.
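If you want to confirm the GCP side is ready before you start, a quick check like the one below can help. This is just a minimal sketch, assuming the google-cloud-bigquery package is installed; the project id and JSON path are the sample values used later in this post, so replace them with your own.

##Optional check (assumption): confirm the downloaded GCP service-account JSON actually works
from google.cloud import bigquery
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file(
    r'C:\Users\ABC\Downloads\igneous-study-316208-d66aebfd83ea.json')   ##your own JSON path
client = bigquery.Client(project='igneous-study-316208', credentials=creds)   ##your own project id
print(client.project)   ##prints the project id if the credentials are valid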
Data Lake Instance:
GCP Instance & BigQuery:
- Now go to your command prompt / terminal and install the library
pip install hdltobq
- If the installation is successful, you will be able to import it. After installation, try the imports below; if they work, you are good to go.
##Import below libraries...
import hdltobq
from hdltobq.hdltobq import BQConnect
- The library provides methods for connecting to GCP BigQuery, creating datasets, creating tables, and transporting contents.
Sample Inputs
###You should have your project & credentials ready for migrating data from Data Lake to BQ
bq_dataset = 'bigquery-public-data:hacker_news'    ## Your BQ Dataset if created, else create one
bq_project = 'igneous-study-316208'                ### This is Mandatory
bq_credentials = r'C:\Users\ABC\Downloads\igneous-study-316208-d66aebfd83ea.json'   ##Mandatory

##Initialize BQ
bq = BQConnect(bq_dataset, bq_project, bq_credentials)
bq_client, bq_ds = BQConnect.connect2bq(bq)
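The steps below also use a pandas DataFrame df holding the contents of the Data Lake table to be migrated. How you fetch it is up to you; below is only a minimal sketch, assuming you read the HOTEL table over the Data Lake's open database connectivity with pyodbc and pandas. The DSN, user and password are placeholders for your own connection details.

##A minimal sketch (assumption): read the source table from HANA Data Lake into a pandas DataFrame
##'HDL' is a placeholder ODBC DSN pointing to your Data Lake; UID/PWD are your own credentials
import pandas as pd
import pyodbc

hdl_conn = pyodbc.connect('DSN=HDL;UID=<user>;PWD=<password>')
df = pd.read_sql('SELECT * FROM HOTEL', hdl_conn)   ##this df is reused in the steps below
print(df.head())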
- Create Dataset
###Create new Dataset for your tables first.
lv_ab = BQConnect.create_dataset(bq_client, 'HANADL')

Output:
Creating DataSet.....
Created.. Thanks
- Create Table
### Create table ...
BQConnect.create_tab(bq_client, df, 'HOTEL')

Output:
Started Creating table..... igneous-study-316208.HANADL.HOTEL
Preparing Schema...
Ready.....
CRITICAL:root:Dataset igneous-study-316208.HANADL.HOTEL already exists, not creating.
- Finally, transport the data to BQ
####Command for BQ Insert
df.to_gbq('HANADL.HOTEL', project_id=bq_client.project, if_exists='append')
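Note: df.to_gbq() relies on the pandas-gbq package under the hood, so if the call complains about a missing dependency, install it with pip install pandas-gbq.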
Data Preview from Data Lake
GCP BQ Output
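If you prefer to double-check the load from code rather than from the BigQuery console, you can run a quick query against the new table. This is a minimal sketch reusing the bq_client created earlier, with the dataset and table names from this walkthrough.

##Optional: verify the row count of the freshly loaded table
query = 'SELECT COUNT(*) AS cnt FROM `{}.HANADL.HOTEL`'.format(bq_client.project)
for row in bq_client.query(query).result():
    print('Rows in HANADL.HOTEL:', row.cnt)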
- So here we come to an end: we have successfully transferred data from SAP HANA Data Lake to BigQuery. We will probably look at the transfer from BigQuery to SAP HANA Data Lake in the next post – till then, take care and keep learning.
PS: Finally, I am adding a small demo video of my work. Thanks.
PS: Please don't forget to share your valuable feedback, or any use case you have in mind to implement or try.