Hi all,
Note: I have been exploring the different integration possibilities in the cloud and how to automate as much of this area as possible. This is a small piece of that work, and I will keep exploring. Thanks.
I am writing this blog post to share a newly developed Python library that helps migrate data from SAP HANA Data Lake to BigQuery. I have been working for a long time on how to establish a smooth integration between HANA Data Lake and BigQuery. Development is now done, the first release is ready for installation, and the source code is available in a Git repository.
Python Library hdltobq - https://pypi.org/project/hdltobq/
Source code - https://github.com/shivamshukla12/dl2bq
A Simple Architecture:
Pre-requisites: You must have your BTP trial account up and running, your data lake instance should also be running, and you should have your credentials ready for an Open Database Connectivity (ODBC) connection.
You should also have your GCP trial account ready, and make sure you have downloaded the GCP credentials in JSON format to your local system.
In short, both cloud accounts should be up and running.
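Before going further, it can help to confirm that the downloaded JSON file really looks like a service account key. The following is a minimal sketch, assuming the usual service account key layout; the helper name `check_gcp_key` is hypothetical and is not part of hdltobq.

```python
import json

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_gcp_key(key_text):
    """Return the project_id found in service account key JSON text.

    Hypothetical helper (not part of hdltobq). Raises ValueError when the
    text does not look like a service account key.
    """
    data = json.loads(key_text)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    if data["type"] != "service_account":
        raise ValueError("expected type 'service_account', got %r" % data["type"])
    return data["project_id"]
```

In practice you would pass it the file contents, for example `check_gcp_key(open(bq_credentials).read())`, and compare the returned project id with the project you intend to load into.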
Data Lake Instance:
GCP Instance & Big Query:
    pip install hdltobq

    ## Import the libraries below...
    import hdltobq
    from hdltobq.hdltobq import BQConnect
Sample Inputs

    ### You should have your project & credentials ready for migrating data from Data Lake to BQ
    bq_dataset = 'bigquery-public-data:hacker_news'  ## Your BQ dataset if created, else create one
    bq_project = 'igneous-study-316208'  ### This is mandatory
    bq_credentials = r'C:\Users\ABC\Downloads\igneous-study-316208-d66aebfd83ea.json'  ## Mandatory

    ## Initialize BQ
    bq = BQConnect(bq_dataset, bq_project, bq_credentials)
    bq_client, bq_ds = BQConnect.connect2bq(bq)
    ### Create a new dataset for your tables first.
    lv_ab = BQConnect.create_dataset(bq_client, 'HANADL')

Output

    Creating DataSet.....
    Created.. Thanks
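The table creation and insert steps in this post work on a pandas DataFrame `df` that holds the Data Lake rows, but how it is built is not shown. The following is a minimal sketch of one way to read a table, assuming any DB-API 2.0 connection (for example one opened through an ODBC driver). An in-memory SQLite database stands in for the Data Lake so the sketch runs anywhere; `fetch_table` and the `HNO`/`NAME` columns are hypothetical.

```python
import sqlite3


def fetch_table(conn, table):
    """Read all rows of `table` through a DB-API 2.0 connection.

    Returns (column_names, rows). Hypothetical helper, not part of hdltobq.
    The table name is interpolated into the SQL, so pass trusted names only.
    """
    cur = conn.cursor()
    cur.execute('SELECT * FROM "%s"' % table.replace('"', '""'))
    columns = [desc[0] for desc in cur.description]
    # Convert driver-specific row objects to plain tuples.
    rows = [tuple(row) for row in cur.fetchall()]
    cur.close()
    return columns, rows


# Stand-in source: an in-memory SQLite table instead of a real Data Lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HOTEL (HNO INTEGER, NAME TEXT)")
conn.execute("INSERT INTO HOTEL VALUES (10, 'Congress')")
columns, rows = fetch_table(conn, "HOTEL")
```

With pandas installed, `df = pandas.DataFrame(rows, columns=columns)` turns the result into a DataFrame.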
    ### Create table ...
    BQConnect.create_tab(bq_client, df, 'HOTEL')

Output:

    Started Creating table.....
    igneous-study-316208.HANADL.HOTEL
    Preparing Schema...
    Ready.....
    CRITICAL:root:Dataset igneous-study-316208.HANADL.HOTEL already exists, not creating.
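The log mentions a "Preparing Schema" step. How hdltobq does this internally is not shown here; purely as an illustration, the sketch below maps pandas dtype names to BigQuery column types. The mapping table, the fallback to `STRING`, and the function name `build_schema` are all assumptions, not the library's actual behavior.

```python
# Assumed mapping from pandas dtype names to BigQuery column types.
DTYPE_TO_BQ = {
    "int64": "INT64",
    "float64": "FLOAT64",
    "bool": "BOOL",
    "datetime64[ns]": "TIMESTAMP",
    "object": "STRING",
}


def build_schema(dtypes):
    """Turn {column name: dtype} into a list of BigQuery-style field dicts.

    Unknown dtypes fall back to STRING. Hypothetical helper, not hdltobq code.
    """
    return [
        {
            "name": column,
            "type": DTYPE_TO_BQ.get(str(dtype), "STRING"),
            "mode": "NULLABLE",
        }
        for column, dtype in dtypes.items()
    ]
```

For a real DataFrame, the input could be `df.dtypes.to_dict()`; checking the resulting types before creating the table makes surprises such as numeric columns landing as strings easier to spot.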
    #### Command for BQ insert (DataFrame.to_gbq needs the pandas-gbq package installed)
    df.to_gbq('HANADL.HOTEL', project_id=bq_client.project, if_exists='append')
Data Preview from Data Lake
GCP BQ Output
PS: Please don't forget to share your valuable feedback, or any use case you have in mind to implement or try.