Integrating SAP HANA Data Lake with Google BigQuery – DL2BQ
Hi All,
Note: I have been exploring the different integration possibilities in the cloud and how we can provide maximum automation in this area, so this is a small piece of work I am presenting now, and I will keep exploring. Thanks.
I am writing this blog post to share a newly developed Python library that helps migrate data from SAP HANA Data Lake to BigQuery. I had been working for a long time on how to establish a smooth integration between HANA Data Lake and BigQuery, and the development is now done: the first release is ready for installation, and the source code is available in the Git repo.
Python Library hdltobq – https://pypi.org/project/hdltobq/
Source code – https://github.com/shivamshukla12/dl2bq
A Simple Architecture:
Pre-requisites: You must have your SAP BTP trial account up and running, your Data Lake instance running, and your credentials ready for open database connectivity.
You should also have your GCP trial account ready, and make sure you have downloaded the GCP credentials in JSON format to your local system.
In short, both cloud accounts should be up and running.
Data Lake Instance:
GCP Instance & BigQuery:
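The transfer steps below feed a pandas DataFrame called df into BigQuery, but this post does not show how it is produced from the Data Lake. As a minimal, hypothetical sketch (not part of hdltobq), one way is pyodbc over the open database connectivity mentioned in the prerequisites; the DSN, credentials, and table name below are placeholders:
##Hypothetical sketch: pull a Data Lake table into a pandas DataFrame via ODBC.
import pandas as pd
import pyodbc

##DSN, user, and password are placeholders for the Data Lake credentials above.
conn = pyodbc.connect('DSN=<your-hdl-dsn>;UID=<user>;PWD=<password>')

##This DataFrame is the df used in the BigQuery steps below.
df = pd.read_sql('SELECT * FROM HOTEL', conn)
conn.close()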
- Now go to your command prompt and install the library:
pip install hdltobq
- If the installation is successful, you will be able to import it:
import hdltobq
- After installation, try these imports; if they work, you are good to go.
##Import below libraries...
import hdltobq
from hdltobq.hdltobq import BQConnect
- The library provides methods for connecting to GCP BQ, creating datasets, creating tables, and transporting contents.
Sample Inputs
###You should have your project & credentials ready for migrating data from Data Lake to BQ
bq_dataset = 'bigquery-public-data:hacker_news'  ## Your BQ Dataset if created, else create one
bq_project = 'igneous-study-316208'  ### This is Mandatory
bq_credentials = r'C:\Users\ABC\Downloads\igneous-study-316208-d66aebfd83ea.json'  ##Mandatory

##Initialize BQ
bq = BQConnect(bq_dataset, bq_project, bq_credentials)
bq_client, bq_ds = BQConnect.connect2bq(bq)
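Assuming connect2bq hands back a standard google.cloud.bigquery Client (an assumption on my side, not something documented above), a quick sanity check of the connection is to list the datasets visible to the project:
##Hedged sanity check: assumes bq_client is a google.cloud.bigquery.Client.
for ds in bq_client.list_datasets():
    print(ds.dataset_id)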
- Create Dataset
###Create new Dataset for your tables first.
lv_ab = BQConnect.create_dataset(bq_client, 'HANADL')

Output:
Creating DataSet.....
Created.. Thanks
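To double-check outside the library, the standard BigQuery client API can fetch the dataset that was just created (again assuming bq_client is a google.cloud.bigquery.Client):
##Raises google.api_core.exceptions.NotFound if the dataset does not exist.
ds = bq_client.get_dataset('HANADL')
print(ds.full_dataset_id)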
- Create Table
### Create table ...
BQConnect.create_tab(bq_client, df, 'HOTEL')

Output:
Started Creating table..... igneous-study-316208.HANADL.HOTEL
Preparing Schema...
Ready.....
CRITICAL:root:Dataset igneous-study-316208.HANADL.HOTEL already exists, not creating.
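The "Preparing Schema..." message suggests the BigQuery schema is derived from the DataFrame, so it can help to inspect the column types up front (plain pandas, nothing library-specific):
##The DataFrame dtypes drive whatever column-type mapping is applied.
print(df.dtypes)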
- Finally, to transport the data to BQ:
####Command for BQ Insert
df.to_gbq('HANADL.HOTEL', project_id=bq_client.project, if_exists='append')
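Note that DataFrame.to_gbq is the pandas wrapper around the pandas-gbq package, so that package needs to be installed as well. To confirm the rows actually landed, a plain query through the same client works (assuming, as above, a standard google.cloud.bigquery Client):
##Count the rows now present in the target table.
sql = 'SELECT COUNT(*) AS cnt FROM `{}.HANADL.HOTEL`'.format(bq_client.project)
for row in bq_client.query(sql).result():
    print('Rows in HANADL.HOTEL:', row.cnt)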
Data Preview from Data Lake
GCP BQ Output
- So here we come to an end, having successfully transferred data from SAP HANA Data Lake to BigQuery. We will probably look at the transfer from BigQuery to SAP HANA Data Lake in the next post. Till then, take care and keep learning.
PS: Finally, I am adding a small demo video of my work. Thanks.
PS: Please don't forget to share your valuable feedback, or any use case you have in mind to implement or try.
This is interesting; I appreciate the effort you put into this, and I am looking forward to integrating BigQuery with on-prem HANA or S/4HANA (if possible, without SLT & BODS).
I think a remote server connector already exists in SAP HANA to connect to BQ.
Thanks,
Shivam
Hello Shivam Shukla
Do you really want to lose control of your data and load (sensitive) content into a hyperscaler like Google?
Discover what the SAP IQ Database can do for you - SAP (Sybase) IQ – the hidden treasure …
Best Regards Roland
Hi Roland Kramer
Yes, the intention was not to expose any information or sensitive data. I was exploring BigQuery and got the idea of integrating it with the Data Lake. I am also thinking of moving data from multiple sources to the DL (Data Lake) for analytics/insights.
Thanks,
Shivam
Hello Shivam Shukla
I'm not talking about SAP HANA Data Lake; this is the SAP IQ database on-premise.
In fact, SAP HANA Data Lake is nothing other than the SAP implementation of SAP IQ Multiplex on Amazon.
Especially when it comes to sensitive data, hyperscalers and cloud solutions running on hyperscalers cannot guarantee that you are still the owner, or that the data center is in your country.
Replace the Azure hyperscaler implementation with on-premise or your own location and you are done.
Best Regards Roland
Hi Roland Kramer
Yeah, right, I got your point now. So we can leverage all the standard SAP IQ functionality like DI/DS (analytics) without transporting the data out.
I will explore SAP IQ and DL in detail to understand how powerful they are in terms of automated machine learning, as that is the main thing I see in technology now: processing the data inside the database and giving the insights back to the customer.
Thanks for pointing this out and guiding us.
Thanks,
Shivam