Shivam Shukla

Integrating SAP HANA Data Lake with Google BigQuery – DL2BQ

Hi All,

Note: I have been exploring different kinds of integration possibilities in the cloud and how we can provide maximum automation in this area. This is a small piece of work I am presenting now, and I will keep exploring. Thanks.

I am writing this blog post to share a newly developed Python library that helps migrate data from SAP HANA Data Lake to BigQuery. I have been working for a long time on how to establish a very smooth integration between HANA Data Lake and BigQuery. The development is now done: the first release is ready for installation, and the code is available in a Git repo.

Python library hdltobq – https://pypi.org/project/hdltobq/

Source code – https://github.com/shivamshukla12/dl2bq

 

A Simple Architecture:

 

Prerequisites: You must have your SAP BTP trial account up and running, your Data Lake instance running, and your credentials ready for open database connectivity (ODBC).

You should also have your GCP trial account ready, and make sure you have downloaded the GCP credentials in JSON format to your local system.

In short, both cloud accounts should be up and running.
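
The library expects the Data Lake contents as a pandas DataFrame (the df used in the steps further below). Here is a minimal sketch of pulling a table over the ODBC connectivity mentioned above; the DSN name, the credentials, and the HOTEL table are illustrative placeholders of mine, not part of hdltobq:

    ## Sketch (not hdltobq): read a Data Lake table into a pandas DataFrame via ODBC.
    ## 'HDL', the user/password, and the HOTEL table are illustrative placeholders.
    import pyodbc
    import pandas as pd
    
    conn = pyodbc.connect('DSN=HDL;UID=<user>;PWD=<password>')  # ODBC data source for the Data Lake instance
    df = pd.read_sql('SELECT * FROM HOTEL', conn)               # this df is what we later push to BQ
    conn.close()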

 

Data Lake Instance:

 

GCP Instance & BigQuery:

 

  • Now go to your command prompt and install the library:

pip install hdltobq

  • If the installation is successful, you will be able to import it:
    import hdltobq

 

  • After installation, try these imports; if they work, you are all good to go:
    ## Import the libraries below...
    
    import hdltobq
    from hdltobq.hdltobq import BQConnect

 

  • Methods for connecting to GCP BQ, creating datasets, creating tables & transporting contents (a plain-client sketch of the connection follows below)
    Sample inputs
    ### You should have your project & credentials ready for migrating data from Data Lake to BQ
    bq_dataset     = 'bigquery-public-data:hacker_news'    ## Your BQ dataset, if created; else create one
    bq_project     = 'igneous-study-316208'                ### This is mandatory
    bq_credentials = r'C:\Users\ABC\Downloads\igneous-study-316208-d66aebfd83ea.json'  ## Mandatory
    
    ## Initialize BQ
    bq = BQConnect(bq_dataset, bq_project, bq_credentials)
    
    ## Connect to BQ
    bq_client, bq_ds = BQConnect.connect2bq(bq)
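
For reference, such a connection boils down to roughly the following with the plain google-cloud-bigquery client. This is a sketch assuming the JSON file is a service account key; hdltobq's internals may differ:

    ## Sketch: connect to BigQuery directly with google-cloud-bigquery.
    ## Assumption: bq_credentials points to a service account key file.
    from google.cloud import bigquery
    from google.oauth2 import service_account
    
    credentials = service_account.Credentials.from_service_account_file(bq_credentials)
    client = bigquery.Client(project=bq_project, credentials=credentials)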

 

  • Create Dataset (a plain-client sketch follows below)
    ### Create a new dataset for your tables first.
    lv_ab = BQConnect.create_dataset(bq_client, 'HANADL')
    
    Output:
    Creating DataSet.....
    Created.. Thanks
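
With the plain BigQuery client, the equivalent would look roughly like this (a sketch; exists_ok=True avoids an error if the dataset is already there):

    ## Sketch: create the HANADL dataset with the plain BigQuery client.
    from google.cloud import bigquery
    
    dataset = bigquery.Dataset(f'{bq_client.project}.HANADL')
    dataset.location = 'US'  # assumption: choose the location matching your project
    bq_client.create_dataset(dataset, exists_ok=True)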

 

  • Create Table (df is the DataFrame read from the Data Lake earlier; a schema sketch follows below)
    ### Create table...
    BQConnect.create_tab(bq_client, df, 'HOTEL')
    
    Output:
    Started Creating table.....
    igneous-study-316208.HANADL.HOTEL
    Preparing Schema...
    Ready.....
    CRITICAL:root:Dataset igneous-study-316208.HANADL.HOTEL already exists, not creating.
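
The log lines suggest that create_tab derives the BigQuery schema from the DataFrame's columns. A rough sketch of that idea with the plain client; the dtype mapping here is a simplified assumption of mine, not hdltobq's actual logic:

    ## Sketch: derive a simple BigQuery schema from pandas dtypes.
    from google.cloud import bigquery
    
    TYPE_MAP = {'int64': 'INTEGER', 'float64': 'FLOAT', 'bool': 'BOOLEAN',
                'datetime64[ns]': 'TIMESTAMP'}  # anything else falls back to STRING
    
    schema = [bigquery.SchemaField(col, TYPE_MAP.get(str(dtype), 'STRING'))
              for col, dtype in df.dtypes.items()]
    table = bigquery.Table(f'{bq_client.project}.HANADL.HOTEL', schema=schema)
    bq_client.create_table(table, exists_ok=True)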

 

  • Finally, to transport the data to BQ:
    #### Command for BQ insert
    df.to_gbq('HANADL.HOTEL', project_id=bq_client.project, if_exists='append')
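
Note that to_gbq comes from the pandas-gbq package, so install it (pip install pandas-gbq) if the call fails. An alternative sketch that loads the DataFrame with the BigQuery client directly, assuming bq_client is a google-cloud-bigquery Client:

    ## Sketch: load the DataFrame without pandas-gbq (requires pyarrow).
    job = bq_client.load_table_from_dataframe(df, 'HANADL.HOTEL')
    job.result()  # wait for the load job to complete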

 

Data Preview from Data Lake

 

GCP BQ Output

 

  • So here we come to an end, having successfully transferred data from SAP HANA Data Lake to BigQuery. We will probably look at the transfer from BigQuery back to SAP HANA Data Lake in the next post. Till then, take care & keep learning.

 

PS: Finally, I am adding a small demo video of my work. Thanks.

 

PS: Please don't forget to share your valuable feedback or any use case you have in mind to implement or try.


      6 Comments
Saurabh Mishra

This is interesting, and I appreciate the effort you put into this. I am looking forward to integrating BigQuery with on-premise HANA or S/4HANA (if possible, without SLT & BODS).

Shivam Shukla (Blog Post Author)

I think a remote server connector is already there in SAP HANA to connect to BQ.

      Thanks,

      Shivam

       

Roland Kramer

      Hello Shivam Shukla

Do you really want to lose control of your data and load (sensitive) content into a hyperscaler like Google?

      Discover what the SAP IQ Database can do for you - SAP (Sybase) IQ – the hidden treasure …

      Best Regards Roland

Shivam Shukla (Blog Post Author)

      Hi Roland Kramer

       

Yes, the intention was not to expose any information or sensitive data. I was exploring BigQuery and got the idea of integrating it with the Data Lake. I am also thinking of moving data from multiple sources into the DL (Data Lake) for analytics/insights.

       

      Thanks,

      Shivam

Roland Kramer

      Hello Shivam Shukla
I'm not talking about SAP HANA Data Lake; this is the SAP IQ database on-premise.
In fact, SAP HANA Data Lake is nothing other than the SAP implementation of SAP IQ Multiplex on Amazon.
Especially when it comes to sensitive data, hyperscalers or cloud solutions on hyperscalers cannot guarantee that you are still the owner or that the data center is in the country.

Replace the Azure hyperscaler implementation with on-premise or your own location and you are done.

      Best Regards Roland

Shivam Shukla (Blog Post Author)

      Hi Roland Kramer

Yeah, right, I got your point now. So we can leverage all the standard SAP IQ functionality like DI/DS (analytics) without transporting the data out.

I will explore SAP IQ and the DL in detail just to understand how powerful they are in terms of automated machine learning, as this is the only thing I see in technology now: if we can process the data inside the database and give the insights back to the customer.

       

Thanks for pointing this out and guiding us.

       

      Thanks,

      Shivam