HANA SDA – Google BigQuery
SAP HANA continues to build data bridges; the latest addition to the SDA family is Google BigQuery. SAP HANA can now combine data from Google BigQuery, enabling data federation and/or data ingestion into the HANA platform.
The process to enable integration with Google BigQuery is simple. In this blog I’ll walk through the process of building this new data bridge.
1 – Ensure your HANA platform is HANA 2.0 SPS03+
2 – Download the ODBC driver from Google. You can use the following link.
3 – If not already done, download, install and configure unixODBC. You can use the following link.
4 – Configure a few driver settings as detailed here. Below is an example of the .odbc.ini file after configuration.
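As a rough illustration, a minimal .odbc.ini entry for the Simba BigQuery driver might look like the following. The driver path, project name, service account, and key file location are placeholders you would replace with your own values:

```ini
[ODBC Data Sources]
GoogleBQ=Simba ODBC Driver for Google BigQuery 64-bit

[GoogleBQ]
Description=Simba ODBC Driver for Google BigQuery (64-bit) DSN
Driver=/opt/simba/googlebigqueryodbc/lib/64/libgooglebigqueryodbc_sb64.so
Catalog=<your_GCP_project_id>
SQLDialect=1
OAuthMechanism=0
Email=<service_account>@<project>.iam.gserviceaccount.com
KeyFilePath=/path/to/service_account_key.p12
```

The DSN name in brackets ([GoogleBQ]) is what the remote source definition in HANA will reference.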
5 – Create a new remote source. This can be done several ways. The following SQL script will create a remote source to my GoogleBQ DSN that was defined in the .odbc.ini file.
CREATE REMOTE SOURCE BigQ ADAPTER "odbc"
CONFIGURATION FILE 'property_bq.ini' CONFIGURATION 'DSN=GoogleBQ'
6 – Once completed, you should see something similar to the following. (hana-200416 is my Google BigQuery project ID.)
7 – In order to query the data and combine it with other enterprise data in HANA, we will import it as a catalog object.
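Importing the catalog object can also be done in SQL with a virtual table. This is a sketch; "hana-200416" is the project ID used in this blog, while the HANA schema, dataset, and table names are illustrative placeholders:

```sql
-- Create a virtual table in HANA that proxies a table in BigQuery.
-- Queries against it are federated through the BigQ remote source.
CREATE VIRTUAL TABLE "MYSCHEMA"."VT_CALL_DETAIL_COLD"
  AT "BigQ"."hana-200416"."my_dataset"."call_detail";
```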
7a – We will now combine the real-time call detail HOT data from HANA with the COLD data stored in Google BigQuery. Using a calculation view, we can select from both HOT and COLD in parallel and UNION the results together.
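The graphical calculation view achieves the equivalent of a plain SQL union over the two tiers; a sketch, with illustrative table and column names:

```sql
-- HOT tier: native HANA table; COLD tier: virtual table over BigQuery.
SELECT 'HOT' AS tier, SUM(call_duration) AS total_duration
FROM "MYSCHEMA"."CALL_DETAIL_HOT"
UNION ALL
SELECT 'COLD' AS tier, SUM(call_duration) AS total_duration
FROM "MYSCHEMA"."VT_CALL_DETAIL_COLD";
```

Both branches can execute in parallel, with the COLD branch pushed down to BigQuery.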
8 – First, let’s have a look at the aggregated results from Google BigQuery.
9 – Now a look at the results from the HANA calculation view. You can see that the COLD data output from the HANA calculation view matches the native GBQ output.
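For reference, the aggregation on the BigQuery side could be expressed in BigQuery standard SQL along these lines (the dataset and column names are assumptions, not from the original post):

```sql
SELECT SUM(call_duration) AS total_duration
FROM `hana-200416.my_dataset.call_detail`;
```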
10 – An important consideration when using SDA is ensuring you do not return all the rows from a federated query to HANA. Instead, you want to push down data-intensive operations to the source and return a limited number of rows to HANA. In this example there are 1000 rows in GBQ, but we only need to return one row from GBQ as an aggregated sum of the fact data. There are a couple of ways to validate that your federated query is being pushed down.
10a – You can look at the smart data access log. In the example below you can see in green that our aggregation SQL is being pushed down to GBQ and, as a result, one row is returned. In red you can see no aggregation and 200 rows being returned. The volume here is small; however, when working with millions or billions of rows it becomes extremely important not to move such large datasets to HANA via a real-time SQL query. If you need to work with the large unaggregated dataset in HANA, then you can consider moving it to HANA via SDI.
10b – You can look at the HANA visualize plan to see more details on the execution plan. In the trace below you can see in green the SQL statement that has been pushed down to GBQ for execution. One row is returned to the union from both the HOT and COLD sources. This can then be consumed by the BI layer.
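Pushdown can also be checked from SQL with HANA’s EXPLAIN PLAN facility. A sketch, assuming a virtual table named VT_CALL_DETAIL_COLD:

```sql
EXPLAIN PLAN SET STATEMENT_NAME = 'bq_pushdown' FOR
  SELECT SUM(call_duration) FROM "MYSCHEMA"."VT_CALL_DETAIL_COLD";

-- Remote operators (e.g. REMOTE ROW SCAN / REMOTE COLUMN SCAN) in the
-- plan indicate the statement was shipped to the remote source.
SELECT operator_name, operator_details
FROM explain_plan_table
WHERE statement_name = 'bq_pushdown';
```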
Conclusion: This new bridge to Google BigQuery via HANA SDA opens up even more possibilities and integration opportunities to simplify your data landscapes.
Interesting blog. I was able to install the Google BQ drivers and configure my .odbc.ini file with a sample DSN; however, I am not able to test it either in HANA Studio as a remote source or on the command line with isql. The same error message appears in both cases:
hxeadm@hxehost:/usr/sap/HXE/home> isql -v GoogleBQ
[08001][unixODBC][Simba][ODBC] (10380) Unable to establish connection with data source. Missing settings: {[RefreshToken]}
[ISQL]ERROR: Could not SQLConnect
hxeadm@hxehost:/usr/sap/HXE/home>
Am I missing where this gets configured?
Best Regards, Roland
I had a similar issue, but after setting OAuthMechanism=0 (Service Authentication) I could connect via isql.
example:
[ODBC]
Trace=no
[ODBC Data Sources]
BQ=Simba ODBC Driver for Google BigQuery 64-bit
[BQ]
Description=Simba ODBC Driver for Google BigQuery (64-bit) DSN
Driver=<path_to_libgooglebigqueryodbc_sb64.so>
Catalog=<GCP_project_name>
SQLDialect=1
OAuthMechanism=0
Email=xxxxxx
KeyFilePath=<path_to_p12_file>
You can find the exact service account address in the IAM settings of the GCP console, where you can also get the p12 file.
Hello,
While creating my remote source I get the following message:
SAP DBTech JDBC: [403]: internal error: : Safe mode not enabled for BigQuery adapter. Please turn on safe mode before creating remote source BIGQ: line 1 col 22 (at pos 21)
Have you faced this already?
Thanks
Hi Williams,
I am facing a similar error. Were you able to resolve this and how?
Thanks,
Rahul
Hey Rahul,
Yes, I fixed it. You need to restart HANA after you activate BigQuery in safe mode; only after that can you create your remote connection.
If you have any issue please let me know or email me on williams.ruter@kochasoft.com
Best regard
Williams
Can I add a proxy somewhere to access the internet?
Or do I need a direct routing?
Hey Robert,
nice blog. Is there a way in the ODBC.ini file to support multiple catalogs (that is, multiple GCP projects)?
We are able to create multiple data sources within the ODBC.ini, which forces us to create multiple remote sources in HANA, but if we could pass multiple catalogs, a single remote source would suffice, so just curious.
Catalog setting in ODBC.ini
Hey Sundeep.
Did you get an update about creating multiple GCP projects in one single BigQuery connection? If so, please share. Thank you.
Hey Ravi,
No, I did not hear back from SAP on this. At this point we are creating multiple remote sources, so we have one remote source per GCP project (that is, a single catalog in ODBC.ini per data source).
Hi Sundeep,
you can create a BigQuery SDA connection to multiple projects. Maintain the ODBC file with DSN values as shown below and create the remote source with the following SQL:
CREATE REMOTE SOURCE BigQ ADAPTER "odbc"
CONFIGURATION FILE 'property_bq.ini' CONFIGURATION 'DSN=GoogleBQ1'
[GoogleBQ1]
Driver=/usr/sap/<SID>/home/simba/googlebigqueryodbc/lib/64/libgooglebigqueryodbc_sb64.so
SQLDialect=1
OAuthMechanism=0
Email=XXXXXXa@XXXXX.iam.gserviceaccount.com (the service account user in the project to which BigQuery should connect)
KeyFilePath=/usr/sap/<SID>/home/XXXX.p12 (key file of the service account user)
Catalog=XXXXXX (project catalog, usually the GCP project name)
Thanks Ravi, we already have a similar configuration in our ODBC.ini file. The issue seems to be that even though you can see multiple projects (because the service account has access to multiple projects), the datasets and tables/views show up only for the one GCP project (catalog) that is defined in the ini file (the last line in your configuration). Will take another look and update here if we get it to work.