HANA SDA – Google BigQuery
SAP HANA continues to build data bridges, the latest bridge in the the SDA family is Google BigQuery. SAP HANA can now combine data from Google BigQuery, enabling data federation and/or data ingestion into the HANA platform.
The process to enable integration with Google BigQuery is simple. During this blog I’ll walk through the process to build this new data bridge.
1 – Ensure your HANA platform is HANA 2.0 SPS03+
2 – Download the driver from google. You can use the following link
3 – If not already done you need to download, install and configure the UnixODBC. You can use the following link
4 – Configure a few driver settings as detailed here. example of .odbc.ini file after configuration.
5 – Create a new remote source. This can be done several ways. The following SQL script will create a remote source to my GoogleBQ DSN that was defined in the .odbc.ini file.
CREATE REMOTE SOURCE BigQ ADAPTER “odbc”
CONFIGURATION FILE ‘property_bq.ini’ CONFIGURATION ‘DSN=GoogleBQ’
6 – Once completed you should now see something similar to the following.(hana-200416 is my Google BigQuery project id)
7 – In order to query the data and combine it with other enterprise data in HANA we will import it as a catalog object.
7 – We will now combine the Real-Time Call Detail HOT data from HANA with the COLD data stored in Google BigQuery. Using a calculation view we can select from both HOT and COLD in parallel and UNIOIN the results together.
8 – First Lets have a look at the aggregated results from Google BigQuery.
9 – Now a look at the results from the HANA calculation view. You can see a match with the Cold data output from the HANA calculation view and the native GBQ output.
10 – An important consideration when using SDA is ensuring you do not return all the rows from a federated query to HANA. Instead you want to push down data intensive operations to the source and return a limited number of rows to HANA. In the example we are looking at there are 1000 rows in GBQ but we only need to return one row from GBQ as an aggregated sum of the fact data. There are a couple was to validate your federated query is being pushed down.
10a – You can look at the smart data access log. In the below example you can see in green our aggregation SQL is being pushed down to GBQ and as a result one row being returned. In red you can see no aggregation and 200 rows being returned. The volume here is small, however when working with millions or billions of rows it becomes extremely important to not move such large datasets to HANA VIA a real-time SQL query. If you need to work with the large unaggregated dataset in HANA then you can consider moving it to HANA via SDI
10b – You can look at the HANA Visualize plan to see more details on the execution plan. You can see in the below trace in green the SQL statement that has been pushed to GBQ for execution. One row is returned to the Union from both the HOT and Cold sources. This can then be consumed by the BI layer.
Conclusion: This new bridge into Google BigQuery via HANA SDA opens up even more possibilities and integration opportunities to simplify your data landscapes.