In this blog, we will walk through the steps to connect Google BigQuery to SAP HANA Cloud as a remote source.
Before that, in case you need to know what SAP HANA Cloud is, want to get access to the free trial, or want to try out some tutorials, you can refer to the following blogs by Denys van Kempen.
SAP HANA Cloud follows a multi-cloud strategy, and one aspect of that strategy is the ability to connect to data sources hosted on hyperscalers. SAP HANA Cloud provides this capability via Smart Data Access (SDA) and Smart Data Integration (SDI).
To connect to Google BigQuery, we will use the bigqueryrest SDA adapter.
You can also refer to the complete list of remote sources supported by SAP HANA Cloud in the following note https://launchpad.support.sap.com/#/notes/2600176
SAP HANA Cloud:
- SAP HANA Cloud Trial instance is created and running
- User on the SAP HANA Cloud instance with the following authorizations:
- System Privileges: CREATE REMOTE SOURCE, CERTIFICATE ADMIN, TRUST ADMIN
- Object Privileges: EXECUTE on Procedures SYSTEM.SET_PSE_PURPOSE and SYSTEM.UNSET_PSE_PURPOSE
You can use the following SQL to create a user with the required authorizations.
CREATE USER RSADMIN PASSWORD <enter_password_here> NO FORCE_FIRST_PASSWORD_CHANGE SET USERGROUP DEFAULT;
GRANT CREATE REMOTE SOURCE TO RSADMIN;
GRANT CERTIFICATE ADMIN TO RSADMIN;
GRANT TRUST ADMIN TO RSADMIN;
GRANT EXECUTE ON SYSTEM.SET_PSE_PURPOSE TO RSADMIN;
GRANT EXECUTE ON SYSTEM.UNSET_PSE_PURPOSE TO RSADMIN;
- A GCP project
- A Service Account in GCP with the roles BigQuery Data Viewer and BigQuery User
- A key generated for the Service Account
- A BigQuery dataset and a BigQuery table
1. Launch SAP HANA Database Explorer from the SAP HANA Cloud Instance
2. SAP HANA Database Explorer is launched with the user DBADMIN. We can continue the next steps with this user, which has all the needed authorizations. Alternatively, if you have created the user RSADMIN as described in the pre-requisites, you can add the database with RSADMIN as the user via the option Add Database with Different User, which appears when you right-click on the existing database connection.
3. Right-click on Remote Sources under Catalog in the selected database and select Add Remote Source to create a remote source that connects to Google BigQuery.
4. In the Add Remote Source screen,
- provide a Source Name, e.g., RS_BIGQUERY
- select Adapter Name as BIGQUERY (REST API)
- select Credentials Mode as Technical User
- provide the Account Name; this should be the service account created in GCP, which ends with iam.gserviceaccount.com
- provide Private Key, this key is available in the key file downloaded as JSON from GCP for the service account. (see point below)
Open the JSON file and locate the field "private_key". The complete value within double quotes, starting with -----BEGIN PRIVATE KEY----- and ending with -----END PRIVATE KEY-----\n, should be used as the value for Private Key.
and click on Create. Now we can see that a remote source is created.
5. Now click on the Remote Source name RS_BIGQUERY to test it. We would notice that this remote source is not fetching any metadata from Google BigQuery. This is because REST API-based adapters communicate with the endpoint through an HTTP client. To ensure secure communication, the adapters require client certificates from the endpoint services.
6. To download the certificate, open the link https://www.googleapis.com/oauth2/v1/certs in your browser, for example Google Chrome. Click the small lock icon to the left of the URL and then select Certificate.
7. In the certificate screen, go to the Details tab and select Copy to File; this launches the Certificate Export Wizard.
8. In the Certificate Export Wizard, click Next, select the export format Base-64 encoded X.509 (.CER), click Next again, choose the path and file name for the certificate, and click Finish. Your certificate is now downloaded.
9. In this step, we will upload this certificate to SAP HANA Cloud and create a collection for this certificate and apply the purpose. Launch SAP HANA Cockpit from the SAP HANA Instance.
10. In SAP HANA Cockpit, select the display option Security and User Management and then under this display select the option Certificate Store.
11. In the Certificate Store, select Import and browse the certificate which you downloaded previously and then click OK.
12. Once the certificate is imported, we need to create a certificate collection. Click on Go to Certificate Collections.
13. In the Certificate Collections, click on Add Collection and provide a name for Certificate Collection and click OK.
14. Now we need to add the certificate we imported before to this certificate collection: use the option Add Certificate, select the certificate, and click OK.
15. Finally, we add the purpose for the certificate collection by using the option Edit Purpose. Select Purpose as Remote Source and select the remote source RS_BIGQUERY.
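As an alternative to the SAP HANA Cockpit UI, the certificate import, collection, and purpose steps above can also be scripted in SQL. This is a minimal sketch: the PSE name BIGQUERY_PSE is an assumed placeholder, the PEM content comes from the file you exported, and the certificate ID must be looked up after the import.

```sql
-- Import the downloaded certificate (paste the exported PEM content between the quotes)
CREATE CERTIFICATE FROM '-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----';

-- Create a certificate collection (PSE); BIGQUERY_PSE is a placeholder name
CREATE PSE BIGQUERY_PSE;

-- Find the ID of the imported certificate in the SYS.CERTIFICATES view,
-- then add it to the collection
ALTER PSE BIGQUERY_PSE ADD CERTIFICATE <certificate_id>;

-- Assign the purpose Remote Source to the collection
SET PSE BIGQUERY_PSE PURPOSE REMOTE SOURCE;
```

These statements require the CERTIFICATE ADMIN and TRUST ADMIN privileges granted to RSADMIN in the pre-requisites.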
16. We go back to the SAP HANA Database Explorer, select Remote Sources, and click on the remote source RS_BIGQUERY. Now we can see that the Database and Schema are populated.
- Database refers to GCP Project Name
- Schema refers to GCP BigQuery Dataset Name
Select your Database and Schema and click Search; you can now see the BigQuery tables for the relevant selection.
17. To create a virtual table, select the check box for the table and click Create Virtual Object. In the pop-up, select the schema where you want to create the virtual table, provide a name for the virtual table, and click Create.
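The same virtual table can also be created in SQL. A minimal sketch, assuming the local schema MYSCHEMA, the GCP project my-project, the dataset my_dataset, and the table my_table (all placeholders for your own names):

```sql
-- Create a virtual table pointing at the remote BigQuery table;
-- the AT clause uses remote source, database (GCP project),
-- schema (BigQuery dataset), and table
CREATE VIRTUAL TABLE MYSCHEMA.VT_MY_TABLE
  AT "RS_BIGQUERY"."my-project"."my_dataset"."my_table";

-- The virtual table can then be queried like any local table
SELECT * FROM MYSCHEMA.VT_MY_TABLE;
```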
18. Locate the virtual table in the schema you selected, right-click it, and select Open Data to view the data from the Google BigQuery table.
19. The Google BigQuery REST adapter supports both virtual tables and linked database. Since it supports linked database, we can query the data from Google BigQuery without creating a virtual table, using a query of the following form.
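The original post does not show the query itself; this is a hedged sketch of a linked-database query, with my-project, my_dataset, and my_table as placeholders for your GCP project, BigQuery dataset, and table. The exact form of the database/schema qualifier can vary, so check the SAP HANA Cloud SDA documentation for the adapter you use.

```sql
-- Query the BigQuery table directly through the remote source (linked database),
-- without creating a virtual table first
SELECT * FROM "RS_BIGQUERY"."my-project.my_dataset"."my_table";
```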
Conclusion: We are able to connect to Google BigQuery from SAP HANA Cloud and query the data without physically loading it. An important point to note: some data types may not be supported, which could cause issues when retrieving the data.