How to add Snowflake as a source for SAP Data Warehouse Cloud or SAP HANA Cloud
At time of writing there is no dedicated connection tile for Snowflake in SAP Data Warehouse Cloud. An alternative that you can use today is the Generic JDBC connection, which can leverage the Snowflake JDBC driver. Since I got several questions on how to configure this and overcome certain limitations, I’d like to share my view on how to best configure this.
This setup supports both federation and batch replication through remote tables. Data Flows are not natively supported, but you can use the created remote tables as a source in a Data Flow.
This blog treats the connection from SAP Data Warehouse Cloud, but as the underlying framework for the connection is SAP Smart Data Integration, a similar configuration can be made on SAP HANA Cloud, although the user interface will be different.
Besides the obvious need for a SAP Data Warehouse Cloud tenant and a Snowflake account, you will need the following:
- Installed SAP Data Provisioning Agent (version 126.96.36.199 or later). The installation is not handled in this blog;
- Admin rights on Snowflake to create a user and assign privileges.
How to set up
Prepare the CamelJDBC adapter and XML files
Most of the steps are described or linked from in these SAP Help pages, but in summary the following steps need to be followed:
- Upload the Snowflake JDBC driver to folder <DPAgent_root>/camel/lib;
- Check if the CamelJDBC config part in <DPAgent_root>/camel/adapters.xml is uncommented (details here);
- Make sure CAP_LIMIT is part of the capabilities in file <DPAgent_root>/camel/adapters.xml, otherwise TOP or LIMIT operations are not pushed down;
- Make sure the Generic JDBC adapter is activated in SAP Data Warehouse Cloud;
- If you made any changes in the config files or upgraded the agent, make sure to refresh the capabilities in SAP Data Warehouse Cloud.
- Edit file <DPAgent_root>/camel/sample-jdbc-dialect.xml and change the data type mapping for source type “NUMBER”. Make sure the file has the following two entries for that source type. This is necessary because in Snowflake, any integer-like value (e.g. tinyint or bigint) is defined as NUMBER (38,0) in the source dictionary. To overcome issues with values not fitting in the default mapped HANA INTEGER data type, the mapping should be changed to BIGINT.
<Mapping srcType="NUMBER" length="" precision="" scale="[1,38]" hanaType="DECIMAL" /> <Mapping srcType="NUMBER" length="" precision="" scale="[-38,0]" hanaType="BIGINT" />
Create a user in Snowflake
Create a dedicated user in Snowflake that you use to logon with from SAP Data Warehouse Cloud. With the following SQL statements I created a user on Snowflake, but there is obviously a bit of freedom here on how you want to do things. Please do note that:
- This user requires operate/usage privileges on a warehouse or you will run into errors;
- I have chosen not to assign a default database and a default warehouse to the user, and instead I provide those parameters in the connection string in the SAP Data Warehouse Cloud connection. But you have to assign it either to the user directly, or add it in the connection string, or you run into errors.
- I recommend to assign a maximum of one database to a user. Assigning more databases can become confusing, as SAP Data Warehouse Cloud will list all schemas of all databases in one go, without differentiating between databases, and only the default database can actually be queried. If you have more than one database to connect to, create one connection with a dedicated user for each database.
create role dwc_role; grant operate on warehouse sample_wh_xs to role dwc_role; grant usage on warehouse sample_wh_xs to role dwc_role; create user dwc_generic password='<PASSWORD>' must_change_password = false; grant role dwc_role to user dwc_generic; alter user dwc_generic set default_role = dwc_role;
Create the connection in SAP Data Warehouse Cloud
You need to create a DWC connection for each Snowflake database you want to connect. The reason is that one Snowflake connection can give you access to multiple databases. However, SAP Data Warehouse Cloud does not use the database metadata. This can result in multiple issues:
- Seeing multiple schemas with the same name, not knowing to which database they belong;
- Not being able to fetch data from any other database other than the default database that has been set for the user or configured in the connection string.
In SAP Data Warehouse Cloud, choose to setup a new connection and choose connection tile “Generic JDBC”. The configuration should look similar as in below screenshot.
JDBC driver class: net.snowflake.client.jdbc.SnowflakeDriver
JDBC connection string: jdbc:snowflake://<your_account_and_region>.snowflakecomputing.com?db=SNOWFLAKE_SAMPLE_DATA&warehouse=sample_wh_xs
As you can see, I have set a database and warehouse which will be used as default in the JDBC session. Alternatively, you can leave this empty and assign a default user and warehouse when creating the user in Snowflake.
After completing this connection wizard, the setup is completed and you should be able to start using your Snowflake as a source for modeling.
Below, the most common issues with their solution are listed. Please note that when you run into something unexpected in SAP Data Warehouse Cloud regarding this connection type, and the error thrown in the user interface is not clear or not present, please check the Data Provisioning Agent log files. This can be done either from SAP Data Warehouse Cloud directly, or by checking the framework.trc log files on the Data Provisioning Agent instance in the <DPAgent_root>/log directory.
Data preview fetches all data from source
Upon data preview, it can happen that all data is being fetched from the source. You would notice this either because the data preview time is very high, or you have checked the SQL statement in the remote query monitor, or in the Snowflake historic statements where you do not see a TOP statement applied. In that case:
- Check if your Data Provisioning Agent is up to date. Make sure to run the latest DPA. A feature to push down TOP and LIMIT operations was added in version 188.8.131.52;
- (After upgrading) make sure to check the capabilities.xml file as explained in paragraph “Prepare the CamelJDBC adapter and XML files”, rebooted the Agent, and refreshed the agent capabilities in SAP Data Warehouse Cloud;
- You might also have to re-save your connection in SAP Data Warehouse Cloud, by just opening the connection settings, re-entering your credentials and saving the connection. Sometimes this is needed additionally to refresh the adapter capabilities for this remote source.
DWC does not list databases, and shows all schemas of all databases unorganized
Below a screenshot of seeing duplicate schema names in SAP Data Warehouse Cloud.
The reason for the duplicate schemas showing up, is that these schemas are present in multiple Snowflake databases. However, the database metadata is not used to present the source system hierarchy and therefore the schemas look like duplicates.
To partially overcome the representation issue, the Snowflake database user should be restricted in the number of databases authorised for. This also means that if you want to access multiple Snowflake databases, you should create separate connections and use separate Snowflake users for each connection.
However, there are always the two databases DEMO_DB and UTIL_DB assigned to any user, which both have a schema INFORMATION_SCHEMA and PUBLIC. Therefore, these will always show up as duplicates.
DWC data preview error: “Schema X does not exist or not authorized”
When trying to access a schema other than the user default schema, the following error is listed in the agent logs after trying to fetch data, even though the remote table can be created. When doing a data preview, the error occurs.
2022-05-19 12:36:44,419 [ERROR] [1ffdd62b-2fac-4883-9841-54281957b4cc52674] DefaultErrorHandler | CamelLogger.log – Failed delivery for (MessageId: ID-ip-172-31-0-19-1652960718543-0-3 on ExchangeId: ID-ip-x-x-0-3). Exhausted after delivery attempt: 1 caught: net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Schema ‘SAMPLE_DB.TPCH_SF1’ does not exist or not authorized.
The reason for this is that SAP Data Warehouse Cloud is not aware of, or ignores, the database parameter. To overcome this issue, create an individual connection for each Snowflake database that you want to access, with a parameter to set the default database. This can be achieved using a database connection parameter in the connection settings, as you can see in the paragraph Create the connection in SAP Data Warehouse Cloud.
Data preview returns 0 records, or throws an error on data preview “an error occurred while loading metadata”
It can happen (like it happened to me) that initially data fetching and data preview is working, and then at some point it just runs into an error. When this happened to me, the error in the agent logs was as follows:
2022-05-23 08:22:08,550 [ERROR] [15ad44f9-9977-4e42-b317-b045f7fa4cc474829] DefaultErrorHandler | CamelLogger.log  – Failed delivery for (MessageId: EDAED96EEC4A7FE-0000000000000002 on ExchangeId: EDAED96EEC4A7FE-0000000000000002). Exhausted after delivery attempt: 1 caught: net.snowflake.client.jdbc.SnowflakeSQLException: No active warehouse selected in the current session. Select an active warehouse with the ‘use warehouse’ command.
It turns out that I had not set an active warehouse for the user, a setting that might have changed after I stopped working actively in the Snowflake admin tool and a new session was initiated which did not have a default warehouse assigned.
The solution is to assign a default warehouse to the user. This can be done either by assigning a default warehouse to the user using Snowflake user management, or by defining the default warehouse in the connection parameters, as you can see in the example in the connection configuration paragraph. When you make changes like these, you might have to disconnect the session from the Snowflake admin tool to force a new session with the new settings.
Data preview error: NumberFormatException
In most cases, this is because you are loading values larger than Integer into a HANA Integer field. Check paragraph Prepare the CamelJDBC adapter and XML files on how to change the mapping for source data type NUMBER(38,0).
You might also have another data type mapping issue and have to adjust another source to target data type mapping. Check the agents framework.trc logs for more clues. Usually the erroneous value is listed, from which you can deduct which source or target data type is the culprit.
Hopefully this blog helped you setting up or troubleshooting your connection from SAP Data Warehouse Cloud to Snowflake, using the Generic JDBC connection tile. If you have any experiences to add, just leave them in the comments.