Sefan Linders

How to add Snowflake as a source for SAP Datasphere or SAP HANA Cloud

Introduction

At the time of writing, there is no dedicated connection tile for Snowflake in SAP Datasphere. An alternative that you can use today is the Generic JDBC connection, which can leverage the Snowflake JDBC driver. Since I received several questions on how to configure this and overcome certain limitations, I’d like to share my view on how to best configure it.

This setup supports both federation and batch replication through remote tables. Data Flows are not natively supported, but you can use the created remote tables as a source in a Data Flow.

This blog covers the connection from SAP Datasphere, but as the underlying framework for the connection is SAP Smart Data Integration, a similar configuration can be made in SAP HANA Cloud, although the user interface will be different.

Requirements

Besides the obvious need for an SAP Datasphere tenant and a Snowflake account, you will need the following:

  • An installed SAP Data Provisioning Agent (version 2.6.1.0 or later); the installation itself is not covered in this blog;
  • Admin rights on Snowflake to create a user and assign privileges.

How to set up

Prepare the CamelJDBC adapter and XML files

Most of the steps are described in, or linked from, these SAP Help pages, but in summary the following steps need to be followed:

  • Upload the Snowflake JDBC driver to folder <DPAgent_root>/camel/lib;
  • Check if the CamelJDBC config part in <DPAgent_root>/camel/adapters.xml is uncommented (details here);
  • Make sure CAP_LIMIT is part of the capabilities in file <DPAgent_root>/camel/adapters.xml, otherwise TOP or LIMIT operations are not pushed down;
  • Make sure the Generic JDBC adapter is activated in SAP Datasphere;
  • If you made any changes in the config files or upgraded the agent, make sure to refresh the capabilities in SAP Datasphere (a query to verify the capability registration is shown after this list);
  • Edit file <DPAgent_root>/camel/sample-jdbc-dialect.xml and change the data type mapping for source type “NUMBER”. Make sure the file has the following two entries for that source type. This is necessary because in Snowflake, any integer-like value (e.g. TINYINT or BIGINT) is defined as NUMBER(38,0) in the source dictionary. To overcome issues with values not fitting into the default mapped HANA INTEGER data type, the mapping should be changed to BIGINT.
<Mapping srcType="NUMBER" length="" precision="" scale="[1,38]" hanaType="DECIMAL" />
<Mapping srcType="NUMBER" length="" precision="" scale="[-38,0]" hanaType="BIGINT" />
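
To verify that a capability such as CAP_LIMIT has actually been registered after the refresh, you can run a check along the following lines. This is a minimal sketch and assumes you have a database user that is allowed to read the ADAPTER_CAPABILITIES system view (for example a DBA user):

select * from adapter_capabilities
where adapter_name = 'CamelJdbcAdapter' and capability_name = 'CAP_LIMIT';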

Create a user in Snowflake

Create a dedicated user in Snowflake to log on with from SAP Datasphere. I created a user in Snowflake with the following SQL statements, but there is obviously some freedom in how you want to do this. Please do note that:

  • This user requires operate/usage privileges on a warehouse or you will run into errors;
  • I have chosen not to assign a default database and a default warehouse to the user, and instead provide those parameters in the connection string of the SAP Datasphere connection. You have to do one of the two, either assign them to the user directly or add them to the connection string, or you will run into errors;
  • I recommend assigning at most one database to a user. Assigning more databases becomes confusing, as SAP Datasphere will list all schemas of all databases in one go, without differentiating between databases, and only the default database can actually be queried. If you have more than one database to connect to, create one connection with a dedicated user for each database.
create role dwc_role;
grant operate on warehouse sample_wh_xs to role dwc_role;   -- allows resuming/suspending the warehouse
grant usage on warehouse sample_wh_xs to role dwc_role;     -- allows running queries on the warehouse
create user dwc_generic password='<PASSWORD>' must_change_password = false;
grant role dwc_role to user dwc_generic;
alter user dwc_generic set default_role = dwc_role;
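
If the user should also read from a regular (non-shared) database, the role additionally needs privileges on that database and its objects. Below is a minimal sketch; MY_DB is a hypothetical database name, so adjust it to your own situation and access policies:

grant usage on database my_db to role dwc_role;
grant usage on all schemas in database my_db to role dwc_role;
grant select on all tables in database my_db to role dwc_role;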

Create the connection in SAP Datasphere

You need to create a separate SAP Datasphere connection for each Snowflake database you want to connect to. The reason is that one Snowflake connection can give you access to multiple databases, but SAP Datasphere does not use the database metadata. This can result in multiple issues:

  • Seeing multiple schemas with the same name, not knowing to which database they belong;
  • Not being able to fetch data from any other database other than the default database that has been set for the user or configured in the connection string.

In SAP Datasphere, create a new connection and choose the connection tile “Generic JDBC”. The configuration should look similar to the screenshot below.


Figure 1: Configuration in Generic JDBC connection tile in SAP Datasphere

JDBC driver class: net.snowflake.client.jdbc.SnowflakeDriver

JDBC connection string: jdbc:snowflake://<your_account_and_region>.snowflakecomputing.com?db=SNOWFLAKE_SAMPLE_DATA&warehouse=sample_wh_xs

As you can see, I have set a database and warehouse which will be used as defaults in the JDBC session. Alternatively, you can leave these parameters out and assign a default database and warehouse to the user in Snowflake, as sketched below.
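
If you prefer that second option, setting the defaults on the Snowflake user could look roughly as follows. This is a sketch that assumes the user and warehouse created earlier; DEFAULT_NAMESPACE sets the default database (and optionally schema) for new sessions:

alter user dwc_generic set default_warehouse = 'SAMPLE_WH_XS';
alter user dwc_generic set default_namespace = 'SNOWFLAKE_SAMPLE_DATA';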

After completing the connection wizard, the setup is done and you should be able to start using Snowflake as a source for modeling.

Troubleshooting

Below, the most common issues and their solutions are listed. When you run into something unexpected with this connection type and the error shown in the user interface is unclear or absent, check the Data Provisioning Agent log files. This can be done either from SAP Datasphere directly, or by checking the framework.trc log files in the <DPAgent_root>/log directory on the Data Provisioning Agent instance.

Data preview fetches all data from source

Upon data preview, it can happen that all data is fetched from the source. You would notice this because the data preview takes very long, or because the SQL statement you checked in the remote query monitor or in the Snowflake query history has no TOP or LIMIT clause applied (an example query on the Snowflake query history is shown after the list below). In that case:

  • Check that your Data Provisioning Agent is up to date; the feature to push down TOP and LIMIT operations was added in version 2.6.1.0;
  • (After upgrading) make sure to check the capabilities as explained in paragraph “Prepare the CamelJDBC adapter and XML files”, reboot the agent, and refresh the agent capabilities in SAP Datasphere;
  • You might also have to re-save your connection in SAP Datasphere by opening the connection settings, re-entering your credentials and saving the connection. Sometimes this is needed in addition to refreshing the adapter capabilities for this remote source.
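
To check on the Snowflake side whether a TOP or LIMIT clause actually arrives, you can inspect the recent statements with a query like the one below. This is a sketch using the INFORMATION_SCHEMA.QUERY_HISTORY table function; the filter value is a placeholder for the remote table you previewed:

select start_time, query_text
from table(information_schema.query_history())
where query_text ilike '%your_table%'
order by start_time desc
limit 20;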

SAP Datasphere does not list databases, and shows all schemas of all databases in one flat list

Below is a screenshot showing duplicate schema names in SAP Datasphere.


Figure 2: Snowflake schema representation in SAP Datasphere source hierarchy

The reason the duplicate schemas show up is that these schemas are present in multiple Snowflake databases. However, the database metadata is not used to present the source system hierarchy, and therefore the schemas look like duplicates.

To partially overcome the representation issue, restrict the number of databases the Snowflake user is authorized for. This also means that if you want to access multiple Snowflake databases, you should create a separate connection with a separate Snowflake user for each database.

However, the two databases DEMO_DB and UTIL_DB are always assigned to any user, and both have the schemas INFORMATION_SCHEMA and PUBLIC. These will therefore always show up as duplicates.
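
To see which databases the connection user can actually reach, you can run a quick check in Snowflake, for example while logged on with that user (inspecting the role grants may require an administrative role); a sketch:

show databases;
show grants to role dwc_role;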

SAP Datasphere data preview error: “Schema X does not exist or not authorized”

When trying to access a schema other than the user’s default schema, the remote table can still be created, but the following error is listed in the agent logs as soon as data is fetched, for example during data preview.

2022-05-19 12:36:44,419 [ERROR] [1ffdd62b-2fac-4883-9841-54281957b4cc52674] DefaultErrorHandler | CamelLogger.log - Failed delivery for (MessageId: ID-ip-172-31-0-19-1652960718543-0-3 on ExchangeId: ID-ip-x-x-0-3). Exhausted after delivery attempt: 1 caught: net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Schema 'SAMPLE_DB.TPCH_SF1' does not exist or not authorized.

The reason for this is that SAP Datasphere is not aware of, or ignores, the database parameter. To overcome this issue, create an individual connection for each Snowflake database that you want to access, with a parameter that sets the default database. This can be achieved using the database parameter in the JDBC connection string, as shown in the paragraph “Create the connection in SAP Datasphere”.
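
For example, a second connection for a hypothetical database SALES_DB would get its own connection string along these lines:

jdbc:snowflake://<your_account_and_region>.snowflakecomputing.com?db=SALES_DB&warehouse=sample_wh_xs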

Data preview returns 0 records, or throws the error “an error occurred while loading metadata”

It can happen that data fetching and data preview initially work, and then at some point just run into an error. When this happened to me, the error in the agent logs was as follows:

2022-05-23 08:22:08,550 [ERROR] [15ad44f9-9977-4e42-b317-b045f7fa4cc474829] DefaultErrorHandler | CamelLogger.log [] - Failed delivery for (MessageId: EDAED96EEC4A7FE-0000000000000002 on ExchangeId: EDAED96EEC4A7FE-0000000000000002). Exhausted after delivery attempt: 1 caught: net.snowflake.client.jdbc.SnowflakeSQLException: No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command.

It turned out that I had not set a default warehouse for the user. This only surfaced after I had stopped working actively in the Snowflake admin tool and a new session was initiated, which did not have an active warehouse assigned.

The solution is to assign a default warehouse to the user, either through Snowflake user management or by defining the warehouse in the JDBC connection parameters, as shown in the example in the connection configuration paragraph. When you make changes like these, you might have to disconnect the session in the Snowflake admin tool to force a new session with the new settings.
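
To check which defaults are currently set for the user, you can describe it on the Snowflake side; the output lists properties such as DEFAULT_WAREHOUSE, DEFAULT_NAMESPACE and DEFAULT_ROLE (run this as an administrator or as the user itself). A default warehouse can then be set with an ALTER USER statement as sketched in the connection configuration paragraph:

desc user dwc_generic;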

Data preview error: NumberFormatException

In most cases, this is because you are loading values that exceed the HANA INTEGER range into a HANA INTEGER field. Check the paragraph “Prepare the CamelJDBC adapter and XML files” on how to change the mapping for source data type NUMBER(38,0).

You might also have another data type mapping issue and need to adjust a different source-to-target data type mapping. Check the agent’s framework.trc logs for more clues; usually the erroneous value is listed, from which you can deduce which source or target data type is the culprit.
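
If you want to narrow down which source column could be the culprit, the Snowflake information schema can help. Below is a sketch with hypothetical database and table names; it lists the precision and scale of all NUMBER columns of a table:

select table_schema, table_name, column_name, data_type, numeric_precision, numeric_scale
from my_db.information_schema.columns
where table_name = 'MY_TABLE' and data_type = 'NUMBER';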

Conclusion

Hopefully this blog helped you set up or troubleshoot your connection from SAP Datasphere to Snowflake using the Generic JDBC connection tile. If you have any experiences to add, just leave them in the comments.

 

      13 Comments
      Mark Eifert

      You didn't mention the capabilities.xml file in “Prepare the CamelJDBC adapter and XML files” (or at least I can't find it). How do I set the CAP_* capabilities?

      Sefan Linders
      Blog Post Author

      Capabilities are defined in <DPAgent_root>/camel/adapters.xml. I see I made an error in the troubleshooting section where I pointed to the wrong xml file. That's corrected now.

      Rajesh Gollapudi

      Can we add CAP_UNION as well? I have added it to the adapter capabilities and it is reflected at HANA level as well, but the query behavior has not changed; the union is not getting pushed down to Snowflake in this case.

       

      Any leads?

      Sefan Linders
      Blog Post Author

      If you run the query below, you will see that UNION is not supported in the default adapter capabilities for CamelJDBC. Adding it in the adapters file usually does not change this and can even lead to errors.

      select * from adapter_capabilities
      where adapter_name = 'CamelJdbcAdapter' and UPPER(description) like '%UNION%'
      Marco Roth

      Hi Stefan,

      In the part "How to set up" you wrote that the capability CAP_LIMIT should be set in the adapters.xml file. But in the CamelJdbcAdapter tag area this capability is not listed.

      What's your recommendation? Do I have to add this capability manually, or is that not an option because the capability isn't part of the current release (we use 2.6.3.4)?

      Many thanks for your feedback

      Best regards,

      Marco

      Sefan Linders
      Blog Post Author

      Not sure what you mean by "Tag Area", but I assume you are looking at the capabilities tag. You are on the right release, so you can just open the adapters.xml file and add a line with the CAP_LIMIT capability. This would look as follows; if I look at version 2.6.4.3 in an unedited file, that capability would be listed at line 311, although it doesn't matter where in the list you add it:

      [...]
      <Capabilities>
      CAP_LIMIT,
      CAP_AND_DIFFERENT_COLUMNS,
      CAP_TRUNCATE_TABLE,
      [...]
      After saving the file, refresh the capabilities in SAP Datasphere. You can check in SAP Datasphere if the capability has successfully been updated in the database by running the following query using a DBA user:
      select * from adapter_capabilities
      where adapter_name = 'CamelJdbcAdapter' and CAPABILITY_NAME = 'CAP_LIMIT'
      Joel Blackthorne

      Hi Stefan,

      Thanks for the blog post.  I was able to follow along and use the DP Agent to deploy the Snowflake connections.  The problem I am now facing is that with this connector:

      • Data flows are not supported
      • Replication flows are not supported
      • Remote Tables are enabled (making this the only option I see available)

      The result of the above is that the only method I can see to utilize the tables is to replicate table snapshots.  Is there something I might be missing?  Is there any option configurable other than replication?

      Thank you!

      Sefan Linders
      Blog Post Author

      Data flows and replication flows currently indeed don’t support generic JDBC. But besides table snapshots and federation, you can also leverage data flows by reading from a virtual table that you create before creating the data flow. The virtual table is then just a source table for the data flow.

      Joel Blackthorne

      Hi Sefan,

      Thank you so much for the feedback and suggestions!  I am very new to DS and working on our first POC, meaning I am still trying to piece together a proper architecture design.  I was hoping that SDI would bring us functionality for Real-Time Replication or replication via delta.  Instead, Snowflake running over Generic JDBC does not provide the capabilities available to other Connection types.

      If I understand correctly, the only options for Snowflake and JDBC-based drivers are:

      • Snapshot replication of the whole Remote Table
      • Use Data flows to import the Remote table data to a DS Table
        • Use Batches in a data flow to limit memory usage and transfer sizes
        • Then, use views to logically partition the data such that dataflows can replicate just the changed data.  Though, this design is completely manual.

      In my case, the Snowflake data is massive and cannot be replicated as whole table snapshots.  Do you have any suggestions for delta replication other than to use a Data Flow and a view for manual partitioning?

      Thank you!

      Joel

      Sefan Linders
      Blog Post Author

      You could also look into view partitioning, which allows you to lock partitions; see this help page. The locked partitions won't be reloaded upon snapshotting the view. You would then obviously need a partitioning strategy where you know that certain partitions remain unchanged. Do you have one or more columns in the source data that tell you whether data has changed, such as a creation or change date?

      Steven Ruggiero

      I am getting something similar. Sefan, do you have any insight to this?

       

      Also, my goal is to use Datasphere to "Virtualize" Snowflake data. I assume I would just use a SQL View?

      (error screenshots not shown)

      Rajesh Gollapudi

      Hello,

       

      I'm getting the below error in HANA when we try to read data from Snowflake:

      I have already applied the solution, but it still fails:

       

      The data in the source looks like this:

      Source data type is number (21,7) and target datatype is decimal (21,7).

      Any idea how to fix this?

      Thanks for the support in adv.

      Br//Rajesh G

      Sefan Linders
      Blog Post Author

      Can you try to replace the entries provided in the blog with the following?

      <Mapping srcType="NUMBER" length="" precision="[0,38]" scale="[1,38]" hanaType="DECIMAL" /> 		
      <Mapping srcType="NUMBER" length="" precision="[0,38]" scale="[0,0]" hanaType="BIGINT" /> 	
      <Mapping srcType="DECIMAL" length="" precision="[0,38]" scale="[1,38]" hanaType="DECIMAL" /> 	
      <Mapping srcType="DECIMAL" length="" precision="[0,38]" scale="[0,0]" hanaType="BIGINT" /> 			   
      <Mapping srcType="INT" length="" precision="" scale="" hanaType="INTEGER" /> 		
      <Mapping srcType="INTEGER" length="" precision="" scale="" hanaType="INTEGER" /> 		
      <Mapping srcType="BIGINT" length="" precision="" scale="" hanaType="BIGINT" />