Proxy Third-Party Python Library Traffic – Using SAP Cloud Connector in SAP Data Intelligence Python Operators
Custom Python operators are the most convenient way to develop non-standard integrations in SAP Data Intelligence. The vast range of Python libraries available is virtually unlimited. In this blog post, I will demonstrate how to proxy TCP traffic of third-party libraries through the SAP Cloud Connector, enabling the integration of on-premises systems to be as effortless as internet-accessible ones.
Please note that this blog post is a continuation of my previous post, “Proxy Third-Party Python Library Traffic – General“, where I provided a more comprehensive explanation of the approach and libraries used.
Szenario: In this example, I will show how to connect to a PostgreSQL database via the Cloud Connector. The database is running on my Windows machine, and the Cloud Connector is connected to that machine. This replicates a real world on-premises setup.
Before proceeding with the steps outlined in this guide, it is essential to have an instance of the SAP Cloud Connector installed. While it is possible to install the cloud connector on a server, for the purposes of this demonstration, we will be using a Windows machine. We recommend following the instructions provided in this blog (https://blogs.sap.com/2021/09/05/installation-and-configuration-of-sap-cloud-connector/) to install and configure the cloud connector.
The second requirement is a BTP subaccount with a Data Intelligence cluster. To this subaccount, we will connect the Cloud Connector.
The first step is to create a configuration in the Cloud Connector that connects to our subaccount and exposes the PostgreSQL host as a TCP resource.
On the BTP end, we can check the cockpit and the connected cloud connectors in the respective menu tabs. If you cannot see this tab, you may be missing some roles. It is important to note that we see the LocationID “FELIXLAPTOP”, which is an identifier that distinguishes multiple cloud connectors connected to the same subaccount.
2. Creating a Data Intelligence Connection:
For our purposes, we do not want to hard-code the connection details, because we need a little help from the connection management to access the Connectivity Service of BTP. In the Connection Management application from SAP Data Intelligence we can create connections of all types. We create a connection of type HTTP with host, port and SAP Cloud Connector as the gateway.
In there we specify the internal host of the Cloud Connector resource, its port and the Postgres username and password.
Note: Not all connection types allow you to access via the Cloud Connector. See the official product documentation for details.
3. Developing a Custom Operator:
In the operators menu of Data Intelligence we create a new custom operator based on the Python3 operator.
We build a custom Dockerfile with the required libraries. We use psycopg2 to connect to Postgres, SocketSwap as a local proxy, and sapcloudconnectorpythonsocket to open a socket via the Cloud Connector.
FROM $com.sap.sles.base RUN python3 -m pip --no-cache-dir install 'psycopg2-binary' --user RUN python3 -m pip --no-cache-dir install 'sapcloudconnectorpythonsocket' --user RUN python3 -m pip --no-cache-dir install 'SocketSwap' --user
Now to the key part, the custom script:
import psycopg2 from SocketSwap import SocketSwapContext import multiprocessing from operators.<your_operator>.socket_factory import socket_factory multiprocessing.set_start_method("spawn") # needed because DI Python Operator is itself executed in Threads postgres_connection = api.config.http_connection api.logger.info(str(postgres_connection)) def connect_postgres(): """ This function demos how to easily setup the local proxy using the SocketSwapContext-Manager. It exposes a local proxy on the localhost 127.0.0.1 on port 2222 The connection factory is provided to handle the creation of a socket to the remote target """ with SocketSwapContext(socket_factory, [postgres_connection], "127.0.0.1", 2222): api.logger.info("Proxy started!") # Set up a connection to the PostgreSQL database conn = psycopg2.connect( host="127.0.0.1", database="postgres", user="postgres", password="password", port=2222 ) api.logger.info("Postgres connected successfully!") # Create a cursor object to execute SQL queries cur = conn.cursor() # Execute a SELECT query to retrieve data from a table cur.execute("SELECT CURRENT_TIMESTAMP;") # Fetch all the rows returned by the query rows = cur.fetchall() # Print the rows to the console for row in rows: api.logger.info(str(row)) # Close the cursor and connection cur.close() conn.close() def gen(): api.logger.info("Generator Start") connect_postgres() api.add_generator(gen)
And we need a second file in which we put the function we are passing into the SocketSwap Proxy. Note: This function is placed in a seperate file on the top level, to make it pickleable.
from sapcloudconnectorpythonsocket import CloudConnectorSocket def socket_factory(postgres_connection): cc_socket = CloudConnectorSocket() cc_socket.connect( dest_host=postgres_connection["connectionProperties"]["host"], # virtualhost dest_port=postgres_connection["connectionProperties"]["port"], # 5432 proxy_host=postgres_connection["gateway"]["host"], # connectivity-service-proxy proxy_port=20004, # 20004 SOCKS5 Proxy Port of connectivity service token=postgres_connection["gateway"]["authentication"], # auth token location_id="FELIXLAPTOP" ) return cc_socket
In the custom operator, script we just use a generator to start the connect_postgres function. Inside that function, we use the standard psycopg functions to make a small SQL query to the database. The only difference here: We wrap it in the SocketSwapContext to redirect its traffic through a CloudConnectorSocket.
Note: The SocketSwapContext takes mainly 3 arguments: A socket_factory, a callable that returns a connected socket, socket_factory_args, the local proxy host and its port. As we can see in the example, I pass the socket_factory function to the context manager. This function uses the already introduced CloudConnectorSocket functionality to open a Python socket using the Cloud Connector as proxy server. Check out my previous post about TCP and the SAP Cloud Connector to learn more details about this.
The local proxy is setup automatically in a background daemon process and listens to the given host, port. Usually this will be the localhost and a free port. In your third-party library you can then connect to that proxy and its traffic will be redirected automatically to the address specified as destination in the cloud connector socket.
4. Test the custom operator:
We can wrap the new operator in a Data Intelligence Graph and fill the configuration parameter with the POSTGRES_ONPREM_SYSTEM connection ID.
In this example, we can check the logs and see the following:
Generator Start Proxy started! Postgres connected successfully! (datetime.datetime(2023, 4, 24, 10, 10, 52, 593842, tzinfo=datetime.timezone.utc),)
The proxy registered successfully and was able to redirect the traffic coming to localhost:2222 to the on-premises PostgreSQL. As a result of the query, we get the current_timestamp in the python datetime object.
I hope this blog gave you a perspective of the possibilities you have even through the Cloud Connector to interact with any system. If you have any questions of scenarios for the SocketSwap package, feel free to leave a comment.