Proxy Third-Party Python Library Traffic – General
Python is a powerful programming language, thanks in part to the extensive collection of third-party libraries available to developers. These libraries provide a wide array of tools for building applications, and Python's many client libraries make it easy to connect to other software systems. Connecting to internet-reachable systems is usually straightforward, but connecting to on-premises systems may require proxying traffic, which can pose some challenges. In this blog post, we will explore how Python can help you proxy any TCP traffic locally, using the example of connecting to a PostgreSQL database. We will also discuss alternative approaches to local proxying.
Connect a PostgreSQL on-premises database via SocketSwap
Scenario: Let us assume that we have a PostgreSQL on-premises database that we want to connect to using the psycopg2 client library. However, because psycopg2 is implemented in C, it does not use Python's socket module and offers no way to plug in a custom proxy or a Python socket. We will use a local proxy to bridge the traffic through a proxy-enabled socket.
The following architecture diagram provides an overview of the scenario. We have some Python code that is executing in an arbitrary environment, and we need to connect it to an on-premises PostgreSQL database. Usually, on-premises environments do not allow inbound traffic, or at least they impose strict limitations on it. For the example scenario, we will use the psycopg2 client library, which is the most popular PostgreSQL client on GitHub in terms of stars.
Note: This technique is a general approach to redirecting traffic through a custom socket. It is not PostgreSQL-specific and can be applied to any library that uses TCP-based networking.
Before looking at the SocketSwap local proxy approach, here are some alternatives to consider:
Some networking patterns connect you to an on-premises system by placing both sides in a common network zone. A VPN is one example. With this technique, you are not bound to sending traffic via a proxy server.
Pure Python Traffic
If you are fortunate enough to have a client library that is written entirely in Python, you can easily redirect its traffic using the following code snippet. It replaces the socket class of the socket module with a class that creates proxy-enabled sockets. The PySocks library provides excellent support for any SOCKS proxy in this case. Alternatively, you can use any custom socket class, for example sapcloudconnectorsocket for connecting through the SAP Cloud Connector.
import socket
import socks

socks.set_default_proxy(socks.SOCKS5, "proxyhost")
# Set default socket generation to a proxy-enabled SOCKS object
socket.socket = socks.socksocket
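To see why this patch affects pure-Python libraries, and how to undo it again, here is a small standard-library sketch. RecordingSocket and patched_socket are hypothetical names used only for illustration; RecordingSocket stands in for a proxy-enabled class such as socks.socksocket and merely records where connections go.

```python
import contextlib
import socket


class RecordingSocket(socket.socket):
    """Stand-in for a proxy-enabled socket class; it only logs connect() targets."""

    seen = []

    def connect(self, address):
        RecordingSocket.seen.append(address)
        super().connect(address)


@contextlib.contextmanager
def patched_socket(socket_cls):
    """Temporarily replace socket.socket and restore the original afterwards."""
    original = socket.socket
    socket.socket = socket_cls
    try:
        yield
    finally:
        socket.socket = original


def demo():
    # Local listener so the demo needs no network access
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    with patched_socket(RecordingSocket):
        # create_connection() looks up socket.socket at call time,
        # so it now instantiates RecordingSocket
        client = socket.create_connection(("127.0.0.1", port))
        client.close()
    listener.close()
    return port
```

Pure-Python code paths such as socket.create_connection pick up the replacement class because they resolve socket.socket when they are called. Code that opens its sockets in C never performs this lookup, which is why the patch has no effect there.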
Background: Python's ecosystem of libraries is arguably the most powerful of any programming language in the area of data integration. You can connect to nearly any database directly from Python using its dedicated client library. One strength of Python is that it is a high-level language that can be extended with low-level languages like C or C++. Often, the Python code is just a thin wrapper on top of these lower-level components. In practice, this means that many libraries do their networking in another language, which makes the socket-patching approach infeasible.
If your target system is SSH-enabled, you can use SSH tunneling to redirect all traffic from a local endpoint through the tunnel directly to the target, as in the snippet below. This is a perfectly valid technique, but keep in mind that not all servers allow SSH connections due to security considerations.
ssh -L 5432:localhost:5432 user@remotehost
In the code snippet, you can see how to create an SSH tunnel using the command line. In this case, traffic sent to localhost:5432 is forwarded through the tunnel to remotehost:5432 (the localhost in the command is resolved on the remote side). If you configure SSH to pass through a proxy, this works as well. In Python, you can accomplish the same with Paramiko.
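If the ssh client is installed, the same tunnel can also be opened from a Python program by launching it as a subprocess. The following is a minimal sketch that uses the standard library instead of Paramiko; build_tunnel_command and open_tunnel are hypothetical helper names, and the host and login values are the placeholders from the command above.

```python
import subprocess


def build_tunnel_command(local_port, target_host, target_port, ssh_login):
    """Build the ssh argument list for a local port forward.

    -N tells ssh not to run a remote command, so the process only forwards ports.
    """
    forward = f"{local_port}:{target_host}:{target_port}"
    return ["ssh", "-N", "-L", forward, ssh_login]


def open_tunnel(local_port, target_host, target_port, ssh_login):
    """Start the tunnel in the background and return the process handle."""
    return subprocess.Popen(
        build_tunnel_command(local_port, target_host, target_port, ssh_login)
    )


if __name__ == "__main__":
    # Placeholder values taken from the command-line example
    tunnel = open_tunnel(5432, "localhost", 5432, "user@remotehost")
    try:
        pass  # ... connect your client to localhost:5432 here
    finally:
        tunnel.terminate()
        tunnel.wait()
```

The sketch does not wait until the tunnel is actually ready, so a real program would need to retry the first connection or check the port before using it.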
Operating System Solutions
The operating system plays a crucial role in the orchestration of traffic. There are a number of techniques and software solutions you can use at that level to redirect traffic, but they are outside the scope of this blog post. In some Python environments, we have no control over the operating system and cannot install additional software. That's why we need to look at what is possible at the application layer.
Using the SocketSwap Library
SocketSwap is a Python package that enables you to proxy the traffic of any third-party library through a local TCP proxy server. The concept is as follows:
Suppose you have a Python client, such as psycopg2, which is written in C and is not proxy-enabled. You want to establish a connection to a PostgreSQL database in an on-premises environment, and you must pass its TCP traffic through a SOCKS5 proxy.
To accomplish this, you can use a SocketSwapContext, which automatically handles the setup of a local proxy server that redirects traffic via a custom socket. In this example, the server listens on the local address 127.0.0.1:2222. You provide SocketSwap with a function that creates the proxy-enabled socket and thereby specify where to forward the traffic. To test it locally, the example uses a plain socket with the loopback address as the target host.
""" Demo Szenario - Connecting a postgres client via SocketSwap """ import socket import psycopg2 from socketswap import SocketSwapContext def conn_factory(): target_host = "localhost" target_port = 5432 remote_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) remote_socket.connect((target_host, target_port)) return remote_socket def connect_postgres(): """ This function demos how to easily setup the local proxy using the SocketSwapContext-Manager. It exposes a local proxy on the localhost 127.0.0.1 on port 2222 The connection factory is provided to handle the creation of a socket to the remote target """ with SocketSwapContext(conn_factory, "127.0.0.1", 2222): # Set up a connection to the PostgreSQL database conn = psycopg2.connect( host="127.0.0.1", database="postgres", user="postgres", password="password", port=2222 ) # ... postgres stuff
Instead of connecting directly to remotehost:5432 with psycopg2, we now connect to 127.0.0.1:2222, which is the registered local proxy. This allows us to use the library as if nothing had changed. Because SocketSwap works as a context manager, it also shuts down the proxy automatically when you leave the context.
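For the SOCKS5 scenario described earlier, the connection factory would return a proxy-enabled socket instead of a plain one. The following is a sketch using PySocks; the host names and the proxy port 1080 are placeholder assumptions, not values from a real setup.

```python
import socket

# Placeholder values (assumptions for illustration only)
PROXY_HOST = "proxyhost"
PROXY_PORT = 1080  # conventional SOCKS port
TARGET_HOST = "remotehost"
TARGET_PORT = 5432


def conn_factory():
    """Top-level factory that returns a socket connected through a SOCKS5 proxy."""
    # Imported here so PySocks is only needed where the factory actually runs
    import socks

    remote_socket = socks.socksocket(socket.AF_INET, socket.SOCK_STREAM)
    remote_socket.set_proxy(socks.SOCKS5, PROXY_HOST, PROXY_PORT)
    remote_socket.connect((TARGET_HOST, TARGET_PORT))
    return remote_socket
```

This factory can be passed to SocketSwapContext in place of the plain-socket version, while the psycopg2 call keeps pointing at 127.0.0.1:2222.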
SocketSwap comprises two parts: the TCP proxy, which can also be started separately, and the context manager. The context manager starts the proxy for you and runs it in a background process. The only thing you need to provide is a function that creates the socket. The source code is freely available on GitHub, and the package can be installed using pip.
pip install SocketSwap
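To make the idea of a local TCP proxy wrapped in a context manager more tangible, here is a minimal standard-library sketch. It is not SocketSwap's actual implementation: it uses threads instead of a background process, and LocalProxy, _pipe and _handle are hypothetical names chosen for illustration.

```python
import socket
import threading


def _pipe(src, dst):
    """Copy bytes from src to dst until src reports end of stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream on
        except OSError:
            pass


def _handle(client, conn_factory):
    """Bridge one accepted client to a socket created by the factory."""
    try:
        remote = conn_factory()
    except OSError:
        client.close()
        return
    with client, remote:
        upstream = threading.Thread(target=_pipe, args=(client, remote), daemon=True)
        upstream.start()
        _pipe(remote, client)
        upstream.join()


class LocalProxy:
    """Context manager that runs a forwarding TCP server on a local port."""

    def __init__(self, conn_factory, host="127.0.0.1", port=0):
        self._conn_factory = conn_factory
        self._stop = threading.Event()
        self._listener = socket.create_server((host, port))
        self._listener.settimeout(0.2)  # lets the accept loop notice the stop flag
        self.address = self._listener.getsockname()
        self._thread = threading.Thread(target=self._accept_loop, daemon=True)

    def _accept_loop(self):
        while not self._stop.is_set():
            try:
                client, _ = self._listener.accept()
            except socket.timeout:
                continue
            except OSError:
                break
            threading.Thread(
                target=_handle, args=(client, self._conn_factory), daemon=True
            ).start()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        self._listener.close()
```

A client would connect to proxy.address inside a with LocalProxy(conn_factory) as proxy: block, and every byte it sends is relayed over the socket returned by the factory. Because this sketch stays within one process, its factory does not need to be picklable, unlike the one passed to SocketSwap.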
Please note that the socket factory (conn_factory in the example above) needs to be picklable, because it is handed over to the background process. This means it should be defined at the top level of a Python module. Nested functions are not supported.
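The picklability requirement can be checked up front with the standard pickle module. The snippet below is a small sketch; is_picklable and make_nested_factory are hypothetical helper names, and the use of functools.partial to parameterize a top-level factory is a suggestion rather than something prescribed by SocketSwap.

```python
import functools
import pickle
import socket


def conn_factory(host="localhost", port=5432):
    """Top-level function: pickle can reference it by module and name."""
    return socket.create_connection((host, port))


def make_nested_factory():
    def nested_factory():
        # Defined inside another function, so pickle cannot look it up by name
        return socket.create_connection(("localhost", 5432))

    return nested_factory


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("top-level factory:", is_picklable(conn_factory))
    print("partial of top-level factory:",
          is_picklable(functools.partial(conn_factory, "remotehost", 5432)))
    print("nested factory:", is_picklable(make_nested_factory()))
```

When run as a script, the top-level function and the partial built from it should be reported as picklable, while the nested function should not. Lambdas fail for the same reason as nested functions.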
I hope you enjoy using this package, and please leave a comment if you have any further scenarios or require additional details.