Python Meets Data Lake Part 1 – Connect and Query
Hi all,
I am sharing what I recently learned about data lake, exploring how we can interact with a data lake directly.
Prerequisites: follow https://developers.sap.com/tutorials/hana-cloud-dl-clients-overview.html and create your own data lake instance. Either a managed or a standalone instance is fine.
Here is my data lake instance up and running.
Install the SAP IQ client. Make sure you pick the latest version from SAP Software Downloads.
You can also follow the tutorials above to understand data lake and SAP HANA Cloud instances. A data lake lets you store different kinds of data from different sources in one place, and, importantly, at a very low cost.
There are already a few posts that help you get started on this topic; here I am sharing my own learning. So here we go.
- Go to the SAP BTP cockpit and create your free trial account: https://account.hana.ondemand.com/
- Create your development space and, under SAP HANA Cloud, create a data lake instance. Configure the allowed IP addresses according to your requirements.
- SAP IQ installation: download the SAP IQ client drivers from SAP Software Downloads and install them.
- Open the ODBC Data Source Administrator in administrator mode (I am doing this on Windows). If SAP IQ appears in the list of drivers, the driver installation was successful.
- Now go to your HANA data lake instance, open the menu at the top right, copy the SQL endpoint, and enter it in the driver's connection details.
- Test the connection.
- Once the test succeeds, you are ready to connect to your data lake over ODBC, from whichever programming environment you choose.
- I am using a Jupyter notebook and Python (pyodbc) to interact with the data lake. I have created a few tables and inserted some data, which I then read back in my Python client.
- https://pypi.org/project/pyodbc/
Install pyodbc, for example with: pip install pyodbc
- Open a Jupyter notebook and connect to the data lake instance.
Import the package and provide the connection details.
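The steps above store the endpoint in an ODBC data source (DSN). pyodbc can also connect without a DSN if you pass the driver name and connection parameters directly. The sketch below is a hypothetical helper that builds such a string from the copied SQL endpoint. It assumes that the driver is registered under the name SAP IQ (check the Drivers tab of the ODBC administrator) and that the HOST, ENC, UID and PWD parameters used with the SAP IQ client in the SAP data lake client tutorials also apply to your client version.

```python
def build_hdl_connection_string(sql_endpoint: str, user: str, password: str,
                                driver: str = "SAP IQ") -> str:
    """Assemble a DSN-less ODBC connection string for a data lake instance.

    Hypothetical helper. Assumptions: `driver` matches the name shown in the
    ODBC Data Source Administrator, and `sql_endpoint` is the "host:port"
    value copied from the instance.
    """
    host, sep, port = sql_endpoint.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {sql_endpoint!r}")
    params = [
        ("DRIVER", "{" + driver + "}"),
        ("HOST", f"{host}:{port}"),
        # Assumption: TLS setting as shown in the SAP data lake client tutorials
        ("ENC", "TLS(tls_type=rsa;direct=yes)"),
        ("UID", user),
        ("PWD", password),
    ]
    return ";".join(f"{key}={value}" for key, value in params)


print(build_hdl_connection_string("my-instance.iq.hdl.example.com:443",
                                  "HDLADMIN", "<password>"))
```

If the assumptions hold, pyodbc.connect(build_hdl_connection_string(endpoint, "HDLADMIN", password)) would replace the DSN-based call. A password containing ; or braces would need extra quoting, which this sketch does not handle.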
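import pyodbc
# HDLSA is the name of the ODBC data source (DSN) configured for the data lake; use your own user and password
cnxn = pyodbc.connect('DSN=HDLSA;UID=HDLADMIN;PWD=abc1234@123A')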
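The connection cell keeps the password in the notebook, where it is easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you set two environment variables before starting Jupyter (HDL_USER and HDL_PASSWORD are hypothetical names chosen for this example), builds the same DSN-based connection string at run time:

```python
import os


def dsn_connection_string(dsn: str = "HDLSA",
                          user_var: str = "HDL_USER",
                          password_var: str = "HDL_PASSWORD") -> str:
    """Build 'DSN=...;UID=...;PWD=...' from environment variables.

    HDL_USER and HDL_PASSWORD are hypothetical variable names; set them in
    the shell before launching Jupyter so the secret never lands in a cell.
    """
    missing = [name for name in (user_var, password_var) if not os.environ.get(name)]
    if missing:
        raise KeyError(f"environment variable(s) not set: {', '.join(missing)}")
    return f"DSN={dsn};UID={os.environ[user_var]};PWD={os.environ[password_var]}"
```

With the variables set, cnxn = pyodbc.connect(dsn_connection_string()) behaves like the original call.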
Open a cursor and execute a SELECT statement.
cur = cnxn.cursor()
# HOTEL.HOTEL is the HOTEL table in the HOTEL schema
cur.execute('SELECT * FROM HOTEL.HOTEL')
The HOTEL table in the HOTEL schema was already created by following the data lake tutorials.
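A fixed SELECT is fine for a first test, but when a filter value comes from user input it is safer to let the driver bind it. pyodbc uses ? placeholders, with the values passed as a separate argument to execute. The helper below relies only on Python DB-API 2.0 (PEP 249) methods that pyodbc implements. To keep it runnable without a data lake, the demo uses Python's built-in sqlite3 module as a stand-in, which uses the same ? style; the rows are made-up sample values, not the tutorial data.

```python
import sqlite3
from contextlib import closing


def run_query(cnxn, sql, params=()):
    """Execute `sql` with ? placeholders bound to `params` and return all rows.

    Uses only DB-API 2.0 calls, so it should work with a pyodbc or sqlite3
    connection. The cursor is closed even if execute raises.
    """
    with closing(cnxn.cursor()) as cur:
        cur.execute(sql, params)
        return cur.fetchall()


# Local stand-in database with hypothetical sample rows
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE HOTEL (HNO INTEGER, NAME TEXT, CITY TEXT)")
demo.executemany("INSERT INTO HOTEL VALUES (?, ?, ?)",
                 [(1, "Hotel A", "City X"), (2, "Hotel B", "City X"), (3, "Hotel C", "City Y")])
print(run_query(demo, "SELECT NAME FROM HOTEL WHERE CITY = ? ORDER BY HNO", ("City X",)))
```

Against the data lake, the call would look like run_query(cnxn, 'SELECT * FROM HOTEL.HOTEL WHERE CITY = ?', ('<city>',)), assuming the table has a CITY column.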
Fetch Data
rows = cur.fetchall()  # list of pyodbc Row objects
rows  # Jupyter displays the value of the last expression in a cell
Print Records.
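fetchall loads the whole result into memory, which is fine for a small table like HOTEL but can be heavy for large data lake tables. A sketch of an alternative reads rows in batches with the DB-API fetchmany method and uses cursor.description to pair values with column names. As before, sqlite3 is only a local stand-in, and the default batch size of 1000 is an arbitrary assumption to tune for your data.

```python
import sqlite3
from contextlib import closing


def iter_records(cnxn, sql, batch_size=1000):
    """Yield each result row of `sql` as a dict keyed by column name.

    Uses only DB-API 2.0 calls (cursor.description, fetchmany), so it should
    also work with a pyodbc connection. batch_size is an arbitrary default.
    """
    with closing(cnxn.cursor()) as cur:
        cur.execute(sql)
        columns = [col[0] for col in cur.description]
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            for row in batch:
                yield dict(zip(columns, row))


# Local stand-in database with hypothetical sample rows
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE HOTEL (HNO INTEGER, NAME TEXT)")
demo.executemany("INSERT INTO HOTEL VALUES (?, ?)",
                 [(1, "Hotel A"), (2, "Hotel B"), (3, "Hotel C")])
for record in iter_records(demo, "SELECT HNO, NAME FROM HOTEL ORDER BY HNO", batch_size=2):
    print(record)
```

Because it is a generator, only one batch is held in memory at a time, at the cost of keeping the cursor open until iteration finishes.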
That brings us to the end of connecting to the data lake from Python. In the next part, we will upload data from a CSV file to the data lake.
Keep learning, keep querying 🙂
Thanks.