sourabh_sharma6

Step-by-step process for developing data science Python scripts using the SAP HANA database on SAP Cloud Platform


Overview


SAP Cloud Platform is an open platform-as-a-service (PaaS) that delivers in-memory capabilities, core platform services, and unique microservices for building and extending intelligent, mobile-enabled cloud applications.

Data Science is the process of deriving knowledge and insights from a huge and diverse set of data through organizing, processing and analyzing the data.

Python is a dynamic, interpreted (byte-code-compiled) language. There are no type declarations for variables, parameters, functions, or methods in the source code. This makes the code short and flexible, but you lose compile-time type checking, so type errors only surface when the code runs.
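As a small illustration of that trade-off, here is a minimal sketch (the function name `double` is made up for this example). The same function accepts several types, and a bad argument is only rejected at run time:

```python
def double(value):
    # No type declaration: any object that supports `* 2` is accepted.
    return value * 2


print(double(21))       # works with an int
print(double("ab"))     # works with a str (repetition)
print(double([1, 2]))   # works with a list (repetition)

try:
    double(None)        # nothing rejects this call before it runs
except TypeError as exc:
    print("runtime error:", exc)
```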

DISCLAIMER: Please note that the resources and data used are for demonstration purposes only.

We will develop a simple Python script that displays data graphically using data science packages such as pandas, matplotlib and pyhdb, after opening a database tunnel to the SAP HANA database on SAP Cloud Platform.

  • PYHDB is a pure Python client package for the SAP HANA Database based on the SAP HANA Database SQL Command Network Protocol.

  • MATPLOTLIB is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

  • PANDAS is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.


Prerequisites:

Let's start the development now:

  • Open a database tunnel to the SAP HANA cloud database

    • Open a command prompt and change the current directory to the SDK folder that contains the neo.sh file (neo.bat on Windows). Replace username with your workstation user name.
      cd C:\Users\username\Desktop\PY\SDK\tools


    • Now enter the command below to open a database tunnel to the cloud. Replace username, databasename and password with your HANA trial account user name, database name and password.
      neo open-db-tunnel -h hanatrial.ondemand.com -a usernametrial -u username -i databasename -p password


    • Congratulations, you have successfully opened a database tunnel.
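Before moving on, it can help to confirm from Python that the tunnel is actually listening. The following is a minimal sketch using only the standard library. The helper name `tunnel_is_open` is hypothetical, and port 30015 is an assumption taken from the script later in this post; use whichever host and port your tunnel reports.

```python
import socket


def tunnel_is_open(host="localhost", port=30015, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port refuses or times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("tunnel reachable:", tunnel_is_open())
```

This only checks that the local port accepts connections. It does not verify the database credentials.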




Let's upload sample data to the HANA cloud database using SAP HANA studio:

 

It's time for Python development

  • Open a Python IDE and create a new file.

  • Below is the code for connecting to the database and performing data analysis operations on the fetched data. Replace username and password with the database user name and password for your MDC database instance.
    import pyhdb
    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt

    # Connect through the local end of the database tunnel.
    connection = pyhdb.connect('localhost', 30015, 'username', 'password')
    cursor = connection.cursor()
    cursor.execute("SELECT TOP 20 DATE, HIGH FROM SAP_HANA_DEMO.NIFTY_50_DATA")
    rows = cursor.fetchall()
    connection.close()

    # Column 0 holds DATE and column 1 holds HIGH.
    data = pd.DataFrame(rows)
    matplotlib.rcParams['axes.unicode_minus'] = False
    fig, ax = plt.subplots()
    # Date on the x-axis, the day's high on the y-axis.
    ax.plot(data[0], data[1], 'o')
    ax.set_title('NIFTY-50')
    plt.show()



  • The connection is established with the connect function from the pyhdb package, passing the host, port and database credentials.

  • We fetch the top 20 records from the table NIFTY_50_DATA and convert them into a DataFrame using the DataFrame constructor from the pandas package.

  • Finally, a scatter plot is displayed using the matplotlib package.
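The conversion step can be tried without a database. The sketch below uses made-up rows in place of the query result (they are not real NIFTY 50 values) and assumes the driver returns each row as a tuple of a date and a number. When no column names are passed, pandas labels the columns with the integers 0 and 1; passing `columns` gives them readable names.

```python
import datetime

import pandas as pd

# Made-up rows standing in for cursor.fetchall(); not real market data.
rows = [
    (datetime.date(2017, 1, 2), 8212.0),
    (datetime.date(2017, 1, 3), 8219.1),
    (datetime.date(2017, 1, 4), 8218.5),
]

# Without column names, pandas uses integer labels.
unlabeled = pd.DataFrame(rows)
print(list(unlabeled.columns))

# With explicit names, columns can be addressed as labeled["DATE"] etc.
labeled = pd.DataFrame(rows, columns=["DATE", "HIGH"])
print(labeled["HIGH"].max())
```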


Let's test the developed script

  • Run the Python script by pressing F5.

  • The scatter plot below is generated, showing how the day's highest price varies with the date.


Congratulations, you have successfully visualized data in Python using SAP HANA Cloud Platform. Please note that we can develop more complex scripts for analyzing the data, based on the requirements.
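As one example of a slightly more involved analysis, here is a minimal sketch of a 3-day moving average computed with pandas. The values are made up for illustration; in a real script they would come from the HIGH column fetched from HANA.

```python
import pandas as pd

# Made-up daily highs; not taken from the demo table.
highs = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0], name="HIGH")

# 3-day moving average. The first two entries are NaN because
# a full 3-value window is not yet available for them.
moving_avg = highs.rolling(window=3).mean()
print(moving_avg)
```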