Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
raymond_yao
Associate
Associate
We may face a situation that the excel files are located in many different folders and we need to merge them into one table and do some analysis. In this blog post, I'd like to share my experience that we can use Python machine learning client for SAP HANA to do this job in a very convenient way.

In my case, I have sensor data for different devices recorded day by day and stored in many folders.



There are 1010 files. It's impossible to do it manually. We need write a few scripts. I recommend to use Python machine learning client for SAP HANA (https://pypi.org/project/hana-ml/). It will do the table type conversion automatically and supports pandas input. Firstly, let's get the file name list and store in a variable called files.
import os

path = './'

files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
for file in f:
if '.xlsx' in file:
files.append(os.path.join(r, file))


In my case, we only care about the columns "DEVNUM", "RTIME", "FTPTFSPN", "FTPTFFMI". For "FTPTFSPN", we need store it in "VARCHAR(100)". We will store all the data into the SAP HANA table called "PDMS". To use the append mode, we set drop_exist_tab=False in create_dataframe_from_pandas() function.
for file in files:
df = pd.read_excel(file, header=1)[["DEVNUM", "RTIME", "FTPTFSPN", "FTPTFFMI"]]
hana_df = create_dataframe_from_pandas(conn, pandas_df=df, table_name="PDMS", drop_exist_tab=False, table_structure={"FTPTFSPN": "VARCHAR(100)"}, batch_size=50000)

Now, I can use the dataframe hana_df to do the analysis. In my case, I use distribution_fit() and cdf() function to plot survival curve based on the data.


 

We can also fetch the data from SAP HANA table into a single csv file.
hana_df.collect().to_csv("my_data.csv")

 

Python machine learning client for SAP HANA not only provides user-friendly machine learning interface but also the useful functions to import data into SAP HANA table.

 

If you want to learn more about hana_ml and SAP HANA Predictive Analysis Library (PAL), please refer to the following links:

Outlier Detection using Statistical Tests in Python Machine Learning Client for SAP HANA

Outlier Detection by Clustering using Python Machine Learning Client for SAP HANA

Anomaly Detection in Time-Series using Seasonal Decomposition in Python Machine Learning Client for ...

Outlier Detection with One-class Classification using Python Machine Learning Client for SAP HANA

Learning from Labeled Anomalies for Efficient Anomaly Detection using Python Machine Learning Client...

COPD study, explanation and interpretability with Python machine learning client for SAP HANA

Identification of Seasonality in Time Series with Python Machine Learning Client for SAP HANA

Weibull Analysis using Python machine learning client for SAP HANA