Skip to Content
Technical Articles

Import multiple excel files into a single SAP HANA table

We may face a situation that the excel files are located in many different folders and we need to merge them into one table and do some analysis. In this blog post, I’d like to share my experience that we can use Python machine learning client for SAP HANA to do this job in a very convenient way.

In my case, I have sensor data for different devices recorded day by day and stored in many folders.

There are 1010 files. It’s impossible to do it manually. We need write a few scripts. I recommend to use Python machine learning client for SAP HANA (https://pypi.org/project/hana-ml/). It will do the table type conversion automatically and supports pandas input. Firstly, let’s get the file name list and store in a variable called files.

import os

path = './'

files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.xlsx' in file:
            files.append(os.path.join(r, file))

In my case, we only care about the columns “DEVNUM”, “RTIME”, “FTPTFSPN”, “FTPTFFMI”. For “FTPTFSPN”, we need store it in “VARCHAR(100)”. We will store all the data into the SAP HANA table called “PDMS”. To use the append mode, we set drop_exist_tab=False in create_dataframe_from_pandas() function.

for file in files:
    df = pd.read_excel(file, header=1)[["DEVNUM", "RTIME", "FTPTFSPN", "FTPTFFMI"]]
    hana_df = create_dataframe_from_pandas(conn, pandas_df=df, table_name="PDMS", drop_exist_tab=False, table_structure={"FTPTFSPN": "VARCHAR(100)"}, batch_size=50000)

Now, I can use the dataframe hana_df to do the analysis. In my case, I use distribution_fit() and cdf() function to plot survival curve based on the data.

 

We can also fetch the data from SAP HANA table into a single csv file.

hana_df.collect().to_csv("my_data.csv")

 

Python machine learning client for SAP HANA not only provides user-friendly machine learning interface but also the useful functions to import data into SAP HANA table.

 

If you want to learn more about hana_ml and SAP HANA Predictive Analysis Library (PAL), please refer to the following links:

Outlier Detection using Statistical Tests in Python Machine Learning Client for SAP HANA

Outlier Detection by Clustering using Python Machine Learning Client for SAP HANA

Anomaly Detection in Time-Series using Seasonal Decomposition in Python Machine Learning Client for SAP HANA

Outlier Detection with One-class Classification using Python Machine Learning Client for SAP HANA

Learning from Labeled Anomalies for Efficient Anomaly Detection using Python Machine Learning Client for SAP HANA

COPD study, explanation and interpretability with Python machine learning client for SAP HANA

Identification of Seasonality in Time Series with Python Machine Learning Client for SAP HANA

Weibull Analysis using Python machine learning client for SAP HANA

Be the first to leave a comment
You must be Logged on to comment or reply to a post.