Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
pallab_haldar

In machine learning, we train a model (using an algorithm) on historical data and then apply the trained model to actual data to predict or forecast future values or to make decisions. Machine learning is a subset of AI in which the model learns patterns from data rather than relying on explicitly coded rules.

There are many algorithms that can be used to train a model and forecast data. One group is regression algorithms. Regression algorithms are a type of machine learning algorithm used to predict numerical values from input data: they find the relationship between the input variables and the output variable by fitting a mathematical model to the historical data, and that model is then applied to actual data.

In this blog we will discuss one of them: Linear Regression.

Linear Regression :

Linear regression uses a linear equation to predict the value of one variable based on the value of another variable. The value we want to predict is the dependent variable, and the factors whose changes have a direct, straight-line (linear) impact on the dependent variable are called independent variables. First the algorithm establishes the relationship between the dependent and independent variables, then the model is trained on that relationship, and finally it is applied to unobserved values to generate a forecast or prediction.

y=b1+b2x+ε   ( ε - Error)

Linear Equation :  A linear equation is an algebraic equation of the form y=b1+b2x where b2 is the Slope or Coefficient  and b1 is the intercept or constant.     

pallab_haldar_1-1707861473652.png

For the above equation, an increase of 1 unit in X (the independent variable) increases the dependent variable (Y) by 2 units, starting from an initial value of 7 (the intercept).

A linear equation involves only a constant and a first-order (linear) term, where b2 is the slope and b1 is the y-intercept.
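As a minimal sketch of how the intercept b1 and the slope b2 can be estimated from data, the Python snippet below applies the ordinary least squares formulas to a few made-up points that lie on a line with intercept 7 and slope 2, the values discussed above. The function name fit_simple_linear and the sample data are assumptions for illustration only; they are not part of SAP HANA PAL.

```python
def fit_simple_linear(xs, ys):
    """Estimate intercept b1 and slope b2 of y = b1 + b2*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b2 = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    b1 = mean_y - b2 * mean_x
    return b1, b2


# Hypothetical sample: points generated from intercept 7 and slope 2.
xs = [0, 1, 2, 3, 4]
ys = [7 + 2 * x for x in xs]

b1, b2 = fit_simple_linear(xs, ys)
print("intercept b1 =", b1, "slope b2 =", b2)
```

With real data the points do not sit exactly on a line, and the error term ε absorbs the difference between each observed value and the fitted line.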

Linear Regression example :  

Let's take a business scenario where a cosmetics company orders raw materials on a monthly basis for its production plant. A case study conducted by the case manager found that the ordered material was not being utilized properly to produce cosmetic items, and this was impacting revenue. To mitigate this, we use the SAP HANA Predictive Analysis Library (PAL) to predict the material to pre-order so that utilization is maximized.

Here we will predict MATERIAL_NEEDTO_ORDER (the dependent variable), which we found depends on REMAINING_QTY, SHIPPED_QTY, WAREHOUSE, MATERIAL, YEAR and MONTH.

How we can implement it:

It can be implemented in any of the following ways:

  1. Using R.
  2. Using the Python API.
  3. Using HANA AFL (PAL).

pallab_haldar_0-1712458807440.png

Here we will implement it using HANA and see how predictive analytics works:

1. Identify the parameters in our case. Dependent variable (the value to be predicted): we want to predict the MATERIAL_NEEDTO_ORDER value, so it is the dependent variable in our case.

Independent variables (whose changes affect the dependent variable): REMAINING_QTY, SHIPPED_QTY, WAREHOUSE, MATERIAL, YEAR and MONTH are the main fields that influence the MATERIAL_NEEDTO_ORDER quantity, meaning that changing the value of any of these fields will affect MATERIAL_NEEDTO_ORDER.

2. Identify the data pattern of the independent variables and select the proper algorithm to train the model.

In our case, the dependent variable is the target that we want to predict, and it depends on several parameters. The task is to predict a numerical value based on other values.

Linear regression analysis is used to predict the value of a variable based on the value of other variables. It models the linear relationship between one variable, usually referred to as the dependent variable, and one or more other variables, usually referred to as independent variables and denoted together as the predictor vector. We will therefore use linear regression. In the first part, we will use the algorithm to train the model on historical data.

3. Identify the data pattern of the independent variables and select the proper algorithm to generate the forecast data. As we used a linear regression algorithm to train the model, we need to use Linear Regression (Forecast) to forecast the data.

Linear Regression (Train) -----> Output is the coefficients ----> Linear Regression (Forecast)
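The two-stage flow above (training produces coefficients, and the forecast step consumes them) can be sketched in plain Python as follows. This is only an illustration of the idea, not the PAL implementation: the function names train_linear_regression and forecast, the choice of solving the normal equations by Gaussian elimination, and all sample numbers are assumptions.

```python
def train_linear_regression(rows, targets):
    """Return coefficients [intercept, b_1, ..., b_k] by solving the normal equations."""
    # Design matrix with a leading 1.0 in each row for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build X^T X and X^T y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, targets)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def forecast(coefficients, rows):
    """Apply trained coefficients to new rows of independent variables."""
    intercept, slopes = coefficients[0], coefficients[1:]
    return [intercept + sum(b * float(v) for b, v in zip(slopes, row)) for row in rows]


# Hypothetical history: (REMAINING_QTY, SHIPPED_QTY) -> MATERIAL_NEEDTO_ORDER,
# generated from 5 - 0.5 * remaining + 1.0 * shipped so the fit is easy to check.
history = [(10, 20), (4, 30), (8, 12), (0, 25), (6, 6)]
ordered = [5 - 0.5 * rem + 1.0 * shp for rem, shp in history]

coefficients = train_linear_regression(history, ordered)   # stage 1: train
predictions = forecast(coefficients, [(2, 40)])            # stage 2: forecast
print(coefficients, predictions)
```

Keeping the coefficients as the only hand-off between the two functions mirrors the flow above, where the forecast step needs nothing from training except the coefficient output.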

4. Identify the historical and forecast data periods: to predict the target value, we have selected Historical Data Cut-Off Date < CURRENT_DATE-30 and Forecast Cut-Off Date >= CURRENT_DATE-30.
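A small Python sketch of this cut-off split is shown below. The record layout, the field name order_date and the sample dates are hypothetical; only the rule itself (older than 30 days is historical, everything else is forecast input) comes from the step above.

```python
from datetime import date, timedelta


def split_by_cutoff(records, today, days=30):
    """Split records into historical (< today - days) and forecast input (>= today - days)."""
    cutoff = today - timedelta(days=days)
    historical = [r for r in records if r["order_date"] < cutoff]
    forecast_input = [r for r in records if r["order_date"] >= cutoff]
    return historical, forecast_input


# Hypothetical records; "order_date" is an assumed field name.
records = [
    {"order_date": date(2024, 1, 15), "qty": 20},
    {"order_date": date(2024, 3, 7), "qty": 12},
    {"order_date": date(2024, 3, 8), "qty": 30},
    {"order_date": date(2024, 4, 1), "qty": 25},
]

historical, forecast_input = split_by_cutoff(records, today=date(2024, 4, 7))
print(len(historical), "historical rows,", len(forecast_input), "forecast rows")
```

Passing today in explicitly, rather than reading the system date inside the function, keeps the split reproducible when the same data is reprocessed later.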

5. Data preparation/mining: prepare the input data in the format required to generate the trained model using linear regression. Data preparation and data mining are the most challenging parts of the scenario. We need to prepare the data in such a way that it satisfies the signature of the algorithm.

pallab_haldar_3-1712859425758.png

We prepare the data as follows:

1. Create the ID column as a combination of MATERIAL (MATNR) + WAREHOUSE + YEAR + MONTH. As ID accepts integers only, convert MATERIAL and WAREHOUSE into integers and then concatenate the parts to build an integer field. We used the REPLACE function in HANA for this conversion:

TO_INTEGER(
REPLACE(
	REPLACE(
		REPLACE(
			REPLACE(
			    REPLACE(
			        REPLACE(
			            REPLACE(
			                REPLACE(
			                    REPLACE(
			                        REPLACE(
			                            REPLACE(RTRIM(MATERIAL),
			                            'I','73'),
			                        'W', '87'),
			                    'F', '70'),
			                'D','68'),
			            'N','78'),
			        'C','67'),
			    'L','76'),
			'M', '77'),
		'R', '82'),
	'S', '83'),
'Q', '81'))

So ID = MATERIAL+WAREHOUSE+YEAR+MONTH,

and REMAINING_QTY and SHIPPED_QTY are passed as separate independent variables. These two variables have a major impact on the predicted field.
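To make the ID construction concrete, here is a hedged Python sketch of the same idea as the nested REPLACE expression: each letter is swapped for its character code (for example M becomes 77) and the parts are concatenated into one integer. The helper names letters_to_codes and build_id are hypothetical, and the sketch converts every letter rather than only the fixed list of letters used in the SQL above.

```python
def letters_to_codes(value):
    """Replace each alphabetic character with its character code; keep digits unchanged."""
    return "".join(str(ord(ch)) if ch.isalpha() else ch for ch in value.rstrip())


def build_id(material, warehouse, year, month):
    """Concatenate the converted MATERIAL and WAREHOUSE with YEAR and MONTH into one integer."""
    text = letters_to_codes(material) + letters_to_codes(warehouse) + str(year) + str(month)
    return int(text)


# Sample values in the style of the data used in this blog.
print(letters_to_codes("M011"))
print(build_id("M011", "P001", 2024, 3))
```

An ID built this way can grow to 15 digits or more, which is beyond the 32-bit integer range, so the target column may need a wider integer type depending on the length of the material and warehouse codes.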

Format for Linear Regression –

2. Identify the tables:

Identify the tables which contain the mentioned fields and create a view CV_TRAINING_MODEL for Cut-Off Date >= CURRENT_DATE-60 in the format below.

The data is available at the link below:

https://github.com/pallabhaldar/AI/blob/main/Cosmetic_Train.xlsx 

pallab_haldar_0-1712857145130.png

The output of the training algorithm is the COEFFICIENT table, which is the input to the subsequent forecast algorithm.

3. Create a view CV_VALIDATION_MODEL to feed input data to the linear regression forecast algorithm. It takes the coefficient data from the linear regression training model and predicts the forecast value. Here we take the data with Forecast Cut-Off Date >= CURRENT_DATE-30 and feed it as input to the forecast algorithm.

The data is available at the link below:

https://github.com/pallabhaldar/AI/blob/main/Cosmetic_Test.xlsx 

pallab_haldar_4-1712859655895.png

pallab_haldar_2-1712859319650.png

4. Create an output table to store the target data. The following needs to be configured:

pallab_haldar_0-1712862947838.png

pallab_haldar_1-1712863004546.png

5. After the flowgraph is executed, it generates the predicted value, and you get MATERIAL_NEEDTO_ORDER. To compare it with the actual value, we need to create a table function that takes the data from this output table and compares it with CV_VALIDATION_MODEL to get the actual material order data, joining on A.ID = B.ID.
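The comparison step can be sketched in Python as an inner join on ID followed by an error calculation. The function name compare_actual_vs_predicted, the ABS_ERROR column and the sample rows are assumptions made for illustration; in HANA this logic would live in the table function described above.

```python
def compare_actual_vs_predicted(actual_rows, predicted_rows):
    """Inner-join the two row sets on ID and attach the absolute prediction error."""
    predicted_by_id = {row["ID"]: row["PREDICTED_ORDER"] for row in predicted_rows}
    result = []
    for row in actual_rows:
        if row["ID"] in predicted_by_id:  # keep only IDs present on both sides
            predicted = predicted_by_id[row["ID"]]
            result.append({
                "ID": row["ID"],
                "ACTUAL_ORDER": row["ACTUAL_ORDER"],
                "PREDICTED_ORDER": predicted,
                "ABS_ERROR": abs(row["ACTUAL_ORDER"] - predicted),
            })
    return result


# Hypothetical rows: A plays the role of the validation view, B the output table.
a_rows = [
    {"ID": 1, "ACTUAL_ORDER": 20},
    {"ID": 2, "ACTUAL_ORDER": 28},
    {"ID": 3, "ACTUAL_ORDER": 5},
]
b_rows = [
    {"ID": 1, "PREDICTED_ORDER": 22.5},
    {"ID": 2, "PREDICTED_ORDER": 27.0},
    {"ID": 4, "PREDICTED_ORDER": 9.0},
]

compared = compare_actual_vs_predicted(a_rows, b_rows)
mean_abs_error = sum(r["ABS_ERROR"] for r in compared) / len(compared)
print(compared)
print("mean absolute error:", mean_abs_error)
```

A single summary figure such as the mean absolute error makes it easier to judge the forecast as a whole than reading the rows one by one.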

Result :

WAREHOUSE  MATERIAL  YEAR  MONTH  ACTUAL_ORDER  PREDICTED_ORDER
P001       M011      2024  3      5             4.6
P002       M012      2024  3      66            48
P003       M013      2024  3      11            14
P004       M014      2024  3      0             0
P005       M015      2024  3      7             5
P006       M016      2024  3      3             2.1
P007       M017      2024  3      0             1
P008       M018      2024  3      0             0
P009       M019      2024  3      3             5.3
P010       M020      2024  3      21            23.2
P011       M021      2024  3      52            48.8
P012       M022      2024  3      21            19.22
P013       M023      2024  3      20            22.22
P014       M024      2024  3      28            27
P015       M025      2024  3      28            31.2
P016       M026      2024  3      56            54
P017       M027      2024  3      2             1.11

In the next blog, we will do the same prediction using Python.
