Using Application Function Modeler To Create Forecast (APL) Procedure in SAP HANA
In this blog, I share the experience I gained while creating a forecast procedure using the following features of SAP HANA:
- APL (Automated Predictive Library) Forecast Function
- Application Function Modeler (AFM)
I haven't seen a convincing blog that promotes this feature of SAP HANA. Personally, I believe it is a really good feature for people with beginner-level or no SQL scripting skills, as AFM is a graphical tool. The end result, however, is a procedure which can be reused later for different purposes. I created this forecast procedure based on a use case provided by Forefront Analytics, which instructed me to use the APL forecast function; the input dataset was also provided by Forefront Analytics. I started my research on APL with a main focus on the time series function, i.e. "Forecast". I learnt about APL installation and the different APL functions as I progressed, and I discovered AFM while watching SAP Technology videos as part of my APL research.
I have tried to document my work in the following sections:
Prerequisites
The following prerequisites must be in place before we start developing:
- SAP HANA (SPS 09)
- SAP AFL (this should be installed as part of SAP HANA)
- SAP APL (you will need to install it separately)
- unixODBC 64-bit (no version was mentioned in the guide, so I used 2.3.2)
- SAP HANA Script Server should be enabled (a sketch of the enabling statement follows this list)
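For the script server prerequisite, a minimal sketch of the enabling statement (this is the configuration change documented by SAP; run it as a user with the required system privilege):

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('scriptserver', 'instances') = '1' WITH RECONFIGURE;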
You can refer to this blog if you need to install APL.
What is APL Forecast Function?
The forecast operation consists of building and training a time series model, and then applying the model in order to obtain the forecast data. The list below describes the input and output tables required for the forecast function to work. (The expected structure of each table, and the supported parameter names for the function header, were shown as screenshots in the original post.)

IN - FUNC_HEADER: Optional (the table must be provided but it can be empty). The function header defines the operating parameters of an APL function call. For instance, a function header can be used to set the logging level or to provide an operation id. A function header is made of a collection of string pairs { name, value }. It is usually the first input parameter of a function.

IN - VARIABLE_DESCS: Optional (the table must be provided but it can be empty). The variable descriptions for the input dataset, as expected by the Automated Analytics engine.

IN - OPERATION_CONFIG: Mandatory. The configuration of the training operation. For this function, you must declare the type of Automated Analytics model, and you can declare the optional Cutting Strategy, in the OPERATION_CONFIG table.

IN - VARIABLE_ROLES: Optional (the table must be provided but it can be empty). The roles of the variables for this training. When training a model, the roles of the variables can be specified. These variable roles are provided as string pairs { variable name, variable role }. In data modeling, variables may have four roles:
● Target variables (also known as output variables): a target variable is the variable that you seek to explain, or for which you want to predict the values in an application dataset. It corresponds to your domain-specific business issue. In some businesses, target variables may also be known as variables to be explained or dependent variables.
● Explanatory variables (also known as input variables): an input variable describes your data and serves to explain a target variable. Explanatory variables may also be known as causal variables or independent variables.
● Weight variables: a weight variable allows you to assign a relative weight to each of the observations it describes, and actively orient the training process.
● Skipped variables: these variables are ignored during the training process.

IN - DATASET: Mandatory. The name of your input (training) dataset. Datasets are used for training models and applying models. They can contain any number of columns (within HANA and AFL limits). The supported SQL datatypes for the dataset columns are all the datatypes supported by AFL:
● INTEGER
● BIGINT
● DOUBLE
● CLOB & NCLOB
● VARCHAR & NVARCHAR
● DATE, TIME, TIMESTAMP, SECONDDATE

OUT - DATASET: The resulting forecast.

OUT - LOG: The operation log. When performing an APL operation, especially training or applying a model, the Automated Analytics engine produces status/warning/error messages. These messages are returned from an APL function through an output database table.

OUT - SUMMARY: The training summary. When training a model, debriefing information related to the training operation is produced. This is known as the training summary. This information is a set of indicators, provided as string pairs { KEY, VALUE }.

OUT - INDICATORS: The variable statistics and indicators. When training, testing or querying a model, it is possible to retrieve variable indicators (i.e. variable statistics). For each variable, a collection of indicators may be retrieved. These indicators are described using the following attributes: { variable name, indicator name, indicator value, indicator detail (when applicable) }. Indicators are returned from an APL function through an output database table. The output table contains estimator indicators for regression models, to help plot the regression curve.
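To make the OPERATION_CONFIG input concrete, here is a minimal sketch of how the forecast model type and time series parameters can be declared. The parameter names (APL/ModelType, APL/Horizon, APL/TimePointColumnName, APL/LastTrainingTimePoint) come from the APL documentation for the forecast function, but the target table and the values shown here are assumptions based on the KEY/VALUE pair layout described above, so verify them against the reference guide for your APL version:

-- declare a forecast model with a 21-point horizon over a date column named DATE_TS (illustrative values)
INSERT INTO "APP1"."OPERATION_CONFIG" VALUES ('APL/ModelType', 'forecast');
INSERT INTO "APP1"."OPERATION_CONFIG" VALUES ('APL/Horizon', '21');
INSERT INTO "APP1"."OPERATION_CONFIG" VALUES ('APL/TimePointColumnName', 'DATE_TS');
INSERT INTO "APP1"."OPERATION_CONFIG" VALUES ('APL/LastTrainingTimePoint', '2014-12-31');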
What is Application Function Modeler?
The SAP HANA Application Function Modeler (AFM) is the default editor for flowgraphs. A flowgraph is a development object; it is stored in a project and has the extension .hdbflowgraph. By default, activating a flowgraph generates a procedure in the catalog. A flowgraph models a data flow that can contain:
- tables, views, and procedures from the catalog
- relational operators such as projection, filter, union, and join
- functions from Application Function Libraries (AFL) installed on your system
- attribute view and calculation view development objects
- the Business Function Library
- the Predictive Analysis Library
- R Scripts
- Data Provisioning operators
- the generation of task plans
To build the forecast procedure, we will perform the following steps:
- Create catalog objects such as the schema, source tables, function tables, and output tables
- Create the flowgraph
Assumptions:
- An XS project has already been created. In this example we have used the following XS project: APP1
Create Catalog Objects
Schema
- First of all, please create a schema definition file named APP1.hdbschema using the following method:
- Right-click the project name and select New > Other > SAP HANA > Database Development > Schema
- Select the folder where you want to save this file and give it the name "APP1". Please make sure no template is selected
- Once the file is created, open it and enter the schema definition (see the sketch after this list)
- Right-click the file and activate it. This will create the schema
- Run the following SQL statement to grant privileges to _SYS_REPO:
grant alter on schema "APP1" to "_SYS_REPO";
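The content of APP1.hdbschema is a single schema name declaration; a minimal sketch:

schema_name="APP1";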
Tables
- Right-click the project name and select New > Other > SAP HANA > Database Development > DDL Source File
- Select the folder where you want to save the file and give it a name. Please make sure you have selected the empty structure template. Perform this three times to create the files func_data.hdbdd, output_data.hdbdd, and source_data.hdbdd.
- Once the files are created, open each one and enter its table definitions (the contents of func_data.hdbdd, source_data.hdbdd, and output_data.hdbdd were shown as screenshots in the original post; see the sketch after this list)
- Save the files, right-click on them and activate. This will create the required tables in schema "APP1", and you will see them listed under the schema in the catalog
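The screenshots of the three DDL files are not reproduced here, so the following is a minimal CDS sketch of what func_data.hdbdd might look like. The entity names, column names, and lengths are assumptions based on the { name, value } pair layout of the APL parameter tables described earlier, and the namespace assumes the package path is APP1; source_data.hdbdd and output_data.hdbdd would define the input time series table and the FORECAST_OUTPUT, LOG, SUMMARY, and INDICATORS tables in the same style:

namespace APP1;

@Schema: 'APP1'
context func_data {

    // { name, value } string pairs for the function header
    @Catalog.tableType: #COLUMN
    entity FUNCTION_HEADER {
        NAME : String(50);
        VAL  : String(255);
    };

    // { key, value } pairs configuring the forecast operation
    @Catalog.tableType: #COLUMN
    entity OPERATION_CONFIG {
        NAME : String(1000);
        VAL  : String(255);
    };

    // { variable name, variable role } pairs (target, input, weight, skipped)
    @Catalog.tableType: #COLUMN
    entity VARIABLE_ROLES {
        NAME : String(127);
        ROLE : String(10);
    };
};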
Creating Flowgraph
Assumption:
The source data table is already populated with data (an illustrative sketch follows below)
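As an illustration only (the real structure was shown in the source_data screenshot), a simple two-column time series could be loaded like this; the table name, column layout, and values below are placeholders:

INSERT INTO "APP1"."SOURCE_DATA" VALUES ('2014-01-01', 1045.20);
INSERT INTO "APP1"."SOURCE_DATA" VALUES ('2014-01-02', 998.75);
INSERT INTO "APP1"."SOURCE_DATA" VALUES ('2014-01-03', 1123.40);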
- Right-click the project name and select New > Other > SAP HANA > Database Development > Flowgraph Model
- Select the folder where you want to save this file and give it a name.
- Once the file is created, open it and right-click on an empty space to go to the properties of this flowgraph model. Please change the schema from _SYS_BIC to APP1
- Add the forecast function (drag and drop) from the General list in the node palette
- Go to the function properties, choose APL_AREA as the area, and once the functions are populated, choose the FORECAST function. Your screen should look like the one below:
- Next, we will add an input table and map it to the corresponding function node. Select FUNCTION_HEADER from the tables list in the APP1 schema and drag and drop it onto an empty area of the flowgraph model. Select "Data Source" when the system asks you. Now create a connection from this table to the FUNC_HEADER node of the function. Please note you can rename each node as per your needs.
- Perform similar steps for all the input tables; your screen should look like the one below:
- We will now add the output tables and map them to the corresponding function nodes. Select FORECAST_OUTPUT from the tables list in the APP1 schema and drag and drop it onto an empty area of the flowgraph model. Select "Data Sink" when the system asks you.
- Before we create a connection between the node and the table, we need to define the columns in the output node of the function. Use a table structure similar to the one we defined above when creating the output table in schema APP1. Go to the properties of the node > Signature > add/change the columns and their data types
- Now create a connection from the node "Output_Data" to the table FORECAST_OUTPUT. Please note you can rename each node as per your needs.
- If you are going to truncate the table every time you run the procedure, select the truncate option in the node properties
- Perform similar steps for all the output tables; your screen should look like the one below:
- Please note that each node shows a green tick, which means there are no errors
- Now, please activate this model
- Once activated, you will see the new procedures and table types created in schema APP1
- To call this procedure, use a SQL statement like the one below:
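The original post showed the statement in a screenshot. A sketch of the call, assuming the flowgraph was saved as forecast_model in package APP1 (generated procedure names typically follow a <package>::<flowgraph> pattern, so check the Procedures folder of the APP1 schema for the exact name):

CALL "APP1"."APP1::forecast_model"();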