SAP Data Intelligence as an Optimization Platform
This article briefly explains what optimization entails and how widely it is used, from day-to-day users to enterprises.
Optimization techniques (or prescriptive analytics) are used to determine the optimal solution (best decision) to a problem that is formulated as a (linear or non-linear) mathematical “Objective function” over a set of “Decision Variables”, bound by a list of (linear or non-linear) mathematical “Constraints”. Modern optimization problems can have several million decision variables and are typically solved using sophisticated “Solvers” such as IBM CPLEX, Gurobi or FICO Xpress. Most machine learning algorithms also solve an optimization problem internally, using methods like “Gradient Descent”, so optimization has become an important part of analytics-based solutions in enterprises.
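In symbols, the description above corresponds to the generic form below. This is a sketch in common textbook notation; the symbols x, f, g_i and b_i are assumptions introduced here for illustration and do not appear in the original formulation files.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{n}} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le b_i, \qquad i = 1, \dots, m
\end{aligned}
```

Here x collects the decision variables, f is the objective function and each g_i(x) \le b_i is one constraint. When f and all g_i are linear, the problem is a linear program; requiring some components of x to be integers makes it a mixed integer program.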
Optimization techniques fall under the following categories:
1) Linear Programming
2) Integer Programming (or Mixed Integer Programming)
3) Goal Programming
4) Non-Linear Programming
5) Meta-Heuristics Programming
In the rest of the article, I will show how SAP Data Intelligence provides an end-to-end orchestration platform to host, compile, solve and execute an optimization problem that falls into any of the categories listed above.
As a sample scenario, I am considering a Diet Optimization problem. The objective is to find the lowest-cost balanced diet, based on a list of ingredients and the cost of each ingredient. We will go step by step through the technical implementation of this optimization problem, which is modeled as a Mixed Integer Programming (MIP) problem, in SAP Data Intelligence.
So let’s get started.
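To make the shape of the diet problem concrete before bringing in a solver, here is a minimal sketch on a toy instance. The three foods, their costs and nutrient values, the bounds and the serving cap are all hypothetical and are not taken from the article's dataset. The sketch simply enumerates whole-number servings, which is not how CPLEX works and only scales to tiny instances.

```python
from itertools import product

# Hypothetical data: cost and nutrients per serving (not from the article's dataset).
FOODS = {
    "oats":    {"cost": 0.5, "calories": 110, "protein": 4},
    "chicken": {"cost": 2.5, "calories": 205, "protein": 32},
    "milk":    {"cost": 1.0, "calories": 160, "protein": 8},
}
# Hypothetical daily bounds: nutrient -> (minimum, maximum).
BOUNDS = {"calories": (500, 800), "protein": (30, 80)}
MAX_SERVINGS = 6  # hypothetical cap on servings per food


def totals(plan):
    """Total cost and nutrients of a plan given as {food: servings}."""
    keys = ["cost", *BOUNDS]
    return {k: sum(FOODS[f][k] * n for f, n in plan.items()) for k in keys}


def is_feasible(plan):
    """True when every nutrient total lies within its bounds."""
    t = totals(plan)
    return all(lo <= t[k] <= hi for k, (lo, hi) in BOUNDS.items())


def cheapest_diet():
    """Try every combination of whole servings and keep the cheapest feasible one."""
    names = list(FOODS)
    best = None
    for servings in product(range(MAX_SERVINGS + 1), repeat=len(names)):
        plan = dict(zip(names, servings))
        if is_feasible(plan):
            cost = totals(plan)["cost"]
            if best is None or cost < best[0]:
                best = (cost, plan)
    return best


best_cost, best_plan = cheapest_diet()
print(best_cost, best_plan)
```

Even with three foods the enumeration visits 343 combinations, and the count grows exponentially with the number of foods, which is why realistic instances are handed to a dedicated solver.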
The SAP DI ML Scenario Manager can be used to create a single dashboard where all artifacts related to an optimization problem are stored and managed throughout their life cycle: datasets or model formulation files (in LP format), Jupyter notebooks for experimentation, and pipelines that run optimization solvers and produce solver output.
Go to the SAP DI Launchpad and click on ML Scenario Manager, as highlighted in the screenshot below:
Click on the ‘+’ button, give the scenario a name like ‘Optimization_CPLEX’ and hit the Create button:
Registering the Dataset: We can start by registering the ‘Diet_Optimization.csv’ dataset to the scenario. Click on the ‘+’ button under the ‘Datasets’ tab:
Setting Up the Jupyter Notebook: We can create a new Jupyter notebook via ML Scenario Manager and use it for writing the experimental Python optimization script.
To create a Jupyter notebook instance, go to the Jupyter Notebook tab, click the ‘+’ button, enter a name for the new notebook and hit ‘Create’.
We first start by installing the required CPLEX and DOcplex Python optimization libraries:
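A minimal sketch of the installation step, assuming the libraries are published on PyPI under the names `cplex` and `docplex` (the same names the consolidated Docker file at the end of the article installs). The helper names are hypothetical. The snippet only reports what is missing and prints the pip command rather than running it.

```python
import importlib.util
import sys

# Package names as used in the Docker file later in the article.
REQUIRED = ["cplex", "docplex"]


def missing_packages(names):
    """Return the packages from `names` that cannot be imported in this kernel."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pip_command(names):
    """Build a pip install command that targets the current interpreter."""
    return [sys.executable, "-m", "pip", "install", *names]


missing = missing_packages(REQUIRED)
if missing:
    print("Run in a notebook cell or terminal:", " ".join(pip_command(missing)))
else:
    print("cplex and docplex are already available")
```

Building the command from `sys.executable` helps ensure the packages land in the same Python environment the notebook kernel is running.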
Initialize the Mixed Integer Programming (MIP) CPLEX model for solving:
import docplex.mp.model as cpx
import pandas as pd

df = pd.read_csv("diet_optimization.csv", nrows=64)
df.drop("Serving Size", axis=1, inplace=True)

# Defining CPLEX Mixed Integer Programming Model
opt_model = cpx.Model(name="MIP Model")
food_items = list(df['Foods'])
print("List of food items considered, are\n" + "-"*100)
for f in food_items:
    print(f, end=', ')
The objective is to create the mathematical formulation: ‘Decision variables’ bounded by ‘Linear Constraints’, together with a cost objective function to be minimized. Details of the MIP mathematical formulation can be viewed in the LP file attached in the artifacts folder at the end of the article.
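Written out, a diet problem of this kind usually takes the form below. This is a sketch with assumed notation: x_j is the number of servings of food j, c_j its cost per serving, a_{ij} the amount of nutrient i in one serving of food j, and L_i and U_i the minimum and maximum daily amounts of nutrient i. None of these symbols come from the attached LP file.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{j} c_j \, x_j \\
\text{subject to} \quad & L_i \le \sum_{j} a_{ij} \, x_j \le U_i \qquad \text{for every nutrient } i, \\
& x_j \ge 0 \qquad \text{for every food } j
\end{aligned}
```

Each nutrient contributes one minimum and one maximum constraint. Requiring some of the x_j to take integer values turns this linear program into a mixed integer program.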
# Build one row per dataset column, keyed by food item.
# Row 0 holds the objective (cost) coefficients, rows 1..11 the constrained quantities.
# DataFrame.append was removed in pandas 2.0, so the rows are collected in one step.
new_col_names = [x for x in df.columns if x != 'Foods']
output = pd.DataFrame([dict(zip(food_items, df[col])) for col in new_col_names])

# Adding Decision Variables to MIP Optim Model
food_vars = opt_model.continuous_var_dict(food_items, name="Food")

# Establishing Constraints
output['Constraint_params'] = new_col_names
# Setting minimum and maximum constraints in dataframe
output['Minimum'] = [0,1500,30,20,800,130,125,60,1000,400,700,10]
output['Maximum'] = [0,2500,240,70,2000,450,250,100,10000,5000,1500,40]

# Adding constraints to MIP Optim Model (row 0 is skipped: it feeds the objective)
for i in range(1, len(output)):
    total = opt_model.sum([output.loc[i, f] * food_vars[f] for f in food_items])
    name = output['Constraint_params'].values[i]
    opt_model.add_constraint(total >= output['Minimum'].values[i], name + 'Minimum')  # minimum constraint
    opt_model.add_constraint(total <= output['Maximum'].values[i], name + 'Maximum')  # maximum constraint

# Add Objective Cost Function
objective = opt_model.sum([output.loc[0, i] * food_vars[i] for i in food_items])
The code below declares the objective function as a “Minimization” function and runs the CPLEX solver to “solve” the formulation.
# for minimization
opt_model.minimize(objective)
# Run CPLEX Solver for Minimization - MIP
opt_model.solve()
The output of the solver run can be viewed below:
Optimal Objective Function value:
Optimal Decision Variables Value:
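A minimal sketch of how such a report could be printed. The helper `format_solution` and the example numbers are hypothetical. With DOcplex, after a successful solve, the inputs would typically come from `opt_model.objective_value` and from `{f: v.solution_value for f, v in food_vars.items()}`; treat that mapping as an assumption to check against the DOcplex documentation.

```python
def format_solution(objective_value, servings, tol=1e-6):
    """Return printable lines: the objective first, then every non-zero variable."""
    lines = [f"Optimal cost: {objective_value:.2f}"]
    for food, amount in sorted(servings.items()):
        if abs(amount) > tol:  # hide foods the solver left at zero
            lines.append(f"{food}: {amount:.2f}")
    return lines


# Hypothetical values standing in for a solved model.
example = format_solution(4.0, {"oats": 3.0, "chicken": 1.0, "milk": 0.0})
print("\n".join(example))
```

Filtering out zero-valued variables keeps the report readable when the model has dozens of food items and only a handful end up in the diet.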
The next section of the blog addresses specific asks from customers on hosting optimization solutions on SAP DI:
1. Can SAP DI host 3rd party optimization solvers like IBM CPLEX and use their run-time seamlessly?
2. Can SAP DI solve an optimization model specified in file formats like .lp or .mps using the CPLEX run-time?
3. Can SAP DI support execution of optimization solutions developed in-house in multiple programming languages?
4. Can SAP DI support multi-user concurrency for optimization applications and achieve user parallelization? (not discussed in this article)
5. Can SAP DI scale to run multiple (easy/medium/hard) applications concurrently, with performance monitoring using the Kibana/Grafana DI services? (not discussed in this article)
Let’s proceed to see how SAP DI addresses the above scenarios:
1. Addressing the 1st ask:
SAP DI presents BYOL (Bring Your Own Language) Runtime:
SAP DI provides a robust and scalable BYOL platform that allows solvers like IBM CPLEX to be installed using its Docker containerization capability. (Similarly, BYOL can help enable other solvers like Gurobi or FICO Xpress / Mosel on the SAP DI platform.)
The steps to install the IBM CPLEX solver in SAP DI are as follows:
- For this exercise, the community edition of IBM CPLEX (for the Linux environment) was downloaded from the IBM CPLEX site. The steps for installing a licensed CPLEX version remain the same, with a few additions to the Docker file (highlighted in the relevant section).
The IBM CPLEX Community Edition Linux binary looks as below:
- We intend to invoke a ‘silent installation’ of the CPLEX software on SAP DI. Hence a file like ‘silent.properties’ is created and maintained with the required configuration, as shown below:
- The next step is to create a Docker file in SAP DI, which can then be tagged to the relevant operator while developing the execution pipeline. It’s a 2-step process:
The first step is to bring the required dependencies (the ‘COSCE129LIN64.bin’ and ‘silent.properties’ files in this case) into SAP DI by attaching them to the VREP repository under the ‘dockerfiles’ folder. It looks as below in the SAP DI Modeler:
The second step is to create the required Docker file with relevant tags in SAP DI. For installation of a licensed version, one option is to incorporate the software license in the Docker file (as applicable and highlighted).
Tags File: ‘byolcplex’ is the custom tag for the docker file.
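The two small dependency files mentioned in the steps above could be generated as sketched below. Everything here is an assumption to verify: the response-file keys follow the InstallAnywhere-style silent installation that IBM documents for CPLEX, the install directory is hypothetical, and the Tags.json layout (tag name mapped to a version string, empty when no version applies) is the form commonly seen in the SAP DI Modeler. The helper names are hypothetical as well.

```python
import json
from pathlib import Path

# Assumed InstallAnywhere-style response keys; verify against the IBM installer docs.
SILENT_PROPERTIES = {
    "INSTALLER_UI": "silent",
    "LICENSE_ACCEPTED": "true",
    "USER_INSTALL_DIR": "/home/vflow/opt/ilog/cplex",  # hypothetical target directory
}
# Assumed Tags.json layout: tag name -> version ("" when no version applies).
TAGS = {"byolcplex": ""}


def render_properties(props):
    """Render a dict as KEY=value lines, one per entry."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_dependency_files(target_dir):
    """Write silent.properties and Tags.json into target_dir and list its files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "silent.properties").write_text(render_properties(SILENT_PROPERTIES))
    (target / "Tags.json").write_text(json.dumps(TAGS, indent=2))
    return sorted(p.name for p in target.iterdir())
```

Calling `write_dependency_files("byolcplex")` would produce both files locally, ready to be uploaded next to the Docker file. The directory name is only an example.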
The above process uses the BYOL capability to host 3rd party software like IBM CPLEX and enable its run-time. We will see next how the CPLEX run-time can be used directly in SAP DI.
2. Addressing the 2nd ask:
Using BYOL capabilities, an optimization problem modeled in the ‘.LP’ file format can be solved directly in SAP DI by invoking the CPLEX run-time.
The steps are as below:
- Create a simple pipeline using the out-of-the-box Command Executor. The CPLEX run-time environment is triggered via the command line of the ‘Command Executor’ operator, as highlighted in the screenshot. The command invokes the CPLEX run-time and passes the required file argument, ‘cplexscript’.
- The Command Executor is grouped and tagged with the Docker file created in Step 1, using the custom tag ‘byolcplex’.
- ‘Cplexscript’ is the custom optimization script (created and attached in the System Management file system, VREP). It loads the LP file that contains the optimization model, as coded in the script.
Depending on the 3rd party solver under consideration, the optimization model can be coded in different file formats (like MPS, SAV, ANN, CSV, BAS, FLT, MST, NET, ORD, PRM, SQL, VMC, XML). These files can be passed on the SAP DI command line in the same way as shown above.
- Save and Run the pipeline.
- Once the pipeline is in the running state, we can view the optimization output in the output terminal.
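The steps above can be sketched outside of the pipeline as follows. The LP model is a toy instance with hypothetical foods and bounds, written in CPLEX LP format, and the file name `toy_diet.lp` is hypothetical. The script lines are commands of the CPLEX interactive optimizer, and `cplex -f <file>` is assumed to be the way to feed it a command file. The exact command line used in the Command Executor is only visible in the screenshot and may differ.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical model file name; 'cplexscript' mirrors the script name in the article.
LP_FILE = "toy_diet.lp"
SCRIPT_FILE = "cplexscript"

# Toy model in CPLEX LP format (hypothetical foods, costs and bounds).
LP_MODEL = """\
Minimize
 cost: 0.5 oats + 2.5 chicken + 1 milk
Subject To
 calories_min: 110 oats + 205 chicken + 160 milk >= 500
 calories_max: 110 oats + 205 chicken + 160 milk <= 800
 protein_min: 4 oats + 32 chicken + 8 milk >= 30
 protein_max: 4 oats + 32 chicken + 8 milk <= 80
Bounds
 0 <= oats <= 6
 0 <= chicken <= 6
 0 <= milk <= 6
General
 oats chicken milk
End
"""


def build_script(lp_file):
    """Commands for the CPLEX interactive optimizer, one per line."""
    return "\n".join([
        f"read {lp_file}",               # load the model
        "optimize",                      # solve it
        "display solution variables -",  # print all variable values
        "quit",
    ]) + "\n"


def run(workdir="."):
    """Write the model and script, then call CPLEX if it is on the PATH."""
    work = Path(workdir)
    (work / LP_FILE).write_text(LP_MODEL)
    (work / SCRIPT_FILE).write_text(build_script(LP_FILE))
    if shutil.which("cplex") is None:
        return None  # CPLEX is not installed here; only the files were written
    result = subprocess.run(["cplex", "-f", SCRIPT_FILE], cwd=work,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the model in a file and the solver commands in a separate script means the same pipeline can solve a different model by swapping the file name in the script, without touching the Docker image.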
3. Addressing the 3rd ask:
Can SAP DI execute statically compiled optimization binaries (originally coded in C++ or Python)?
Customers who mainly work on Windows machines and keep their optimization code base on the Windows platform need to recompile their raw optimization code in a Linux environment before it can be executed in SAP DI. It was also observed that the preferred programming languages for coding optimization problems are C++ and Python. Hence one prerequisite is to recompile the Windows-based C++ or Python code and generate static Linux binaries.
The steps are as follows:
- Recompile the raw C++ / Python code in the customer’s local Linux environment (static compilation). Package the result in ‘.tar.gz’ format, as shown below.
- Bring the required dependencies (compiled binaries and dataset) into SAP DI and create the relevant Docker file for execution:
Sample Docker file in SAP DI:
- A simple pipeline is created using the “Command Executor” operator. The Command Executor is grouped and tagged with the Docker file built above. The highlighted portion shows the command-line invocation of the statically compiled binary:
- Save and Run the pipeline.
- Once the pipeline is in the running state, we can view the optimization output in the output terminal.
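The packaging step in the list above can be sketched with Python's standard `tarfile` module. The helper name is hypothetical, and the placeholder file only stands in for a real statically compiled binary; `TEST_Main` is the binary name used in the consolidated Docker file.

```python
import tarfile
import tempfile
from pathlib import Path


def package_binary(binary_path, archive_path):
    """Pack one file into a gzip-compressed tar archive under its bare name."""
    binary = Path(binary_path)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname drops the directory part, so the file unpacks at the archive root
        archive.add(binary, arcname=binary.name)
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Demonstration with a placeholder file instead of a real compiled binary.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "TEST_Main"
    placeholder.write_bytes(b"placeholder for a statically compiled binary")
    names = package_binary(placeholder, Path(tmp) / "TEST_Main.tar.gz")
print(names)
```

Storing the binary at the archive root matters because a Docker `ADD` of a local tar archive unpacks it into the destination directory, so the binary ends up directly under that directory, where later `chmod` commands expect it.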
I have consolidated all the Docker commands in a single Docker file, which can be found below:
FROM opensuse/leap:15.1

# Install tar, gzip, python, python3, pip, pip3, gcc and libgthread
RUN zypper --non-interactive update && \
    zypper --non-interactive install --no-recommends --force-resolution \
    make \
    java-1_8_0-openjdk \
    vim \
    tar \
    gzip \
    ncompress \
    python3 \
    python3-pip \
    gcc=7 \
    gcc-c++=7 \
    libgthread-2_0-0=2.54.3

RUN python3 -m pip install numpy
RUN python3 -m pip install pandas
RUN python3 -m pip install tornado==5.0.2
# 'sklearn' is a deprecated alias on PyPI; the package is published as scikit-learn
RUN python3.6 -m pip install scikit-learn
RUN python3.6 -m pip install docplex
RUN python3.6 -m pip install cplex
RUN python3.6 -m pip install xlrd
RUN python3.6 -m pip install ortools

WORKDIR /home/vflow
ENV HOME=/home/vflow
ENV WORK=$HOME/work

# CPLEX installation
RUN mkdir -p $HOME/opt/ilog
COPY silent.properties $HOME/opt/ilog/silent.properties
COPY COSCE129LIN64.bin $HOME/opt/ilog/COSCE129LIN64.bin

# Give required access rights and start the installation
RUN chmod +x $HOME/opt/ilog/COSCE129LIN64.bin
RUN chmod 777 $HOME/opt/ilog/COSCE129LIN64.bin
RUN chmod 777 $HOME/opt/ilog/silent.properties
RUN $HOME/opt/ilog/COSCE129LIN64.bin -f $HOME/opt/ilog/silent.properties

## Getting the Python static binary within the DI Docker image
RUN mkdir -p $HOME/opt/pythonstaticbinary
# Set initial environment variable to the directory we created
ENV BINARYDIR $HOME/opt/pythonstaticbinary
RUN chmod 777 $BINARYDIR/

#### TEST_Main.tar.gz could be any statically compiled C++ or Python binary
ADD TEST_Main.tar.gz $BINARYDIR/
COPY diet_optimization.csv $BINARYDIR/
RUN chmod +x $BINARYDIR/TEST_Main
RUN chmod 777 $BINARYDIR/TEST_Main
RUN chmod +x $BINARYDIR/diet_optimization.csv
RUN chmod 777 $BINARYDIR/diet_optimization.csv

## Add the license file to the bin directory
# ADD xpauth.xpr $HOME/opt/ilog/bin

RUN groupadd -g 1972 vflow && useradd -g 1972 -u 1972 -m vflow
USER 1972:1972
This article has shown how SAP DI can act as a single platform to solve different enterprise optimization problems and can host different 3rd party solvers within the platform. In a subsequent blog, I will cover scalability, multi-user concurrency and resource consumption analysis for the same optimization problem, using the Kibana and Grafana services that SAP DI offers.
Excellent ! Very detailed and nicely explained blog
Keep it up ..
Great!! Looking forward to the second part.
Hi, very nice explanation of how to use CPLEX. Can I maybe just ask whether you have a link to the dataset "Diet_Optimization.csv"? My apologies if I missed it on the page. 🙂
I need it too. Where can we find the dataset “Diet_Optimization.csv“?
Amazingly explained, thank you for sharing.