
Query Unstructured SAP Data Using Generative AI…

Generative AI + SAP

Hey Guys,


Hope everyone is doing great!!

This is my next step in exploring Generative AI and what all we can do with it. This is the second topic I have picked for learning: training OpenAI models on custom data. Here we go; I hope you will like the concept and its implementation.

Here I am presenting my thoughts on Generative AI. You know this technology is touching almost everything, adding value and pace to business and development, and pushing organizations to come up with solutions that can change the dimensions of their business.

I am happy and excited to share this new learning experience, where SAP and GenAI can do wonders together. Why?

Answer: SAP is the leader in the ERP market, and, most importantly, it is the one place where customers keep their business data (centralized business data access). If we infuse Generative AI with that business data (keeping data privacy in mind), the results can be great. I see a pile of opportunities in this domain and am trying every day to come up with solutions that deliver real business value to our customers.


Problem Statement:

  • You have lots of unstructured information and want to extract meaningful information from it.


  • Feed the unstructured information into an LLM, fine-tune it, and use it. (OpenAI provides a very easy and straightforward approach to this complex problem.) There is no need to build an LLM from scratch unless you are sure you need to.

Here is a small pictorial overview of what we are going to do. This is really exciting; let's see what GenAI can do for you.

Before I go into the details of what I have implemented, let's ask ChatGPT a few questions.

Q1 – What is CAP?

Q2 – What is RAP?

Q3 – What is Build Apps?


Looking at ChatGPT's responses, you can see the context is unknown to it, so it does not answer the way we want.

Since OpenAI's model does not have any relevant information about these queries, custom training comes into play. Custom training gives you the freedom to feed in the data you want answered later, so make sure you prepare that data intelligently.

Based on the data fed to your model, it will then really answer those questions in the proper format.

So let's talk to the BOT we have developed, and let's start with a handshake.


Example: if you fed in data related to SAP Build Apps, you could ask, "What is Build Apps and how do I use it?"



Question: Entity in CAP, with an example.

Question: What is RAP and how is it helping ABAP developers? (Two topics combined in one question.)

Question: I requested the ABAP 7.5 topic list. The BOT's response is on the left side, and the content I fed during training is on the right side.

A few more questions we tried:


Question: Tell me about CAP development and how to create a CDS entity in CAP.


Question: What is CDS Query Language (CQL)? Can you share some sample code to implement it, and also list the path expressions?


Question: ABAP SDK for Google Cloud. Here I see that I really need to ask for "ABAP SDK for Google Cloud", not just "ABAP SDK", so this is an area where more training and tuning is required.


How to start with ABAP CDS?


I have attached these snapshots so you can get an idea of the topics you can choose to train the LLM model on.


Topics used for Training

  • ABAP 7.4 quick reference guide
  • ABAP for SAP HANA News for SAP NW 7.5
  • ABAP Language News for Release 7.50 ( Fed the data from SAP Blogs around 20 blogs )
  • ABAP_DATA_MODELS ( Guide from SAP Help )
  • ABAP 7.5 Guide (Source SAP Help)
  • About CAP Application Programming Model (CDS / NodeJS / CSN / CQN / CDL / Common Annotations / CDS Design time / Connecting to external APIs etc.) – Source CAP Cloud website
  • ABAP SDK for Google Cloud (source: Google Cloud documentation)
  • SAP BTP / SAP BTP Best Practices
  • Domain Modelling in CAP / Core data service
  • SAP Data Warehouse Cloud / SAP Datasphere / SAP AI Core / SAP AI Ethics
  • SAP Fiori / Launchpad
  • SAP RAP (A detailed information Source )
  • SAP gateway foundation
  • SAP HANA Cloud / SAP HANA Text mining
  • SAP CAP Testing / TypeScript / SAP CAP CDS


I collected this information from various sources just to see how this would work, found it really useful, and thought I would share it with everyone. My machine took around 30–45 minutes to complete the training of the model, so have patience and enjoy learning. This is, in a way, my second step towards exploring GenAI. I fed in around 100+ PDF files and one Excel sheet (which contains only prompts).


Disclaimer: I used data from various blogs and the SAP Help Portal purely for learning and upskilling; I am not using this information in any productive scenarios.



So you see how this custom training helps in getting the relevant information back from the data source, but what is the key here?

Which model are you using?

What parameters did you use for training, such as the temperature and the chunk size? And how good is your data? If your dataset is good enough for conversational AI, I am sure you can get a lot back, in a very structured format; that is the power of Generative AI. With just one PDF file there is nothing to it: you can simply open the PDF and search for the relevant info. But what if you have tens of thousands of files, and in different formats as well?

Then it comes down to custom training on your own dataset; data privacy is also one of the main reasons behind this.
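To make the chunk-size parameter concrete: before indexing, each document is split into fixed-size, slightly overlapping chunks so that passages fit into the model's context window and sentences cut at a boundary still appear whole in at least one chunk. A minimal pure-Python sketch (the function name and sizes here are illustrative, not from the original post):

```python
def chunk_text(text: str, chunk_size: int = 600, overlap: int = 60) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters,
    so content near a chunk boundary is not lost to either neighbour."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

A larger chunk size gives the model more context per retrieved passage, while more overlap reduces the risk of splitting a relevant sentence in two; both increase indexing cost.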

GenAI + Customer’s business data = Realtime business insights for better decision making


GenAI can be trained on unstructured data, and it will combine this with the base model and answer your queries in a very generic way. Training on custom data just adds one more layer on top of the standard model, effectively telling it: use my custom data to answer my questions.


Putting in many more screenshots will not serve the whole purpose, so I would ask you to watch these conversations in the video. I am sure you will love it, because we are dealing with our own dataset and getting an idea of how to mine the relevant information out of it.


Implementation Details

  • Create your account at Overview – OpenAI API (a paid service).
  • Get your API key to consume the APIs/models.
  • You can also explore the tutorials and the playground to get an idea of how it works.

Let’s Start implementing it.

  • Get your tool of choice to start with. I am using VS Code; you can use Jupyter Notebook/Lab, Colab, or anything you prefer.
  • Go to VS Code (make sure you have the latest Python installed on your machine).
  • Create a folder (GPTApp, or any name).
  • Open your terminal and install these Python libraries one by one:

pip install llama-index
pip install gradio
pip install openai
pip install langchain


  • We are good now. Add your Python executable to the VS Code Python interpreter path (Ctrl+Shift+P): browse to the path and select the executable that will run your Python code.
  • Create a test Python file and add print("Hi from Python, just to test").
  • Create a folder named "trainingData" and add all your PDF/Excel files there, and only there.


Add this code to your Python file.
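The code from the original screenshot is not reproduced here, but the first step is reading everything in the trainingData folder. A minimal sketch against the legacy llama_index 0.x API that was current when this post was written (import paths changed in later releases, so adjust accordingly):

```python
def load_training_documents(folder: str = "trainingData"):
    """Read every PDF/Excel file in the training folder into llama_index
    Document objects, ready for indexing."""
    # Deferred import: llama_index is only needed once you actually build the index.
    from llama_index import SimpleDirectoryReader
    return SimpleDirectoryReader(folder).load_data()

if __name__ == "__main__":
    docs = load_training_documents()
    print(f"Loaded {len(docs)} documents")
```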


Define the input parameters for your prompt.

These may vary depending on your dataset size and the output you expect from the prompt.
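As a hedged example, the input parameters might look like the following. The exact values depend on your model's context window and your dataset; these numbers are illustrative, not taken from the original screenshot:

```python
# Prompt/window parameters; tune these for your own model and dataset.
MAX_INPUT_SIZE = 4096      # model context window, in tokens
NUM_OUTPUTS = 512          # tokens reserved for the generated answer
CHUNK_OVERLAP_RATIO = 0.1  # fraction of overlap between neighbouring chunks
CHUNK_SIZE_LIMIT = 600     # maximum tokens per chunk of training text
```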

Add this code for the prompt helper and the LLM predictor.
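A sketch of this step, again against the legacy llama_index 0.x and langchain APIs the post is based on (class names such as LLMPredictor and ServiceContext were removed in later llama_index releases, so treat this as illustrative):

```python
def build_index(folder: str = "trainingData"):
    """Build a vector index over the training documents using a prompt helper
    and an LLM predictor. Legacy llama_index 0.x / langchain API sketch."""
    from llama_index import (GPTVectorStoreIndex, LLMPredictor, PromptHelper,
                             ServiceContext, SimpleDirectoryReader)
    from langchain.chat_models import ChatOpenAI

    prompt_helper = PromptHelper(
        context_window=4096,      # model context size in tokens
        num_output=512,           # tokens reserved for the answer
        chunk_overlap_ratio=0.1,  # overlap between neighbouring chunks
        chunk_size_limit=600,     # max tokens per chunk
    )
    llm_predictor = LLMPredictor(
        llm=ChatOpenAI(temperature=0.5, model_name="gpt-3.5-turbo", max_tokens=512)
    )
    service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )
    documents = SimpleDirectoryReader(folder).load_data()
    return GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
```

Building the index calls the OpenAI embedding API for every chunk, which is where the long run time on a large document set comes from.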


We passed a temperature value of 0.5, so the model will favour the 2–3 most likely tokens related to the query. A low temperature really makes sense in this case, since it increases the likelihood of getting responses that stay close to the data; with a high value the model may deviate from what you expect, so be cautious when setting these parameters.


Define the Gradio interface for the BOT.
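A sketch of such an interface, assuming a built llama_index index and Gradio's Interface API (the labels and layout here are illustrative, not the original code):

```python
def launch_bot(index):
    """Expose the index as a simple question-and-answer web UI."""
    import gradio as gr

    query_engine = index.as_query_engine()  # llama_index 0.6+ query API

    def chat(message: str) -> str:
        # Retrieve relevant chunks from the index and let the LLM answer.
        return str(query_engine.query(message))

    iface = gr.Interface(
        fn=chat,
        inputs=gr.Textbox(lines=5, label="Ask about your SAP documents"),
        outputs="text",
        title="Custom-trained SAP knowledge bot",
    )
    iface.launch(share=True)  # share=True also serves a temporary public URL
```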

Note: Find the code here at GitHub

The GPT model used for training also has built-in function calling, which can be used for things like getting the current weather or sending emails to someone; for more information, see the OpenAI API documentation.


Go to the terminal and run your script with python. It will take around 30–40 minutes depending on the training data; with just one file it completes in 3–5 minutes, and then you can chat with the BOT.


After 30 minutes of training, I had this conversation with the BOT. Watch this awesome conversation and let's uncover what more we can do with it.


Stay tuned for more content from me; I will keep exploring these topics and will try my best to contribute to the community.

I would appreciate your comments and feedback.

GitHub Link –


Thank You!!
