Using Generative AI to Bring Enterprise Data Context Without Moving the Data: A Technical Exploration
Generative AI (Gen AI) is a rapidly evolving field with the potential to revolutionize the way we interact with enterprise data. Unlike traditional AI models, which are trained on specific tasks, Gen AI models can be trained to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. This makes them ideal for bringing enterprise data context without moving the data.
One of the key challenges with enterprise systems like SAP is the need for adequate domain knowledge and context around the data. For example, a natural-language-to-SQL query produced by a large language model (LLM) trained on a general corpus of text may not use the precise functions of a particular HANA database, and the model may not know which fields to use on a particular table or how two tables relate to each other.
To address this challenge, we can train a smaller Gen AI model on a corpus of domain-specific text and code. This model can then be used to generate SQL queries from natural language queries that are tailored to the specific needs of the enterprise system. The generated SQL queries can then be executed directly on the database to return the desired results.
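One lightweight way to give the model this context at inference time is to prepend schema hints to the user's question before it reaches the model. The sketch below is illustrative only: the prompt format, the helper name, and the column list are assumptions, not the exact setup used later in this post.

```python
def build_sql_prompt(question: str, schema_hints: dict) -> str:
    """Build a text-to-SQL prompt that carries table/column context.

    schema_hints maps table names to the columns the model may use.
    """
    lines = ["-- HANA SQL. Use only the tables and columns listed below."]
    for table, columns in schema_hints.items():
        lines.append("-- " + table + "(" + ", ".join(columns) + ")")
    lines.append("-- Question: " + question)
    lines.append("SELECT")  # nudge the model to complete a SELECT statement
    return "\n".join(lines)

# Hypothetical columns for illustration; a real schema hint would be
# generated from the database catalog.
prompt = build_sql_prompt(
    "Total net value of sales orders created last quarter",
    {"VBAK": ["VBELN", "ERDAT", "NETWR"]},
)
```

The completed prompt is then handed to the fine-tuned model; because the allowed tables and columns travel with the question, the model is less likely to hallucinate field names.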
This approach has several advantages. First, it eliminates the need to move the data to a central location, which can be time-consuming and expensive. Second, it allows users to query the data in their own words, without having to learn SQL. Third, it can help to improve the accuracy of queries by incorporating domain knowledge into the model.
One example of how this approach can be used is in sales reporting. A salesperson could use the model to generate a report on all sales made in the past quarter, or a list of all customers who have purchased a particular product. More complex reports are also possible, such as all sales made to customers in a particular region, or all sales made by a particular salesperson.
The goal is to ask the LLM a question in natural language and get back a query that is syntactically correct for HANA DB. The openly available models do not know how to use the functions that are specific to each database; for example, date-related functions differ between HANA and Microsoft SQL Server.
1) Part 1 – Download the LLM model and train it with HANA SQL query data
2) Part 2 – Download the fine-tuned model and connect to SAP HANA DB to retrieve the result
Part 1 – Download the LLM model and train it with HANA SQL query data
In this first step we used the bigcode/starcoder model and fine-tuned it with HANA SQL queries specific to a sales use case, involving queries against the VBAK, VBAP and MARA tables. Thanks to Aron Macdonald: I used his blog into-the-sql-weeds-fine-tuning-an-llm-with-enterprise-data as a reference to build this use case.
The fine-tuning dataset and the code are available in the GitHub repository LLM-finetuned-HANA.
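The exact dataset layout lives in the linked repository; as an illustration only, text-to-SQL fine-tuning data is commonly stored as JSONL pairs of question and target query. The field names and queries below are assumptions in the spirit of the VBAK/VBAP/MARA sales use case, not copied from the dataset.

```python
import json

# Hypothetical question/query pairs; field names and SQL are illustrative.
examples = [
    {"question": "List all sales orders with their creation dates",
     "query": "SELECT VBELN, ERDAT FROM VBAK"},
    {"question": "Show the material type for the items on sales order 1000",
     "query": ("SELECT P.MATNR, M.MTART FROM VBAP AS P "
               "JOIN MARA AS M ON P.MATNR = M.MATNR "
               "WHERE P.VBELN = '1000'")},
]

# Write one JSON object per line (JSONL), the usual format for
# instruction-style fine-tuning datasets.
with open("hana_sql_pairs.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Round-trip to confirm the file parses line by line.
with open("hana_sql_pairs.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```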
I used a V100 GPU in Colab; training took around one hour to reach a loss of 0.2, and the model performs well once the loss is between 0.2 and 0.1. It took 3-4 training runs to get there.
I have pushed this fine-tuned model to Hugging Face so that it can be reused later.
Part 2 – Download the fine-tuned model and connect to SAP HANA DB to retrieve the result
In this part I downloaded the fine-tuned model from my Hugging Face repository, used it to generate a query from a question, and executed that query against the HANA database to retrieve live results.
Once the model is ready, you can ask it questions about sales-related data; a DB connection to HANA then executes the generated query and retrieves the results.
The entire code to download and test the model is available in this Git link: test_finetune_hana_SQL.
Gen AI is a powerful tool that can be used to bring enterprise data context without moving the data. By training Gen AI models on domain-specific corpora, we can enable users to query the data in their own words and generate accurate and informative reports. This can help to improve the efficiency and productivity of businesses of all sizes.
Enterprises have a vast amount of data that can be used to generate domain-specific knowledge. We can leverage vector databases and embeddings to bring this domain-specific knowledge to large language models (LLMs). For example, we could store all of the metadata in a vector database and use it in conjunction with fine-tuned LLMs. As LLM models continue to evolve, we can consider bringing in even more metadata.
In other words, vector databases and embeddings can be used to create a bridge between the domain-specific knowledge stored in enterprise data and the general-purpose knowledge stored in LLMs. This can allow LLMs to generate more accurate and informative responses to queries that involve domain-specific knowledge.
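A minimal sketch of that bridge, using toy bag-of-words vectors in place of a real embedding model and vector database (everything here is illustrative; a production setup would embed the metadata with a proper model and store the vectors in a vector database):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts stand in for a learned vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Metadata snippets an enterprise might index (descriptions are illustrative).
metadata = {
    "VBAK": "sales document header order date net value sold-to party",
    "VBAP": "sales document item material quantity item net value",
    "MARA": "general material master material type base unit of measure",
}

def retrieve_context(question: str, k: int = 1) -> list:
    """Return the k metadata entries most similar to the question;
    these would be injected into the LLM prompt as domain context."""
    q = embed(question)
    ranked = sorted(metadata,
                    key=lambda t: cosine(q, embed(metadata[t])),
                    reverse=True)
    return ranked[:k]
```

The retrieved snippets are then prepended to the prompt, so the fine-tuned model sees both the user's question and the relevant slice of enterprise metadata.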
Note : SAP disclaims any legal obligation or commitment to pursue any course of business, or develop or release any functionality, mentioned in posts about potential uses of generative AI and large language models on this website. These posts are merely the individual poster’s ideas and opinions, and do not represent SAP’s official position or future development roadmap.