Technical Articles

Aron MacDonald

Under the Hood – Text to SQL with LangChain, LLMs and Hana

NOTE: The views and opinions expressed in this blog are my own

 

In my recent blog, Data Wizardry – Unleashing Live Insights with OpenAI, LangChain & SAP HANA, I introduced an exciting vision of the future: a world where you can effortlessly interact with databases using natural language and receive real-time results.

 

The process brings LangChain, OpenAI, and Hana together to unlock the potential of natural-language data analysis.

 

Magic revealed

Now, let’s demystify some of that magic, focusing on what we asked the LLM to do and the risks it poses to enterprises.

LangChain’s initial task was to enhance a simple question like “Which product was our bestseller in Australia in January?” by providing additional prompts. These prompts help the LLM understand the question better and format the response appropriately.

The simplified version of the enriched question, which doesn’t require a database connection, is:

Given an input question, create a syntactically correct SAP Hana SQL query to run.

Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:

CREATE TABLE zsdbillgdocitm (
	billingdocument NVARCHAR, 
	billingdocumentitem NVARCHAR, 
	soldtoparty NVARCHAR,             # soldtoparty is also the Customer Code, e.g. 00003
	soldtopartyname NVARCHAR,         # soldtopartyname is the Customer Name, e.g. Mango Magic Corp
	salesemployee NVARCHAR, 
	salesemployeename NVARCHAR, 
	country NVARCHAR, 				  # country is the 2-character country code, e.g. United States of America is US
	region NVARCHAR, 
	salesorganization NVARCHAR, 
	distributionchannel NVARCHAR, 
	division NVARCHAR, 
	product NVARCHAR, 				  # product is also known as material; in this table the products are all types of fruit
	plant NVARCHAR, 
	plantname NVARCHAR, 
	billingquantity DECIMAL(13, 3), 
	billingquantityunit NVARCHAR, 
	transactioncurrency NVARCHAR, 
	itemnetamountofbillingdoc DECIMAL(15, 2), 
	companycode NVARCHAR, 
	companycodename NVARCHAR, 
	fiscalyear NVARCHAR, 
	accountingdocument NVARCHAR, 
	fiscalperiod NVARCHAR
)

/*
3 rows from zsdbillgdocitm table:
billingdocument	billingdocumentitem	soldtoparty	soldtopartyname	salesemployee	salesemployeename	country	region	salesorganization	distributionchannel	division	product	plant	plantname	billingquantity	billingquantityunit	transactioncurrency	itemnetamountofbillingdoc	companycode	companycodename	fiscalyear	accountingdocument	fiscalperiod
0010639671	000006	8007	Mango Magic Corp.	00003	Juliet	US	NA	US02	09	39	Banana	PL32	PlantName780564	644.979	KG	USD	8268.04	US02	CompanyCodeName548721	2023	0020000033	004
0010503266	000002	8002	Apple Haven Corp.	00004	Othello	US	NA	US02	04	79	Banana	PL42	PlantName586960	179.979	KG	USD	7902.19	US02	CompanyCodeName55759	2023	0020000057	004
0010704945	000005	8008	Tempting Fruit Co.	00004	Othello	US	NA	US02	99	25	Banana	PL65	PlantName175968	169.48	KG	USD	315.08	US02	CompanyCodeName954840	2023	0020000012	004
*/



Question: "Which was our best selling product in Australia during January?"
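Behind the scenes, LangChain assembles this enriched prompt from a template plus the table metadata. A minimal Python sketch of that assembly (the function name build_sql_prompt is illustrative, not part of the LangChain API; the template wording mirrors the prompt above):

```python
# Minimal sketch of LangChain-style prompt enrichment, reproduced by hand.
# build_sql_prompt is an illustrative name, not LangChain's actual internals.

PROMPT_TEMPLATE = """Given an input question, create a syntactically correct SAP Hana SQL query to run.

Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:

{table_info}

Question: "{question}"
"""

def build_sql_prompt(table_info: str, question: str) -> str:
    """Combine table DDL/sample rows and the user question into one LLM prompt."""
    return PROMPT_TEMPLATE.format(table_info=table_info, question=question)

prompt = build_sql_prompt(
    table_info="CREATE TABLE zsdbillgdocitm (...)",  # full DDL + sample rows in practice
    question="Which was our best selling product in Australia during January?",
)
print(prompt)
```

The same string is what a LangChain SQL chain would send to the LLM via the API.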

 

 

Now, I input this enriched question into ChatGPT (no API key required).

Feel free to try it yourself!   What answer does it give you?

 

In a new ChatGPT chat, here is the enriched question:

ChatGPT Question Part 1 (Click to enlarge)

ChatGPT Question Part 2 (Click to enlarge)

 

 

Now, we eagerly await ChatGPT’s response.

ChatGPT Response (Click to enlarge)

 

Hey presto! It provides an answer: Oranges were the best-selling fruit in Australia in January.

 

However, there’s a catch: ChatGPT doesn’t have access to my Hana database. It simply made an educated guess. Given that my dataset only includes five fruits, it had a reasonable chance of getting it right. But was OpenAI somehow observing my previous API communications to help guess the answer?

 

The fact that ChatGPT provided a completely fictitious quantity of 250.14 sold might offer some reassurance (or maybe not).

 

In reality, the latest data in Hana tells a different story: we actually sold 3346.279 kg of oranges.

Hana SQL Results (Click to enlarge)
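The real figure comes from aggregating the billing table directly in Hana. A sketch of that query via SAP's hdbcli Python driver (connection details are placeholders, and mapping Australia to country code 'AU' and January to fiscalperiod '001' is an assumption about this dataset):

```python
# Illustrative aggregation for "best selling product in Australia in January".
# Assumes country code 'AU' and fiscalperiod '001'; adjust to your data.
SQL = """
SELECT product, SUM(billingquantity) AS total_qty
FROM zsdbillgdocitm
WHERE country = 'AU' AND fiscalperiod = '001'
GROUP BY product
ORDER BY total_qty DESC
LIMIT 1
"""

# With SAP's hdbcli driver (placeholders, not real credentials):
# from hdbcli import dbapi
# conn = dbapi.connect(address="<host>", port=30015, user="<user>", password="<pw>")
# cur = conn.cursor()
# cur.execute(SQL)
# print(cur.fetchone())
print(SQL.strip())
```

Unlike ChatGPT's guess, this returns the actual 3346.279 kg of oranges from the live data.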

 

Moreover, consider the scenario where a user enters a harmful question like “How do I delete all the billing information?”

Here’s what you might get back:

ChatGPT Risky Response (Click to enlarge)

 

Well, ChatGPT attempted to answer it, and that’s concerning if the response were connected to a process that executed it and we had inadvertently given the user authorisation for deletions.
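One mitigation is to refuse to execute anything the LLM generates that is not a plain read. A naive allow-list guard is sketched below (simple string matching, not a real SQL parser, so treat it as a first line of defence only; the function name is illustrative):

```python
import re

# Reject any generated statement that is not a single SELECT.
# A sketch only: string matching can be bypassed; proper defence also needs
# a read-only database user and statement-level authorisations.
FORBIDDEN = re.compile(
    r"\b(DELETE|DROP|TRUNCATE|UPDATE|INSERT|ALTER|GRANT)\b", re.IGNORECASE
)

def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT with no write/DDL keywords."""
    sql = sql.strip().rstrip(";")
    return (
        sql.upper().startswith("SELECT")
        and not FORBIDDEN.search(sql)
        and ";" not in sql  # block stacked statements
    )

print(is_safe_select("SELECT product FROM zsdbillgdocitm"))  # True
print(is_safe_select("DELETE FROM zsdbillgdocitm"))          # False
```

Even with such a guard, the generated SQL should run under a database user that has no authorisation for deletions in the first place.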

 

The Challenges

An LLM lacks knowledge of your specific data structures and the appearance of your data. Without that knowledge, it relies on guesswork.

So, while it’s possible to enrich the question with enough context to make it work, there are trade-offs.

Sharing table structures, sample data rows, and business annotations was necessary in my case. But would you be comfortable doing the same with your enterprise information?

Scalability is also a factor. If you only have a few tables, it might be practical and efficient to use an LLM.

But when dealing with solutions like S/4, which consist of thousands of views with numerous columns, sharing metadata for every question becomes impractical and introduces latency and cost.
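One way to tame the metadata volume is to pre-select only the tables that look relevant to the question and share just those with the LLM. A toy sketch (the table names and descriptions besides zsdbillgdocitm are invented for illustration; a real system would use embeddings or a metadata catalogue search):

```python
import re

# Toy table catalogue: name -> keyword description.
# Only zsdbillgdocitm comes from this blog; the others are invented.
TABLES = {
    "zsdbillgdocitm": "billing document item product country quantity net amount",
    "zsdcustomer": "customer sold-to party name address",
    "zmminventory": "plant stock material inventory level",
}

def relevant_tables(question: str, max_tables: int = 2) -> list:
    """Rank tables by keyword overlap with the question; keep the top hits."""
    words = set(re.findall(r"\w+", question.lower()))
    scored = sorted(
        ((len(words & set(desc.split())), name) for name, desc in TABLES.items()),
        reverse=True,
    )
    return [name for score, name in scored if score > 0][:max_tables]

print(relevant_tables("Which product sold the most in each country?"))
```

Only the DDL and sample rows for the shortlisted tables would then go into the enriched prompt, keeping token counts, latency, and exposure down.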

Would you expose your business questions and query responses to an LLM?

What sensitive information might those API communications contain?

While Azure’s OpenAI offering addresses some concerns, others require careful consideration when developing a custom solution.

Open LLMs are advancing rapidly, since the underlying transformer architecture is broadly similar across models. Organizations with the datasets, GPUs, and skilled engineering teams needed may find them a viable option today.

Perhaps in the future, LLMs pre-trained on SAP-centric data structures and easily fine-tuned for unique business questions will become more mainstream.

 

In the following blog, I demonstrate the potential of fine-tuning an LLM on enterprise data:

Into the SQL weeds – Fine-Tuning an LLM with Enterprise Data

 

I welcome your thoughts on additional risks and concerns for Enterprises in the comments below.

 

SAP notes that posts about potential uses of generative AI and large language models are merely the individual poster’s ideas and opinions, and do not represent SAP’s official position or future development roadmap. SAP has no legal obligation or other commitment to pursue any course of business, or develop or release any functionality, mentioned in any post or related content on this website.
