Skip to Content
Technical Articles
Author's profile photo Carlos Basto

Harnessing the Power of ChatGPT and SAP HANA Cloud for Next-level Data Analytics

ChatGPT is a powerful natural language processing model that is fine-tuned from the GPT-3.5 series. It was trained on an Azure AI supercomputing infrastructure to provide users with high-quality text generation and understanding capabilities.

SAP HANA Cloud is a single database as a service (DBaaS) foundation for modern applications and analytics across all enterprise data. When used in combination with ChatGPT, it’s possible to get insightful queries based on ChatGPT’s ability to “remember” previous conversations. However, it’s important to note that ChatGPT’s memory is limited and it can only reference up to about 3000 words or 4000 tokens from the current conversation.

Our objective is to use ChatGPT to directly run queries on SAP HANA Cloud in an automatic way. At the moment, there is no official API for ChatGPT, so some non-official packages are used (mainly pyChatGPT – an unofficial Python wrapper for OpenAI’s ChatGPT API). The idea is to demonstrate the potential of future integrations between ChatGPT and the SAP Business Technology Platform (BTP), specifically the SAP HANA Cloud.

In order for ChatGPT to suggest things to us, we need to provide it with some background information. In this case, we’ll present it with two table definitions that we have in SAP HANA Cloud. This way, it will have a better understanding of the data and be able to generate more accurate and relevant queries.

 

Throughout this post we will explain how to relate SAP HANA Cloud and ChatGPT, but for now we will let ChatGPT explains what it “understood” from the background offered.

 

Here, we are having a conversation with ChatGPT and it may not seem related to SAP HANA Cloud or python integration. However, to achieve our goal of generating queries for SAP HANA Cloud, it’s more efficient to receive ChatGPT’s response in python. This way, we can easily integrate it with the rest of our code and work with the data in a structured way.

To do this, we’ll use the Chatbot class, which allows us to interact with ChatGPT in a more convenient way. By doing this, we can create multiple instances of the Chatbot class, each with its own session token and API object. This way, we can easily get responses from ChatGPT in python. And If you wonder how this is done, I’ll be happy to provide some code and walk you through it, step by step.

The Chatbot class is used to interact with a chatbot that is powered by the ChatGPT model. It allows to create multiple instances of the chatbot, each with its own session token and API object.

The class contains two main methods:

  • __init__(self): The constructor method which Initialize the bot behavior and session token by reading files and create an API object that will be used to interact with the Chatbot
  • create_interaction(self, prompt, script=True): which takes prompt text and optional script flag, and use the self.api to send the message and receives the message, it will also clean the response text before returning it.

It also contain other helper method to help with token usage:

  • get_session_token(self, file_name): to read session token from a file
class Chatbot:
    """
    Chatbot class contains the get_session_token and create_interaction functions as methods.
    The session_token variable is now an instance variable, and the api variable is created
    in the constructor. The create_interaction method uses self.api instead of the global
    variable api, and the get_session_token method now uses self instead of the global variable
    session_token.

    This way, you can create multiple instances of the Chatbot class, each with its own session
    token and API object.
    """

    def __init__(self):
        """
        Define chat GPT behavior as SQL Console for returning
        SQL statements only. Otherwise, some further work must
        be done before send to SAP HANA.
        """
        self.bot_behavior = self.get_session_token("behavior.txt")

        """
        Get session token obtained from cookies, check file.
        Set api from pyChatGPT package based on that token.
        """
        self.session_token = self.get_session_token("session_token.txt")
        self.api = cGPT(self.session_token)

    def get_session_token(self, file_name):

        file_path = Path(file_name)

        if not file_path.is_file():
            print("Error: {} not found".format(file_path))
            return None
        try:
            with open(file_path, "r") as file:
                session_token = file.read().strip()
        except:
            print("Error: Unable to read {}".format(file_path))
            return None

        return session_token

    def create_interaction(self, prompt, script=True):
        def set_query(string):
            string = string.replace("`", "")
            string = string.replace("\\", "")
            string = string.replace("\n", " ")
            return string

        if not script:
            response = self.api.send_message(prompt)
            message = response["message"]

        else:
            response = self.api.send_message(self.bot_behavior + prompt)
            message = set_query(response["message"])
        
        self.response = response
        return message

 

The SAPHANA class provides an easy way to connect to a SAP HANA database. It uses a connection file that contains the properties needed to establish a connection, such as the address, port, user, and password. It has the following methods:

  • __init__(self, connection_file): which initializes the class and takes a connection_file parameter that is used by the get_connection function.
  • connect(self): calls the get_connection function and returns the connection object.
  • get_connection_id(self): returns the connection ID obtained from the conn.get_connection_id() method.
  • get_current_schema(self): returns the current schema obtained from the conn.get_current_schema() method
  • close_connection(self): closes the connection to the database

It also has a internal helper function get_connection that handle the logic of opening the file, read properties from it and establish the connection and print some information about connection.

class SAPHANA:
    def __init__(self, connection_file):
        self.connection_file = connection_file
        self.conn = None

    def get_connection(self):
        """
        This function uses the settings provided in the access
        file for connecting to SAP HANA database.
        """
        start = time.time()
        with open(self.connection_file) as file:
            conn_properties = load(file)
            self.conn = dataframe.ConnectionContext(
                conn_properties["address"],
                conn_properties["port"],
                conn_properties["user"],
                conn_properties["password"],
            )

            msg = (
                "INFO: Connection successfully established to SAP HANA."
                if self.conn.connection.isconnected() == True
                else "ERROR: Not possible to connect. Check access file information."
            )
            print("{}".format(msg))
            end = time.time()
            print("      Elapsed time: {:.2f} seconds".format(end - start))
            print("\033[92m      MESSAGE: Logged with connection ID {} in schema {}.\033[00m".format(
                self.conn.get_connection_id(),
                self.conn.get_current_schema())
            )
    
    def connect(self):
        self.get_connection()
        return self.conn

    def get_connection_id(self):
        return self.conn.get_connection_id()

    def get_current_schema(self):
        return self.conn.get_current_schema()
    
    def close_connection(self):
        self.conn.close()

Example of use:

saphana = SAPHANA('path/to/connection_file')
conn = saphana.connect()
conn_id = saphana.get_connection_id()
schema = saphana.get_current_schema()
#... do something with the connection
saphana.close_connection()

Now you know how to integrate ChatGPT with Python and use it to generate queries for SAP HANA Cloud, including for on-premise connections.

As an example, we asked ChatGPT to “based on previous tables in SAP HANA, write na Analytical Query on table TS10“. This means that we were asking it to create a query using only the TS10 table, nothing more. And the response looks great and ready to be sent to SAP HANA Cloud through hana_ml and executed. This shows how ChatGPT can help us create specific and relevant queries for our data. And it could save a lot of time by avoiding the need to write it by hand.

 

It’s great that we were able to get a response from ChatGPT and use it to generate a query for SAP HANA Cloud. However, it’s important to note that ChatGPT did not return the exact query to be executed in SAP HANA.

We used a method called set_query to clean up the response, replacing some special characters, but it does not guarantee that ChatGPT will only respond with a SQL statement. In order to guide ChatGPT to only respond with SQL scripts, we used a variable called bot_behavior which is essentially a text file containing key statements.

The text file contains instructions for ChatGPT such as “I want you to only reply with the SQL Condition output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless instructed to do so.” This text was based on some good community contributions.

When we wrote our query and set the parameter “script” to True, the method automatically included the text from bot_behavior as a prefix to the query.

While this approach is a good way to guide ChatGPT to provide SQL scripts, it’s not a 100% reliable way to guarantee that ChatGPT will only respond with SQL scripts and nothing else. It’s possible that ChatGPT will respond with comments and/or explanations, which can break the code. We hope that OpenAI will release an official API for ChatGPT in the future, which would eliminate this issue.

As for more complex queries, such as those with multiple table joins, ChatGPT can write them as well. It’s just a matter of providing it with the necessary background information and guiding it with the right instructions as we did before.

 

Yes, ChatGPT can generate more complex queries, such as those that involve multiple table joins. As an example, the prompt given to ChatGPT was “Could you join both tables on a more complex analytical query” and as you can see from the picture shared, it was able to join the tables and generate a valid query.

As for dispersion measurements and statistical measures, ChatGPT is capable of generating those as well. By providing it with the necessary background information and giving it clear instructions, it can create specific and relevant queries.

 

It’s also worth noting that ChatGPT can generate more general and insightful queries without any specific guidance, As an example, when asking “Can you write a very insightful query for analytics?” , the chatbot was able to come up with an insightful query by itself.

 

It looks like in this specific case, ChatGPT wasn’t able to generate an insightful query without specific guidance. However, it’s worth noting that ChatGPT’s ability to understand and respond to prompts can vary depending on the complexity of the task, the background information provided and the quality of instructions.

It’s always a good practice to provide some level of guidance to ChatGPT, by giving it background information and clear instructions. It is not a perfect tool and there will be times when it might not understand the prompt or give an inaccurate answer, but with the right guidance it is possible to increase the chances of getting a relevant and useful query.

correct your previous query, it’s wrong “‘column ambiguously defined: CHANNEL”
correct your previous query, it’s wrong “invalid column name: TS10.CHANNEL”

And then:

 

 

Overall, the integration of ChatGPT and SAP HANA Cloud can significantly enhance the capabilities of both technologies. SAP HANA Cloud, with its efficient data management and analytics capabilities, can provide ChatGPT with the necessary background information and context to generate highly relevant and accurate queries. In turn, ChatGPT’s ability to understand and respond to natural language prompts can make the process of querying data on SAP HANA Cloud more intuitive and effortless. As the technologies continue to evolve and improve, this integration has the potential to unlock new and advanced analytics capabilities.

If you’re interested in exploring the codes used in this project, they can be found in the GitHub repository located at https://github.com/carlosbasto/chatGPT. Feel free to download, experiment with, and build upon the codes to suit your own needs.

Assigned Tags

      1 Comment
      You must be Logged on to comment or reply to a post.
      Author's profile photo Charan Kumar
      Charan Kumar

      After establishing a connection to the SAP HANA DB and providing some background data to help ChatGPT, we will ask queries about it. In return, we will receive the query in Python with certain special characters, such as backslashes and inverted commas. How to make the special characters unnecessary when running the query.