How the heck does a chatbot work?

Dan_Wroblewski · 10-20-2020

I was perplexed.

I learned about SAP Conversational AI chatbots, even created a few. I read the documentation, blogs, even architectural flowcharts. I created tutorials and wrote my own blogs. But I couldn't quite get my head around what was happening.

No, the NLP (natural language processing) wasn't the problem, though I truly had NO idea how that worked and it sometimes gave bizarre values for the user's intents and entities. I could live with that. But how did it figure out what skills to run, what path to take, what messages to show?

So I did some testing ... and reading ... and discussing with the smartest people I could find.

Below is the result of my investigation, and an explanation, as far as I could comprehend, of how SAP Conversational AI chatbots work. It is designed for business users to understand how the chatbot decides what to display, and I make no guarantees.

1. Everything starts with an utterance

Every time the user writes something into the chat, the entire chatbot process starts.

All kinds of things happen in the Rube Goldberg machine behind the scenes: intents are detected, APIs are called, memory is updated, variables stored, and messages are displayed.

And then we wait for the next utterance and start all over again.

2. NLP determines intents and entities

Now I don't pretend to understand how NLP does what it does, but in the end it produces an intent and one or more entities from each utterance.

You can quickly see what it does for each utterance by testing in the chat preview and viewing the JSON (click the yellow information bubble).

[Screenshot: the JSON in the chat preview, showing the intent and entities found by the NLP]

IMPORTANT: Each utterance produces its own set of intents and entities, and they are not stored after the next utterance. There is no stack of intents, and the previous intent has no direct influence on what happens with the current intent. (Yes, if an entity matches a skill requirement, its value is placed in memory, but the detected entity is forgotten. And yes, a previous intent may have rejiggered the system in some way that affects the chatbot now, but the intent itself is forgotten.)
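To make the "forgotten after each utterance" point concrete, here is a minimal Python sketch. It is not SAP Conversational AI code: the names `NlpResult`, `Conversation` and `handle_utterance` are hypothetical, and the assumption is simply that memory is the only thing that survives from one utterance to the next.

```python
from dataclasses import dataclass, field


@dataclass
class NlpResult:
    """What the NLP yields for ONE utterance (hypothetical shape)."""
    intent: str
    entities: dict = field(default_factory=dict)


@dataclass
class Conversation:
    """The only state that outlives an utterance in this sketch."""
    memory: dict = field(default_factory=dict)


def handle_utterance(conv, nlp, open_requirements):
    # Only entities that match an open skill requirement are copied to memory.
    for name, value in nlp.entities.items():
        if name in open_requirements:
            conv.memory[name] = value
    # Nothing else is kept: the intent and the unmatched entities are dropped.


conv = Conversation()
handle_utterance(conv, NlpResult("book-flight", {"location": "Paris", "color": "red"}), {"location"})
handle_utterance(conv, NlpResult("greetings"), {"location"})
print(conv.memory)  # {'location': 'Paris'}
```

After the second utterance there is no trace of "book-flight" or of the color entity; only the value that matched a requirement is still around.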

3a. Select skill from stack?

Now, the action begins.

The chatbot has to select a skill to try to execute, and it starts by deciding whether to take the top skill on the skill stack. Skills are placed on the stack when they are triggered but not executed (for example, because the requirements were not fulfilled). IMPORTANT: During each utterance, the chatbot considers only the skill at the top of the stack.

How does the chatbot decide whether to take it?

Since people can start talking about one thing and suddenly change topics, the chatbot only selects this skill if:

The current utterance contains the skill's trigger.

The current utterance contains an entity that is a requirement of the skill that has not so far been fulfilled.
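The two conditions above can be sketched in a few lines of Python. This is my own simplification, not the real engine: I assume that either condition alone is enough, that a trigger is just a set of intent names, and that a requirement counts as fulfilled once its name is in memory. `Skill` and `should_take_top` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    trigger_intents: set                              # simplification: triggers are intents
    requirements: set = field(default_factory=set)    # names of required entities


def should_take_top(stack, intent, entities, memory):
    """Decide whether the skill on top of the stack is selected."""
    if not stack:
        return False
    top = stack[-1]                                   # only the top skill is considered
    triggered = intent in top.trigger_intents
    unfulfilled = top.requirements - set(memory)
    fills_requirement = bool(unfulfilled & set(entities))
    return triggered or fills_requirement             # assumption: either one suffices


order = Skill("order-pizza", {"order"}, {"size", "topping"})
stack = [order]
memory = {"size": "large"}                            # size was given earlier

print(should_take_top(stack, "greetings", {"topping": "ham"}, memory))  # True
print(should_take_top(stack, "greetings", {"size": "small"}, memory))   # False
print(should_take_top(stack, "order", {}, memory))                      # True
```

The second call returns False because the size requirement is already fulfilled, so a new size entity does not count as the user continuing that skill.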

3b. Select new skill?

If the chatbot decides against the top skill on the stack, it checks all the skills to see if any are triggered by the utterance.

If none are triggered, the fallback is executed.

If exactly one is triggered, it's selected (and put on top of the stack).

If more than one is triggered, the fallback is executed.

IMPORTANT: A skill cannot be in the stack more than once. If a skill that is already somewhere in the stack is triggered, it is moved to the top of the stack.
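Here is a small Python sketch of the rules in this step, under the assumption that the stack is a plain list with the top at the end and that skills are identified by name. The function `select_new_skill` and the string "fallback" are illustrative, not part of any SAP API.

```python
def select_new_skill(stack, triggered):
    """Pick a skill from those triggered by the current utterance.

    stack:     list of skill names, top of the stack is the last item
    triggered: names of all skills whose trigger matched the utterance
    """
    if len(triggered) != 1:
        return "fallback"          # none triggered, or more than one

    skill = triggered[0]
    if skill in stack:
        stack.remove(skill)        # a skill is never in the stack twice
    stack.append(skill)            # put (or move) it on top
    return skill


stack = ["order-pizza", "track-order"]
print(select_new_skill(stack, ["order-pizza"]))                 # order-pizza
print(stack)                                                    # ['track-order', 'order-pizza']
print(select_new_skill(stack, []))                              # fallback
print(select_new_skill(stack, ["order-pizza", "track-order"]))  # fallback
```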

4. Execute skill

Once we have the skill to execute, the chatbot:

Checks if the requirements are fulfilled. It may also display messages or run other actions that are defined in the replies of the requirements.

If the requirements are fulfilled, executes the skill's actions.

Once a skill is executed, it is removed from the stack.
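The execution step can be sketched like this in Python. Again, these are hypothetical names and a simplified model: I assume requirements are checked in order, that the first missing one produces its reply, and that the skill stays on the stack until its actions actually run.

```python
def execute_skill(stack, skill, requirements, memory):
    """Try to execute the selected skill; return the messages to display."""
    missing = [name for name in requirements if name not in memory]
    if missing:
        # Requirements not fulfilled: show the requirement's reply,
        # and leave the skill on the stack for a later utterance.
        return [f"(requirement reply) What is your {missing[0]}?"]

    # Requirements fulfilled: run the actions and remove the skill from the stack.
    if skill in stack:
        stack.remove(skill)
    return [f"(action) {skill} done"]


stack = ["order-pizza"]
memory = {}

print(execute_skill(stack, "order-pizza", ["size"], memory))  # asks for the size
print(stack)                                                  # ['order-pizza']

memory["size"] = "large"                                      # filled by a later utterance
print(execute_skill(stack, "order-pizza", ["size"], memory))  # runs the actions
print(stack)                                                  # []
```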

A few additional things to note:

A Go To action with "start the skill" puts the skill on the top of the stack and the chatbot executes it (if its requirements are met).

A Go To action with "wait for user input" puts the skill on the top of the stack but the chatbot does not try to execute it.

No Reply merely means that no messages were included in the actions of the skills that were executed. No Reply appears in the chat preview, but each client can decide what to do when no messages are created from the utterance. The fallback skill, in contrast, is executed if no other skills could be selected.
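The difference between the two Go To variants can be sketched as follows. This is a rough Python model with hypothetical names (`go_to`, the mode strings "start" and "wait"), assuming the same list-based stack as a stand-in for whatever the platform really uses.

```python
def go_to(stack, target, mode, requirements, memory):
    """Model a Go To action; return a message if the target skill ran, else None."""
    if target in stack:
        stack.remove(target)       # a skill is never in the stack twice
    stack.append(target)           # both variants put the skill on top

    if mode == "start" and all(name in memory for name in requirements):
        stack.remove(target)       # executed skills leave the stack
        return f"{target} executed"

    # "wait": nothing runs now; the skill sits on top for the next utterance.
    return None


stack = ["order-pizza"]
print(go_to(stack, "track-order", "wait", [], {}))   # None
print(stack)                                         # ['order-pizza', 'track-order']
print(go_to(stack, "order-pizza", "start", [], {}))  # order-pizza executed
print(stack)                                         # ['track-order']
```

If the whole utterance ends with no messages at all, that is the No Reply case, which is a different thing from the fallback skill being selected.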

These were things that confused me, and I hope I explained them correctly.