Image Source: Intercom (http://bit.ly/1qpJfzl)
Building Chatbots for Enterprise Applications
The Rise of the Framework
Who let the bots out, who who who who?
Similar to the lyrics of the Baha Men song, one might ask: where did all the bots come from, and how can they help me? Bots, or conversational interfaces, are omnipresent these days, especially since Facebook’s recent announcement of its bot platform, which is primarily targeted at the consumer space. As messaging platforms like WhatsApp, WeChat and Facebook Messenger become the preferred form of interaction and communication, the concept of having a helper that can order an Uber or a pizza without even leaving the chat environment becomes very attractive.
Bots can offer a wide range of benefits to end users in the consumer and the enterprise software space by making them more productive and delighting them with a natural user experience. Of course, someone has to build all these bots, and guess what: this falls to us developers. Yet, no need to despair! Just like with any new trend in software, frameworks and libraries will make our lives much easier.
Right now, we are experiencing the start of an exciting race for the best bot framework. The goal is to attract the greatest number of developers and provide the most powerful capabilities. Today’s developers are used to frameworks that enable them to produce code and build products at a faster pace. Frameworks increase code quality and reduce the likelihood of duplicating the same work, allowing developers to concentrate on the specific requirements of their application and not on the low-level details.
YABF (Yet Another Bot Framework)
As part of our job at the SAP Innovation Center Silicon Valley, we set out to look at available bot and NLP (natural language processing) frameworks to understand how well they support pilot use cases. A simple scenario that we wanted to build is a lightweight bot that assists guests at HanaHaus (an SAP-owned cafe and coworking space in downtown Palo Alto, California) with making space reservations. This bot would also answer questions like “When is HanaHaus open?” – a relatively simple FAQ functionality.
This post covers the features we found missing while working on this scenario and highlights the strengths and weaknesses of current bot frameworks; it does not intend to rank them. We will also cover what is needed when they are employed in enterprise software and related applications.
Types of Bot Frameworks
Although the hype around NLP-based chatbots is still young, developers can already choose among a wide variety of options, depending on the use case:
- Messaging Apps
Bot development platforms integrated into existing messengers (e.g. Telegram, Facebook Messenger, Slack).
- Natural Language Frameworks
Standalone NLP frameworks that can be used to natural language-enable any application via APIs, as well as to build your own chatbots (e.g. Facebook’s Wit.ai, Api.ai, Microsoft LUIS).
- Digital Assistants
Solutions like Apple Siri, Amazon Alexa, and Microsoft Cortana are still mostly proprietary black boxes for 3rd party developers. Thus, they unfortunately provide only limited ways for external developers to leverage their full potential. The recent preview of Viv Labs’ AI assistant is a promising indication of future AI-powered assistants that enable anyone to add domains and improve results for all users.
In addition, there are players that span all of these categories – one example is Kore. It provides a messaging app, an integrated bot platform, and an NLP engine, all of which can also be used independently of each other.
Key Criteria for Choosing the Right Bot Framework
Enterprise applications place specific requirements on bots, and these often differ from those in the consumer space. For example, business scenarios are often more constrained, e.g. by regulations, and they may use very specific terminology that the enterprise bot must understand with near-perfect accuracy. In contrast, the consumer scenarios that Apple Siri and the like cater to probably require slightly less accuracy, but they have to handle more open-ended queries across a wider range of topics (e.g. weather, news, movies).
1. Basic Robust Understanding
To delight business users who deal with high-value business opportunities and time-sensitive challenges every day, it is critically important that the enterprise bot you are implementing understands the user’s intents almost perfectly. In the same vein, a certain robustness to incorrect spelling and capitalization is required due to the nature of the text-based user interface.
Yet, most of the bot frameworks we evaluated have little tolerance for spelling mistakes in keywords and fail to recognize entities like countries if they are not properly capitalized. The phrases “united states of america” or “united states” will most likely not be recognized as a country entity, whereas “United States of America” would definitely work. Date and time parsing is tricky as well. We observed that the frameworks fail to parse nested queries like “make a reservation for friday in three weeks at noon until 4.”
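To illustrate the kind of robustness described above, here is a minimal sketch of case-insensitive and typo-tolerant entity matching using Python’s standard difflib module. The country list, the function names and the 0.8 similarity cutoff are assumptions made for this example; none of the evaluated frameworks is claimed to work this way.

```python
import difflib

# Hypothetical gazetteer of country names; a real bot would load a full list.
COUNTRIES = ["United States of America", "United States", "Germany", "India"]


def normalize(text):
    """Lowercase and collapse whitespace so capitalization no longer matters."""
    return " ".join(text.lower().split())


# Map each normalized form back to its canonical spelling.
CANONICAL = {normalize(name): name for name in COUNTRIES}


def match_country(user_text, cutoff=0.8):
    """Return the canonical country name for user_text, or None if no match."""
    key = normalize(user_text)
    if key in CANONICAL:
        return CANONICAL[key]
    # Fall back to fuzzy matching to tolerate small spelling mistakes.
    close = difflib.get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    return CANONICAL[close[0]] if close else None


print(match_country("united states of america"))
print(match_country("Germny"))
```

The cutoff trades recall against false positives: a lower value accepts more misspellings but also more unrelated words, which matters when near-perfect accuracy is expected.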
2. Availability of Pre-Built Entities
Business applications rely on similar data elements at their core, e.g. currencies or date & time. It is therefore valuable that most frameworks share one common strength: they all ship with pre-built entities. This makes information extraction easy and allows developers to gather information the user entered through a messaging platform or voice input. The common entities provided are date, time, number, location, organization, person, temperature and dimension. With this basic set of entities, developers are able to extract information and feed simple applications. These pre-built entities are a good start for developing consumer applications and they take a lot of work off developers’ hands, but they might be insufficient in an enterprise context.
3. Adding New Domains, Terminology and Custom Entities
Most business scenarios have a very specific focus, so bot frameworks need to let developers like us incorporate our own terminology, subjects and intents.
To allow our applications to perform actions for the user, we have to train them with intents – operations we want the bot to act on once it recognizes a certain keyword. In our use case with HanaHaus, an utterance that should trigger the Reservation API would look like this: “I need to make a reservation,” where the word reservation would be a keyword for many frameworks. The more examples we present, the more accurately the bot will understand users.
Many frameworks use a keyword-based approach, which makes them prone to errors with sentences like this one: “I made a reservation. I would like to cancel it.” Assuming “reservation” is the keyword that triggers a reservation call and “cancel” is the keyword that triggers a cancellation call, this query would produce an ambiguous result since both keywords are present. A hybrid approach that combines keywords with an understanding of grammar could resolve this issue. Regardless of the approach, the tools for training bots can be quite advanced, with easy-to-use UIs in the browser.
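The two-keyword problem described above can be reproduced in a few lines. The sketch below is only an illustration: the keyword table and intent names are hypothetical, and the “most recent sentence wins” rule is a crude stand-in for real grammatical understanding, not the hybrid approach of any particular framework.

```python
import re

# Hypothetical keyword-to-intent table for the reservation bot.
KEYWORD_INTENTS = {
    "reservation": "make_reservation",
    "reserve": "make_reservation",
    "cancel": "cancel_reservation",
}


def keyword_intents(message):
    """Return every intent whose keyword appears in the message."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return {intent for kw, intent in KEYWORD_INTENTS.items() if kw in words}


def resolve_intent(message):
    """Pick the intent of the most recent sentence with exactly one match."""
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    for sentence in reversed(sentences):
        found = keyword_intents(sentence)
        if len(found) == 1:
            return next(iter(found))
    return None


query = "I made a reservation. I would like to cancel it."
print(sorted(keyword_intents(query)))  # both intents fire on the full message
print(resolve_intent(query))           # the sentence-level rule picks one
```

A rule like this breaks down as soon as both keywords share a sentence (“cancel my reservation”), which is why keyword matching alone is rarely sufficient.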
4. Comprehension of Complex Queries (e.g. Nested)
Enterprise software applications are often seen as too complex by casual users, but it is also true that many of these applications deal with extremely complex business scenarios, e.g. supply chain applications dealing with millions of parts, product variants and their interrelations.
Therefore, enterprise bots will need to be capable of interpreting similarly complex queries. Yet, the most challenging inputs, which frameworks cannot handle today, are complex queries where normal grammar and keyword extraction fail to identify the intent. Let us assume that we are texting a pizza bot that helps us place an order, using this sentence: “I like onions, but I would go with green peppers today.” If our approach were keyword-based with a few grammar rules, this would most likely result in an order with both onions and green peppers. This can only be solved if bots are able to learn and understand linguistics, which is a very difficult problem.
Complex information extraction from the user input can be achieved through regular expressions. Current frameworks come with a wide range of options to define those regular expressions and help us developers to extract information more easily. Let us look at an example: In our HanaHaus scenario, a guest would like to make a reservation so she texts: “I want to reserve a space for 6 people from…”. Using regular expressions like IN+CD+NN – IN being a preposition, CD a number and NN a noun – we are able to extract the number of people the guest wants to create a reservation for.
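As a rough sketch of the IN+CD+NN idea, the example below runs a regular expression over a sequence of part-of-speech tags. The tags are written by hand here in Penn Treebank style; in practice they would come from a tagger. Because “people” is a plural noun (NNS), the pattern is relaxed to accept NN or NNS. The function name is an assumption for illustration.

```python
import re

# Tokens as (word, tag) pairs; hand-tagged for this example.
TAGGED = [
    ("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("reserve", "VB"),
    ("a", "DT"), ("space", "NN"), ("for", "IN"), ("6", "CD"),
    ("people", "NNS"), ("from", "IN"),
]


def extract_party_size(tagged):
    """Find preposition + number + noun and return the number as an int."""
    # Encode the tag sequence as a string so a normal regex can scan it.
    tag_string = " ".join(tag for _, tag in tagged)
    match = re.search(r"\bIN CD NNS?\b", tag_string)
    if match is None:
        return None
    # Convert the character offset of the match back to a token index.
    start_token = tag_string[: match.start()].count(" ")
    number_word = tagged[start_token + 1][0]
    return int(number_word) if number_word.isdigit() else None


print(extract_party_size(TAGGED))
```

Matching on tags rather than on words means the same pattern also covers phrasings like “for 12 guests” without listing every possible noun.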
5. Dynamic Entity Linking to Enterprise Systems
Another recurring challenge is to understand custom entities. Especially in domains that do not come pre-built with the framework, it is necessary to maintain entities like customer names or movie titles. Feeding those into the model can be a particular challenge if the domain has a vast number of unique or even constantly changing entities that need to be added. This applies especially to the enterprise software space, where very specific terminology is used and needs to be trained as part of the model.
Even more challenging for today’s bot frameworks is the fact that the data underlying almost all enterprise applications changes dynamically. Therefore, the engine that recognizes intent and related parameters in user input needs to be able to access this data and learn from it in real time.
As an example, a user of a bot for sales representatives might reference a customer name which was added by a colleague just a few seconds ago. If the model had only been trained with static input from a while back, the system would not be able to determine that the given entity is indeed a customer name and therefore will miss this crucial piece of information.
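One way to picture the difference is a recognizer that queries the live system at request time instead of relying on a list frozen at training time. The sketch below assumes a hypothetical CustomerDirectory class standing in for the enterprise backend, and uses naive substring matching purely for illustration.

```python
class CustomerDirectory:
    """Hypothetical stand-in for an enterprise system holding customer names."""

    def __init__(self, names):
        self._names = set(names)

    def add(self, name):
        self._names.add(name)

    def all_names(self):
        # Return a copy so callers always see the current state.
        return set(self._names)


def find_customers(message, directory):
    """Return customer names mentioned in message, using the live directory."""
    text = message.lower()
    return sorted(n for n in directory.all_names() if n.lower() in text)


directory = CustomerDirectory(["Acme Corp"])
question = "Show me the open opportunities for Globex"
print(find_customers(question, directory))  # unknown customer, nothing found

directory.add("Globex")  # a colleague adds the customer seconds later
print(find_customers(question, directory))  # now the entity is recognized
```

Looking entities up on every request keeps recognition current, at the cost of a round trip to the backend; a production system would likely need caching or an index.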
6. Memory & Feedback
The ability to give a bot a memory, whether short-term or long-term, is not present in the frameworks we have tested. There are ways to implement such functionality, but they are quite cumbersome. Let us look at an example: If we ask a bot: “What is the temperature in San Francisco?” – the bot could respond: “Currently, it is 75 degrees Fahrenheit in San Francisco.” If our next question is: “Could you please tell me in Celsius,” most frameworks will struggle since they do not maintain a memory and cannot relate to the previous question(s). Additionally, the bot could learn “unsupervised” from this conversation and report the temperature in Celsius in the future. Finally, if a framework offered a seamless way for the user to reinforce the bot’s understanding and beliefs, this would be a remarkable addition that could substantially change the way we work.
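A short-term memory for the temperature example can be sketched as a small context dictionary that survives between turns. Everything here is an assumption for illustration: the WeatherBot class, the hard-coded reading, and the keyword checks stand in for a real NLP engine and weather service.

```python
class WeatherBot:
    """Toy bot that remembers the last city and temperature it reported."""

    # Hypothetical readings in Fahrenheit; a real bot would call a weather API.
    READINGS_F = {"san francisco": 75}

    def __init__(self):
        self.context = {}  # short-term memory carried across turns

    def reply(self, message):
        text = message.lower()
        for city, temp_f in self.READINGS_F.items():
            if city in text:
                self.context = {"city": city, "temp_f": temp_f}
                return f"Currently, it is {temp_f} degrees Fahrenheit in {city.title()}."
        if "celsius" in text and "temp_f" in self.context:
            # Resolve the follow-up against the remembered reading.
            temp_c = round((self.context["temp_f"] - 32) * 5 / 9)
            city = self.context["city"].title()
            return f"That is about {temp_c} degrees Celsius in {city}."
        return "Sorry, I did not understand that."


bot = WeatherBot()
print(bot.reply("What is the temperature in San Francisco?"))
print(bot.reply("Could you please tell me in Celsius"))
```

Without the context dictionary, the second question has nothing to refer to and the bot falls through to its fallback answer.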
7. Bot Story
One very important strength of modern bot frameworks is their ability to represent complex story flows. With the help of a waterfall model, a developer can express the journey of a user through a conversation for more complex dialogs. Similar to rules, this allows us to define example conversations but has the advantage that there is no need to revisit all the rules once a new conversation is added. Many frameworks provide such an ability to create stories and flows, but creating complex stories requires considerable effort. It would be helpful if frameworks allowed users to define stories by integrating with tools like OmniGraffle or similar custom solutions. This would give designers control over the flows while developers focus on the implementation details.
HanaHaus Example Waterfall Flow Chart for User-Bot Interaction – No framework allows direct translation into Stories.
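To make the waterfall idea concrete, here is a minimal sketch in which a dialog is an ordered list of steps and each user reply fills the next open slot. The slot names, prompts and the WaterfallDialog class are assumptions for this example and do not reproduce the HanaHaus flow chart or any framework’s story format.

```python
# Each step is (slot name, prompt); the slots are assumptions for illustration.
RESERVATION_STEPS = [
    ("date", "Which day would you like to reserve?"),
    ("time", "From when until when?"),
    ("people", "For how many people?"),
]


class WaterfallDialog:
    """Walks the user through the steps in order, one question per turn."""

    def __init__(self, steps):
        self.steps = steps
        self.answers = {}

    def next_prompt(self):
        """Return the prompt for the first unanswered step, or None when done."""
        for slot, prompt in self.steps:
            if slot not in self.answers:
                return prompt
        return None

    def handle(self, user_reply):
        """Store the reply in the current slot and return the next prompt."""
        for slot, _ in self.steps:
            if slot not in self.answers:
                self.answers[slot] = user_reply
                break
        return self.next_prompt()


dialog = WaterfallDialog(RESERVATION_STEPS)
print(dialog.next_prompt())
print(dialog.handle("Friday"))
print(dialog.handle("noon until 4"))
print(dialog.handle("6"))
print(dialog.answers)
```

Because the flow is plain data, a designer-facing tool could in principle export such a step list, leaving developers to implement what happens once all slots are filled.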
8. Omnichannel Access
Consumers primarily use homogeneous ways of messaging amongst their friends and contacts (e.g. WeChat in China, Facebook Messenger or Snapchat in the US, WhatsApp in Europe & India). The situation is different for enterprise users, where each company might use different and often more traditional channels (e.g. SMS, email or BBM). Thus, to cater to the needs of a business audience, it is often preferable to choose a framework that enables users to interact across a multitude of platforms and devices or, at the extreme end of the spectrum, to build custom enterprise bots for the specific technology the target audience is using, e.g. by integrating standalone NLP frameworks.
9. User Experience & Frictionless Interactions
The hype surrounding chatbots centers on the potential of conversational, text-based user interfaces to remove friction from the otherwise cumbersome processes and navigational steps users experience with traditional GUIs. But to anyone who has actively used chatbots, it becomes apparent that we need to provide end users with more than that.
Going beyond just enabling end users to interact with software using natural language, e.g. by entering full sentences, business users will want to leverage enterprise bots to act faster and to be more productive than with traditional GUIs. Therefore we need to give users the ability to take shortcuts, e.g. by entering only parts of a customer name or by using solely action words instead of full sentences. In addition, apps like Telegram add hyperlinks in conversations so that users can just tap on them to enter the highlighted text as a command.
As a result, object IDs and entities should be referenceable with a simple interaction in enterprise bots, e.g. by tapping on customer names or product IDs. In addition, the messaging UI should provide users with shortcut buttons to relevant activities, like “Create New Opportunity” or “Check Product Status”.
TechCrunch Bot on the Telegram Messaging Platform – Shortcuts on the bottom of the conversation to steer the conversation more effectively
10. Cognitive Capabilities
Enterprise bots that understand user input in a robust manner and respond to queries accurately have tremendous benefits in providing a whole new productive user experience to business users. The true future value however lies in the potential to add cognitive capabilities.
Bot frameworks need the ability to understand business processes in conjunction with the user’s context, so that bots can execute tasks on their own the majority of the time, allowing users to focus on the work that matters most. In our experience, none of the available bot frameworks has reached this goal so far. Some help developers improve the accuracy of their bot by analyzing user mistakes and feedback, but none is yet able to comprehend the enterprise business and user context in a truly intelligent manner.
Overall, bot frameworks provide a great foundation for building NLP-enabled applications. However, they are not yet able to solve complex tasks. Until they mature, most frameworks and their respective bots will live a life that is governed by simple tasks. Right now, this meets my requirements because even the simplest tasks consume my time and I am more than happy to leave the pizza order to a bot. But who knows what today’s simple tasks will be like tomorrow…