What is the best way to learn and write an AI Chatbot? What are the latest research papers that I should read? originally appeared on Quora: the place to gain and share knowledge, empowering people to learn from others and better understand the world.
First, understand the goal of the AI Chatbot you’re building. Is it about achieving a high resolution rate (80%+) and delighting your users, or is it just about launching a bot that provides minimal resolution, say 20%? Learn about the various approaches and types of chatbots you can build to achieve that goal. Then, determine the technology and resources you’ll need. Finally, implement: procure data, design, build, and experiment for high resolution rates. I’ve added a few other critical considerations: integrating with CRMs, determining the most effective channels/modalities, and considering humans as back-ups for errors and escalation.
Two types of chatbots: information seeking and transactional
Chatbots accomplish two types of tasks: seeking information or completing a transaction. Information seeking bots are relatively easy to build (e.g. accomplishing ‘what’s my account balance?’) compared with transactional ones (e.g. accomplishing ‘please change my seat on the second leg of tomorrow’s flight to an aisle seat’ or ‘set up automatic payments for my auto insurance but not my home insurance and split monthly payments across a debit and a credit card’). Key considerations in building either type of chatbot:
- Interaction: Natural Language or Directed Dialog
- Design (conversation, visual, and interaction)
- CRM integration
- Channels and modalities
- Escalation to humans
Chatbot interaction: Natural Language/Conversational or Directed Dialog
For either type of chatbot, natural language can be applied to both input (what’s spoken or typed by the user) and output (what’s spoken or shown visually by the chatbot). Stefan’s matrix explains this clearly. Either machine learning based Natural Language can be applied with satisfactory performance, or rule based directed dialog can be implemented with sub-par performance (I’ll get to the reasons for the poor performance in a bit). Pros of machine learning: it delivers higher performance or resolution rates. Cons: it requires lots of data and a specific skill set (to create models) and/or tools (to create/tune models).
The directed dialog or rule based approach (no machine learning or AI required) provides limited options to the user for both input and output, e.g. “Select one of the following: 1 for appointments, 2 for doctors directory, 3 for insurance details, 4 for other reasons”. These options can be presented visually, in a design paradigm pioneered by Facebook Messenger. In a directed dialog interaction, the driver of the conversation is the chatbot and not the user, resulting in an unnatural, not so conversational, and often constraining and frustrating experience, as most of us have experienced with traditional phone systems. Most current web and mobile chatbots are directed dialog style. Reasons for the low performance of the directed dialog or rule based approach:
- Discovery challenge: complex business logic results in large decision trees and users have a hard time discovering their specific question, which happens to be a tiny leaf node lost in a complex decision tree. Often users pick a wrong reason just to get to the next step.
- Too restrictive: users don’t like getting constrained and look for other means to resolve their problems. This is particularly pronounced as users don’t speak the brand jargon.
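To make the discovery challenge concrete, here is a minimal sketch of a directed dialog as a decision tree. The `MenuNode` class, the helper functions, and the "book/cancel" sub-menu are hypothetical names invented for illustration; only the top-level menu wording comes from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MenuNode:
    """One step of a directed dialog: a prompt plus numbered options."""
    prompt: str
    options: Dict[str, "MenuNode"] = field(default_factory=dict)


def navigate(root: MenuNode, key_presses: List[str]) -> Optional[MenuNode]:
    """Follow the user's selections; return None if a choice is not offered."""
    node = root
    for key in key_presses:
        if key not in node.options:
            return None  # the user wanted something the tree does not offer
        node = node.options[key]
    return node


def count_leaves(node: MenuNode) -> int:
    """Count the end states a user has to discover by trial and error."""
    if not node.options:
        return 1
    return sum(count_leaves(child) for child in node.options.values())


# Hypothetical tree; the sub-menu under "1" is an assumption for illustration.
ROOT = MenuNode(
    "Select one of the following: 1 for appointments, 2 for doctors directory, "
    "3 for insurance details, 4 for other reasons",
    {
        "1": MenuNode("Select 1 to book, 2 to cancel", {
            "1": MenuNode("Booking an appointment."),
            "2": MenuNode("Cancelling an appointment."),
        }),
        "2": MenuNode("Opening the doctors directory."),
        "3": MenuNode("Looking up insurance details."),
        "4": MenuNode("Connecting you to an agent."),
    },
)
```

Even this toy tree has five leaves behind two levels of menus. Real business logic multiplies both numbers, which is why users end up picking a wrong option just to move forward.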
Success (in terms of transaction completion or resolution) is not satisfactory with either approach (yes, even with Natural Language!) when technology alone is applied. This was evident with the Facebook Messenger chatbot, which did not achieve more than 30% automation even when AI and Natural Language were applied. Key reasons for the poor performance of the Natural Language approach:
- It doesn’t take care of all the non-happy-paths that represent the reality of conversations.
- Users don’t always say what they mean when posed with “How may I help you?”, partly because users don’t speak the brand language of a specific chatbot/company. For instance, chatbots expect “baggage handling exception” instead of “problem with my check-in suitcase” or “balance transfer” instead of “sending cash from one account to another”. To address this ambiguity, either apply disambiguation, which is to clarify what the user meant, or apply predictive technologies to determine the real intent.
- Objective changes: humans love to change topics in most conversations, e.g. the following journey is not uncommon for credit card users: start with payment status, jump to available credit, then pay off the third credit card, and why not check last week’s transactions in between.
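The disambiguation idea can be sketched as follows. This is a toy keyword-overlap scorer, not a machine learning model, and the intent names, keyword sets, and `margin` threshold are all assumptions chosen for illustration. It maps the user's everyday wording to a brand intent and falls back to a clarifying question when two intents score too closely.

```python
import re
from typing import List, Tuple

# Hypothetical mapping from brand intents to everyday words users might say.
INTENT_KEYWORDS = {
    "baggage_handling_exception": {"suitcase", "bag", "baggage", "luggage"},
    "balance_transfer": {"sending", "send", "transfer", "cash", "account"},
    "account_balance": {"balance", "account"},
}


def score_intents(utterance: str) -> List[Tuple[int, str]]:
    """Rank intents by how many of their keywords appear in the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = [(len(words & keywords), intent)
              for intent, keywords in INTENT_KEYWORDS.items()]
    return sorted(scores, reverse=True)


def resolve(utterance: str, margin: int = 1) -> Tuple[str, List[str]]:
    """Return ("intent", [name]), ("disambiguate", [a, b]) or ("fallback", [])."""
    ranked = score_intents(utterance)
    (best_score, best), (second_score, second) = ranked[0], ranked[1]
    if best_score == 0:
        return ("fallback", [])  # nothing matched; ask the user to rephrase
    if best_score - second_score < margin:
        # Too close to call: ask the user which of the two they meant.
        return ("disambiguate", [best, second])
    return ("intent", [best])
```

For example, `resolve("problem with my check-in suitcase")` selects the baggage intent outright, while `resolve("my account")` matches two intents equally and triggers a clarifying question. A production system would replace the keyword sets with a trained intent classifier and use its confidence scores in the same way.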
Conversation interaction design: manual approach and pure AI
Conversation interaction design takes care of the above shortcomings of Natural Language. It focuses on:
- Design flows for non-happy-paths. Example: I’d like to make a payment and split it across a gift debit card and my credit card. A few steps later, I change my mind and would like to split it across two debit cards. This change of mind is exactly where most chatbots fail.
- Objective changes. Example: I’d like to opt out of paper mail delivery before I make that split payment.
- Verbiage/modality of the response. More often than not, a directed dialog response (such as quick replies, carousels, radio buttons) is better for output but not for input. Users like guidance but not constraints.
In reality, human conversations encounter non-happy-paths and objective changes very frequently (evidence: observed in human agent conversations over chat and phone and many 1:1 settings). Conversation interaction design helps you elegantly handle these expected non-happy-paths. It gets complex as you add speech/audio to the mix.
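One way to picture the handling of a change of mind and an objective change is a stack of in-progress tasks. This is a minimal sketch under assumed names (`Task`, `DialogState`, and the slot keys are hypothetical): overwriting a slot models the change of mind, and pushing a new task on top of an unfinished one models the objective change.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Task:
    """An objective the user is pursuing, with the details collected so far."""
    name: str
    slots: Dict[str, Any] = field(default_factory=dict)


class DialogState:
    """Keeps unfinished tasks on a stack so an interrupted one can resume."""

    def __init__(self) -> None:
        self._stack: List[Task] = []

    def start(self, name: str) -> Task:
        """Begin a new objective, suspending whatever was in progress."""
        task = Task(name)
        self._stack.append(task)
        return task

    def current(self) -> Optional[Task]:
        return self._stack[-1] if self._stack else None

    def set_slot(self, key: str, value: Any) -> None:
        """Fill or overwrite a detail; overwriting models a change of mind."""
        task = self.current()
        if task is None:
            raise RuntimeError("no active task")
        task.slots[key] = value

    def finish(self) -> Task:
        """Complete the current objective and resume the one beneath it."""
        return self._stack.pop()


state = DialogState()
state.start("split_payment")
state.set_slot("sources", ["gift debit card", "credit card"])
state.set_slot("sources", ["debit card A", "debit card B"])  # change of mind
state.start("opt_out_paper_mail")  # objective change mid-payment
state.finish()                     # opt-out done, payment resumes
```

After the opt-out finishes, `state.current()` is the split payment again, with the revised card choice intact. A real design would also need confirmation prompts and a way to abandon a suspended task.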
The downside of this manual conversation interaction design is that it is time consuming and requires specific skill sets: interaction design, content design, and Natural Language skills. When complemented with Natural Language, this approach yields very high performance and task completion rates. If your AI chatbots cover a large set of intents, this is a sure-shot approach to high performance.
Pure AI approach
Generative AI for output, combined with machine learning applied to the entire conversation, replaces interaction flow design. However, the pure AI approach requires a large set of data from real human conversations, with both inputs and outputs, for both happy and non-happy-paths. Procuring a sufficiently representative corpus of such data is a practical challenge if you don’t already have human conversations taking place. Another challenge is entity extraction for inputs and generation with new entities after interacting with dynamic CRMs/user databases.
Don’t forget to plan integration with the CRM (business back-ends and user databases) that determines specific actions and reads/writes user or journey specific information. A simple weather lookup bot doesn’t need complex CRM integration, but if a bot is required to upgrade your mobile phone data plan, access to your mobile phone provider’s database is required for query, create, update, and delete operations. Very few large enterprises have programmatic RESTful access, and this can be a time consuming task when integrating with proprietary CRMs.
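Because the back-end is often proprietary, it can help to have the bot talk to a narrow interface rather than to the CRM directly. The sketch below is an assumption-laden illustration: `CrmClient`, `InMemoryCrm`, `upgrade_data_plan`, and the `data_plan` field are hypothetical names, and the in-memory class merely stands in for whatever adapter wraps the real system.

```python
from typing import Dict, Optional, Protocol


class CrmClient(Protocol):
    """Hypothetical interface covering the four operations the bot needs."""

    def query(self, user_id: str) -> Optional[dict]: ...
    def create(self, user_id: str, record: dict) -> None: ...
    def update(self, user_id: str, changes: dict) -> None: ...
    def delete(self, user_id: str) -> None: ...


class InMemoryCrm:
    """Stand-in back-end for design and testing; not a real CRM adapter."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def query(self, user_id: str) -> Optional[dict]:
        record = self._records.get(user_id)
        return dict(record) if record is not None else None

    def create(self, user_id: str, record: dict) -> None:
        self._records[user_id] = dict(record)

    def update(self, user_id: str, changes: dict) -> None:
        self._records[user_id].update(changes)

    def delete(self, user_id: str) -> None:
        self._records.pop(user_id, None)


def upgrade_data_plan(crm: CrmClient, user_id: str, new_plan: str) -> str:
    """Transactional bot action: read the account, then write the new plan."""
    if crm.query(user_id) is None:
        return "I could not find your account."
    crm.update(user_id, {"data_plan": new_plan})
    return f"Your data plan is now {new_plan}."
```

With this split, conversation design and testing can proceed against `InMemoryCrm` while the adapter for the actual back-end is built separately.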
Channels and modalities
Determine which channels and modalities you want to associate with your bots. This is driven by the bot users’ preferences and behaviors, and by the specific vertical/domain the bot is serving. For instance, in an automotive scenario you may want to use speech recognition based interaction (either in-app or via a phone). For millennials, you might want to use Facebook Messenger as a channel. For premium products, Apple Business Chat might make sense, given that Apple products are used more by higher income individuals.
Humans as back-up or for correct escalations
Humans will always be required for two reasons:
(1) right reason: business policies might mandate human intervention e.g. refund of an expensive flight ticket after an incident
(2) wrong reason: bot doesn’t perform per design
Ensure you have a plan for both (1) and (2).
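A plan for both cases can start as a simple routing rule. The following is a minimal sketch in which the intent name, the confidence threshold, and the failed-turn limit are all assumed values for illustration. It keeps the two escalation reasons separate so each can be measured on its own.

```python
from enum import Enum
from typing import FrozenSet


class Route(Enum):
    BOT = "bot"
    HUMAN_POLICY = "human_policy"    # (1) right reason: policy requires a human
    HUMAN_FAILURE = "human_failure"  # (2) wrong reason: the bot is not coping


# Hypothetical intents that business policy reserves for human agents.
POLICY_INTENTS: FrozenSet[str] = frozenset({"refund_expensive_ticket"})


def route_turn(
    intent: str,
    confidence: float,
    failed_turns: int,
    policy_intents: FrozenSet[str] = POLICY_INTENTS,
    min_confidence: float = 0.6,  # assumed threshold, tune on real data
    max_failed_turns: int = 2,    # assumed limit, tune on real data
) -> Route:
    """Decide who handles the next turn and record why a human was needed."""
    if intent in policy_intents:
        return Route.HUMAN_POLICY
    if confidence < min_confidence or failed_turns >= max_failed_turns:
        return Route.HUMAN_FAILURE
    return Route.BOT
```

Tracking the share of `HUMAN_FAILURE` routes over time gives a direct view of where the bot falls short of its design. In practice a low-confidence turn might first trigger a clarifying question before escalating.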
Finally, you can experiment with various readily available bot services such as Microsoft Cognitive Services, IBM VWA, Wit or Amazon Lex to create a chatbot. None of them is enterprise grade. For enterprise grade chatbot frameworks, refer to the latest top 10 report from Forrester. Summary of the above considerations:
Disclaimer: I have built all types of chatbots mentioned above with a ~90% success rate and delight factor in my previous role at 7 Inc., leading product management for enterprise AI self-service experiences.