Each person communicates in a unique style, so dialogue management is the key challenge for a chatbot: it must keep the conversation contextually and semantically appropriate. This is what differentiates a chatbot from a static IVR system. To create intuitive user experiences, chatbots rely on machine learning techniques such as reinforcement learning. In this post, we look at how reinforcement learning turns ordinary chat interfaces into intelligent bots.
What is Reinforcement Learning?
As stated by Scholarpedia, "Reinforcement learning (RL) is defined as the process of learning by interacting with an environment. An RL agent learns from the consequences of its actions, rather than from being explicitly taught and it selects its actions on basis of its past experiences (exploitation) and also by new choices (exploration), which is essentially trial and error learning."
This machine learning technique enables machines or computer programs to automatically determine the appropriate action to perform in a specific context. The performance of an RL-based system improves with the conversation situations it is exposed to and with the feedback it is given directly (this feedback is known as the reinforcement signal). Such systems study behavioral patterns and store the feedback and the conversation context in a look-up table, which acts as a knowledge base that the reinforcement learning setup uses to decide “what action to perform in a given context”.
Figure: Reinforcement Learning Process
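To make the look-up table idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation of any particular chatbot product: the class name `LookupTableAgent`, the context and action labels, and the parameters `epsilon` and `learning_rate` are all hypothetical. The agent stores one estimated value per (context, action) pair, usually picks the best-known action (exploitation), occasionally tries a random one (exploration), and nudges the stored value toward each reinforcement signal it receives.

```python
import random
from collections import defaultdict


class LookupTableAgent:
    """Hypothetical tabular agent: one value estimate per (context, action)."""

    def __init__(self, actions, epsilon=0.1, learning_rate=0.5, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon              # probability of exploring
        self.learning_rate = learning_rate  # step size for value updates
        self.rng = random.Random(seed)
        self.table = defaultdict(float)     # (context, action) -> value

    def choose_action(self, context):
        # Exploration: occasionally try a random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Exploitation: pick the action with the highest stored value.
        return max(self.actions, key=lambda a: self.table[(context, a)])

    def update(self, context, action, reward):
        # Move the stored value a fraction of the way toward the reward.
        key = (context, action)
        self.table[key] += self.learning_rate * (reward - self.table[key])


# Usage with made-up contexts and actions; epsilon=0.0 disables exploration.
agent = LookupTableAgent(["send_reset_link", "ask_for_details"], epsilon=0.0)
agent.update("password_reset", "send_reset_link", reward=1.0)
agent.update("password_reset", "ask_for_details", reward=-1.0)
best = agent.choose_action("password_reset")
```

With exploration switched off, the agent simply returns whichever action has received the better feedback so far for that context. A production system would typically keep some exploration on so that it continues to discover better replies.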
The Role of Reinforcement Learning in Transforming a Chatbot
As part of enterprise chatbot training and development, we expose the chatbot directly to actual interactions. Given a conversation, the agent analyzes how the exchange between the human agent or chatbot and the end user unfolded. It studies how the user replied. If the user was satisfied, the reply would be “thank you” or something like “that has helped”, and the agent records these as positive signals in the lookup table. If the user is not satisfied, the user will probably ask more questions or send negative signals. The agent captures these signals, learns from the interaction points, and stores the results back in the lookup table.
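The signal-capturing step can be sketched as follows. This is a deliberately simple illustration: the phrase lists, the numeric reward values, and the function names `reward_from_reply` and `record_interaction` are assumptions made for the example. A real system would more likely use a trained sentiment or intent classifier than keyword matching.

```python
# Hypothetical phrase lists; real systems would use a trained classifier.
POSITIVE_PHRASES = ("thank you", "thanks", "that has helped", "that helped")
NEGATIVE_PHRASES = ("not helpful", "that did not help", "wrong answer")


def reward_from_reply(reply: str) -> float:
    """Map a user's reply to a reinforcement signal (assumed scale -1 to 1)."""
    text = reply.lower()
    # Check negatives first so "thanks, but not helpful" counts as negative.
    if any(phrase in text for phrase in NEGATIVE_PHRASES):
        return -1.0
    if any(phrase in text for phrase in POSITIVE_PHRASES):
        return 1.0
    # A follow-up question suggests the answer did not fully satisfy the user.
    if text.rstrip().endswith("?"):
        return -0.5
    return 0.0


def record_interaction(lookup, context, bot_action, user_reply):
    """Store the signal for this (context, action) pair in the lookup table."""
    reward = reward_from_reply(user_reply)
    lookup.setdefault((context, bot_action), []).append(reward)
    return reward


# Usage with made-up conversation data.
lookup = {}
first = record_interaction(
    lookup, "billing", "explain_invoice", "Thank you, that has helped")
second = record_interaction(
    lookup, "billing", "explain_invoice", "But why was I charged twice?")
```

Here the first reply is recorded as a positive signal and the follow-up question as a mild negative one, so the lookup table keeps the full history of feedback for that context and action.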
The lookup table enables the chatbot to identify its users' preferences, their problem areas, and the situations in which a user would be happy, sad, or angry. It tries to identify behavior patterns and learn from the interactions. It does not learn from a single person; it learns from different people, diverse interactions, and a range of possible situations. Based on this analysis, policies are framed.
These policies are used to decide what to do in a given situation based on history (e.g., how a customer would be satisfied). The goal is customer satisfaction, so the chatbot collects as many customer experiences as possible and keeps reframing its policies to improve satisfaction further. This process is called reinforcement learning. Deep reinforcement learning can also be used to develop better and more accurate policies.
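One simple way to picture how policies are framed from collected experiences is to average the feedback for each (context, action) pair across many users and keep the best-scoring action per context. The sketch below assumes experiences arrive as (context, action, reward) tuples; the function name `build_policy` and the sample data are hypothetical, and this averaging rule is only one of many possible ways to derive a policy.

```python
from collections import defaultdict


def build_policy(experiences):
    """Return {context: best_action} using the mean reward per pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for context, action, reward in experiences:
        totals[(context, action)] += reward
        counts[(context, action)] += 1

    best = {}  # context -> (action, mean reward)
    for (context, action), total in totals.items():
        mean = total / counts[(context, action)]
        if context not in best or mean > best[context][1]:
            best[context] = (action, mean)
    return {context: action for context, (action, _) in best.items()}


# Made-up experiences gathered from several users.
experiences = [
    ("refund_request", "offer_refund", 1.0),
    ("refund_request", "offer_refund", 1.0),
    ("refund_request", "quote_policy", -1.0),
    ("refund_request", "quote_policy", 1.0),
    ("late_delivery", "apologize_and_track", 1.0),
]
policy = build_policy(experiences)
```

Calling `build_policy` again as new experiences arrive corresponds to reframing the policy: as more feedback is collected, the preferred action for a context can change.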
In some situations, the user may request information that requires the chatbot to consult data from various enterprise business systems (such as ERP, CRM, and databases), process it, and generate an answer. In this kind of intelligent enterprise chatbot application, the chatbot can draw on information from multiple documents, summarize it, and generate an appropriate answer.