How to train a transactional chatbot using reinforcement learning?

In an age where artificial intelligence is reshaping our world, chatbots have emerged as a valuable tool for businesses. With a staggering 80% of businesses projected to integrate chatbots in their operations by 2024, the focus is now shifting towards transactional chatbots, also known as Goal-oriented (GO) chatbots. Unlike typical chatbots, transactional chatbots are laser-focused on solving specific user problems. Need to book a ticket? There is a chatbot for that. Looking to make a reservation? A transactional chatbot is on it. These transactional chatbots are not just sophisticated, they are becoming smarter and more efficient by the day.

But how are these transactional chatbots trained to be so proficient? The answer lies in two major learning techniques: supervised learning and reinforcement learning. Supervised learning uses an encoder-decoder approach to map user dialogue directly to responses. In contrast, reinforcement learning trains chatbots through trial-and-error conversations with either a rule-based user simulator or real users.

Among these, transactional chatbots using reinforcement learning have recently surfaced as an exciting field teeming with potential applications. One stellar example of this rapidly growing field is the TC-Bot developed by MiuLab. The TC-Bot showcases how a user can be simulated using basic rules, significantly expediting the training process compared to using real people.

With more advanced chatbot training methods being developed, it’s safe to say we are on the cusp of a new era where transactional chatbots will become ubiquitous, changing the way we interact with technology. In this article, we will dive deep into the world of transactional chatbots, explore the process of their training, their use cases and other vital aspects.

What is a transactional chatbot?

A transactional chatbot, also known as a task-oriented or goal-oriented chatbot, is a specialized form of artificial intelligence software designed with a clear purpose – to help users achieve a specific goal or complete a specific task. This could be anything from booking a flight to scheduling a doctor’s appointment or placing an order for a pizza.

Unlike their counterparts (general conversation or social chatbots), which focus on simulating human-like interaction and carrying out broad, non-specific conversations, transactional chatbots have a clear focus. Their role is not to engage in small talk or provide entertainment but to aid users in accomplishing a particular task as quickly and efficiently as possible.

Transactional chatbots operate by recognizing and understanding the user’s intent and then taking appropriate actions to fulfill the user’s request. To do this, they employ sophisticated Natural Language Understanding (NLU) capabilities and machine learning algorithms to interpret the user’s inputs, map them to the correct action, and generate a suitable response.

The importance of these goal-oriented chatbots in today’s digital ecosystem cannot be overstated. In a world that is increasingly driven by speed, efficiency, and convenience, transactional chatbots serve as a pivotal touchpoint between businesses and customers. They provide instant, 24/7 support, helping to improve customer service and engagement, streamline business processes, and reduce operational costs. Moreover, they provide a personalized user experience, understand and remember customer preferences, and deliver tailor-made solutions, enhancing customer satisfaction and loyalty.

Furthermore, in times of social distancing and remote operations, transactional chatbots have become invaluable tools for businesses to maintain constant, uninterrupted customer support. By handling routine tasks and queries, they allow human staff to focus on more complex and critical issues, thus enhancing the overall efficiency of the business.

In sum, transactional chatbots are more than just fancy technology; they are powerful tools that are reshaping the way businesses operate and interact with their customers, making them indispensable in the modern digital landscape.

Transactional chatbots vs. traditional chatbots


| Comparison criteria | Transactional chatbot | Traditional chatbot |
| --- | --- | --- |
| Purpose | Primarily designed to handle transactions and support complex tasks. They can assist in making reservations, completing purchases, and providing personalized recommendations. | Typically designed for simple tasks such as answering basic FAQs or guiding users to the appropriate resources. |
| Complexity of interaction | Capable of understanding and responding to more complex customer queries. These chatbots can process multiple layers of communication and follow the flow of conversation. | Generally capable of managing simple, linear conversations and might struggle with complex interactions. |
| Use of AI | Uses advanced AI and machine learning to provide personalized responses, understand user intent, and remember previous interactions. | Primarily uses rule-based responses and may or may not leverage AI. Its capabilities are often limited to predefined responses. |
| Data analysis | Continually learns from user interactions, enabling it to make more accurate predictions and provide personalized services. | Data analysis is typically minimal or non-existent, with less emphasis on learning from user interactions. |
| User experience | Enhances user experience by offering personalized responses and handling complex requests. | Provides a satisfactory user experience for straightforward inquiries but may not handle complex requests as effectively. |
| Integration with other systems | Often integrated with other systems (CRM, ERP) to access customer data, process transactions, etc. | Usually standalone, with minimal integration with other systems. |
| Cost and implementation time | Might require a higher initial investment and longer implementation time due to their complex nature. | Generally cheaper and quicker to implement as they’re less complex. |
| Scalability | High scalability due to its ability to learn and adapt from interactions. Can handle an increasing number of complex queries effectively. | Limited scalability. As queries become more complex, these chatbots might struggle to maintain efficiency. |

Build robust chatbots with LeewayHertz!

Unlock the power of conversational AI with our robust chatbots. Whether you want a transactional chatbot, a customer support chatbot or a health & wellness chatbot, we have you covered.

Key components of transactional chatbots

Transactional chatbots, also known as goal-oriented or task-oriented chatbots, have several key components that enable them to interact with users effectively and accomplish specific tasks. Here are some of the main elements:

  • Natural Language Understanding (NLU) unit: This is the component of the chatbot that interprets and understands the user’s input. It transforms human language into a machine-readable format. NLU employs tokenization, stemming, part-of-speech tagging, and entity extraction to understand the user’s message’s context, intent, and entities.
  • Dialogue Manager (DM): The DM is the central control unit of the chatbot. It maintains the context and state of the conversation, decides the next action based on the current state and user’s input, and generates the appropriate system response.
  • State Tracker (ST): Sometimes considered a part of the Dialogue Manager, the state tracker keeps track of the current state of the conversation, including the user’s goals, requests, and the information that the chatbot has provided.
  • Policy learner: This component uses reinforcement learning algorithms to determine the best responses based on the state of the conversation. It “learns” from its past actions and their outcomes to optimize the chatbot’s responses.
  • Natural Language Generator (NLG) unit: The NLG takes the system response generated by the dialogue manager and translates it into natural, human-like language. This can either be a simple template-based system or a more complex machine learning model.
  • User simulator: In training a transactional chatbot, a user simulator is used. It’s a model that generates simulated user behavior, which can be used for training the chatbot in a controlled environment.
  • Database (DB): Chatbots that provide information or perform transactions often need to interact with a database. This could be checking ticket availability, booking appointments, providing product details, etc. The DB is an integral part of these chatbot systems.
  • Error model controller: This component is often used during training to add some noise to the user simulator’s responses, making the training environment more similar to real-world conditions where user inputs can be unpredictable and varied.

These components work together in a cycle to enable transactional chatbots to handle complex, multi-turn dialogues, manage user goals, and offer an engaging, human-like conversation experience.

Benefits of transactional chatbots

Transactional chatbots, a form of virtual assistant, are seeing increased adoption across various industries, all thanks to the multitude of benefits they bring to the table. Here are some benefits of using them:

  • Enhanced efficiency: Transactional chatbots are designed for multitasking, handling several customer interactions simultaneously. They provide round-the-clock service, responding to customer queries in real time, regardless of geographical boundaries or time differences. Automated responses also help keep answers consistent, improving the overall efficiency of your team and services.
  • Budget-friendly solution: Incorporating transactional chatbots into your customer service protocol allows you to minimize the need for human intervention, leading to considerable cost savings. With their capacity to operate 24/7, chatbots also contribute to improved cost-effectiveness. By optimizing operations and reducing personnel expenses, chatbots offer substantial cost advantages.
  • Tailored interactions: Chatbots can comprehend each customer’s preferences, paving the way for more personalized interactions and tailored recommendations. Customers are more likely to interact with businesses offering a personal touch, enhancing their overall experience.
  • Augmented sales: Transactional chatbots can significantly boost sales by providing personalized suggestions based on customer preferences and buying history. They also contribute to lead generation by simultaneously managing multiple queries, potentially enhancing your business’s revenue and sales figures.
  • Superior customer experience: With their round-the-clock service and efficient customer management, transactional chatbots significantly improve the customer experience. By offering seamless service without human involvement, these chatbots can contribute to the growth and reputation of your organization.

How does a transactional chatbot operate?

Here is the sequence of steps that describe how a transactional chatbot works.

  • User initiation: The process begins when a user sends a message or a request to the chatbot. This could be a query, a request for information, or an action such as booking a ticket or making a reservation.
  • Input interpretation: The chatbot uses its Natural Language Understanding (NLU) unit to interpret the user’s message. It converts the natural language input into a machine-readable format. The NLU unit employs tokenization, stemming, part-of-speech tagging, and entity extraction to understand the context, intent, and entities in the user’s message.
  • Dialogue management: The Dialogue Manager (DM) processes this interpreted input. It uses the state tracker to keep track of the conversation’s context, including the user’s goals, requests, and the information the chatbot has provided.
  • Policy learning: Based on the current state of the conversation, the policy learner uses reinforcement learning algorithms to decide on the best possible action or response.
  • System response generation: Once the action is determined, the system generates an appropriate response. This could involve querying a database for required information, initiating a transaction, or formulating a reply to the user’s query.
  • Response delivery: The generated system response is then translated into natural, human-like language using the Natural Language Generator (NLG) unit. This response is then delivered to the user.
  • User feedback and learning: The chatbot observes and learns from user feedback. For instance, if a user corrects information or rephrases a request, the chatbot uses this feedback to update its understanding and improve future responses.
  • Conversation continuation or termination: Depending on the user’s response or the chatbot’s settings, the conversation may continue with further exchanges or be concluded if the chatbot has successfully addressed the user’s request.

This is a generalized flow of how a transactional chatbot operates. Please note that the exact workings can vary based on the chatbot’s specific design, functionalities, and the complexity of tasks it is programmed to perform.

Understanding the dialogue system

A transactional chatbot employs a dialogue system designed to facilitate meaningful, purpose-driven conversations with users. This system revolves around three key components: the Dialogue Manager (DM), the Natural Language Understanding (NLU) unit, and the Natural Language Generator (NLG) unit, each playing a unique role in the conversational process.

The NLU unit acts as the ears of the chatbot, listening to and interpreting user inputs. When a user utters something, it is the job of the NLU to translate this into a semantic frame. This frame is a structured representation of the user’s utterance, stripped of natural language complexities and brought down to a format the chatbot can understand and process.

Now enter the DM, the chatbot’s brain. Composed of a Dialogue State Tracker (DST) and a policy, often represented by a neural network, the DM controls the flow of the conversation. The DST takes the semantic frame from the NLU, combines it with the history of the conversation, and creates a state representation. This state is the distilled essence of the dialogue so far, allowing the bot to maintain the context and continuity of the conversation.

Next, the state representation is ingested by the policy component of the DM, determining the chatbot’s next action. Here, reinforcement learning can play a vital role, enabling the chatbot to learn the best responses over time from repeated interactions.

In some cases, an external database can be consulted to supplement the chatbot’s responses with useful information, like specifics about a restaurant reservation or movie ticket availability.

Once the chatbot’s response is decided, it is still in a semantic frame, which isn’t user-friendly. Here is where the NLG unit, the chatbot’s mouth, steps in. The NLG takes this semantic frame and transforms it back into natural, human-like language. This allows the chatbot to deliver responses that are easily understandable by the user.

The user’s goal, be it making a reservation, booking a ticket, or gathering information, forms the driving force behind this dialogue loop. Through iterative cycles of understanding, managing dialogue, and generating natural language, the transactional chatbot works towards achieving this user goal, creating a dynamic, interactive, and purposeful conversational experience.
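The loop described above can be sketched in a few lines of Python. This is a toy illustration rather than the TC-Bot implementation: the keyword-based `nlu`, the `DialogueStateTracker` class, the hand-written `policy`, and the template-based `nlg` are all hypothetical stand-ins for trained components, and the tiny city vocabulary is made up.

```python
# Toy sketch of the NLU -> DST -> policy -> NLG loop. All names are hypothetical.

def nlu(utterance):
    """Turn raw text into a semantic frame using naive keyword matching."""
    frame = {"intent": "inform", "inform_slots": {}, "request_slots": {}}
    text = utterance.lower()
    for city in ("seattle", "boston"):  # assumed tiny vocabulary
        if city in text:
            frame["inform_slots"]["city"] = city
    if "what time" in text:
        frame["intent"] = "request"
        frame["request_slots"]["starttime"] = "UNK"
    return frame


class DialogueStateTracker:
    """Combines each new frame with the conversation history."""

    def __init__(self):
        self.history = []
        self.inform_slots = {}

    def update(self, frame):
        self.history.append(frame)
        self.inform_slots.update(frame["inform_slots"])
        return {
            "round": len(self.history),
            "known": dict(self.inform_slots),
            "last_user_frame": frame,
        }


def policy(state):
    """Hand-written stand-in for a learned policy."""
    if "city" not in state["known"]:
        return {"intent": "request", "inform_slots": {}, "request_slots": {"city": "UNK"}}
    if state["last_user_frame"]["request_slots"]:
        slot = next(iter(state["last_user_frame"]["request_slots"]))
        return {"intent": "inform", "inform_slots": {slot: "PLACEHOLDER"}, "request_slots": {}}
    return {"intent": "match_found", "inform_slots": {}, "request_slots": {}}


def nlg(frame):
    """Template-based generation from the agent's semantic frame."""
    if frame["intent"] == "request":
        slot = next(iter(frame["request_slots"]))
        return f"Which {slot} would you like?"
    if frame["intent"] == "inform":
        slot = next(iter(frame["inform_slots"]))
        return f"Let me look up the {slot} for you."
    return "I found a matching ticket."


tracker = DialogueStateTracker()
replies = [
    nlg(policy(tracker.update(nlu(text))))
    for text in ("I want a movie ticket", "In Seattle, what time is it showing?")
]
```

In a trained system, only the `policy` function would typically be replaced by the reinforcement learning agent; the surrounding frame-in, frame-out interfaces can stay the same.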

The role of the user simulator and error model controller

In transactional chatbots, two significant components contribute to refining the model’s training and performance: the user simulator and the Error Model Controller (EMC). Both are crucial in enabling the chatbot to handle more realistic, diverse, and error-prone conversations.

User simulator

The user simulator is akin to a virtual training partner for the chatbot. It emulates the behavior of a real user, offering a more efficient way to train the bot compared to hours of user interactions. This simulator operates based on an agenda, meaning it has a predefined goal for each interaction episode, and its actions align with this goal. The internal state of the simulator allows it to follow the dialogue progression and take informed actions accordingly. Responses to agent actions are crafted using a combination of deterministic rules with a touch of stochastic rules to introduce variety.

User goals are essential elements for the simulator, representing what the user wants to achieve from a conversation. These goals can be sourced from actual dialogue corpus or be manually created, comprising ‘inform slots’ and ‘request slots.’ The inform slots represent constraints the user has in mind, while request slots simulate the user’s quest for specific information. However, unlike real users who may change their minds during a conversation, the simulator’s goals remain static throughout an episode. A “default slot” is added to every goal’s request slots, and the agent must provide a value for this slot for successful goal fulfillment.

The user simulator’s internal state records the goal slots and the conversation’s history. It aids in formulating user actions at each step, containing dictionaries of slots and an intent: rest slots, history slots, request slots, inform slots, and the intent of the current action.

The actions that a user simulator can perform are varied and can sometimes be complex, incorporating multiple requests or inform slots. These actions can even contain a mix of both types of slots.
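As a rough illustration of the ideas in this section, the sketch below defines a user goal and a very small agenda-style simulator. It is a simplified assumption-laden example, not the simulator from the original code: the class name `ToyUserSimulator`, the `'ticket'` default slot key, the `'anything'` value for slots the user does not care about, and the goal contents are all hypothetical.

```python
import random

# Hypothetical user goal: constraints to inform plus slots to request.
# 'ticket' stands in for the default slot; the key name is an assumption.
user_goal = {
    "inform_slots": {"moviename": "zootopia", "city": "seattle"},
    "request_slots": {"starttime": "UNK", "ticket": "UNK"},
}


class ToyUserSimulator:
    """Agenda-style simulator: fixed goal, mostly deterministic replies."""

    def __init__(self, goal, rng=None):
        self.goal = goal
        self.rng = rng or random.Random(0)
        # Internal state: what is still left to say or ask, and what was said.
        self.rest_slots = {**goal["inform_slots"], **goal["request_slots"]}
        self.history_slots = {}

    def respond(self, agent_action):
        intent = agent_action["intent"]
        if intent == "request":
            slot = next(iter(agent_action["request_slots"]))
            if slot in self.goal["inform_slots"]:
                # Deterministic rule: answer with the goal's constraint.
                value = self.goal["inform_slots"][slot]
                self.rest_slots.pop(slot, None)
                self.history_slots[slot] = value
                return {"intent": "inform", "inform_slots": {slot: value}, "request_slots": {}}
            # Slot is not a constraint in the goal: the user does not care.
            return {"intent": "inform", "inform_slots": {slot: "anything"}, "request_slots": {}}
        if intent == "inform":
            self.history_slots.update(agent_action["inform_slots"])
            for slot in agent_action["inform_slots"]:
                self.rest_slots.pop(slot, None)
        # Stochastic rule: pick any remaining request slot at random.
        pending = [s for s in self.rest_slots if s in self.goal["request_slots"]]
        if pending:
            slot = self.rng.choice(pending)
            return {"intent": "request", "inform_slots": {}, "request_slots": {slot: "UNK"}}
        return {"intent": "thanks", "inform_slots": {}, "request_slots": {}}
```

Note that the goal never changes during an episode, which matches the static-goal behaviour described above; only `rest_slots` and `history_slots` evolve.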

Error Model Controller (EMC)

The Error Model Controller (EMC) comes into play once a user action is received from the simulator. It is responsible for introducing errors into these actions, mimicking the imperfections of real-world interactions and helping the bot cope with potential misunderstandings or mistakes in user responses. The EMC can add errors to the user action’s inform slots and intent, training the bot to handle unexpected scenarios better and ensuring it’s equipped to deal with more realistic, less-than-perfect human interactions.
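A minimal sketch of such an error model might look like the following. The function name `infuse_error`, the two probability parameters, and the small `slot_values` vocabulary are assumptions made for illustration; the original implementation may structure its error types differently.

```python
import random

# Hypothetical vocabulary; a real system would load this from its database.
slot_values = {
    "city": ["seattle", "boston", "portland"],
    "starttime": ["tonight", "9pm", "matinee"],
}
user_intents = ["inform", "request", "thanks", "reject", "done"]


def infuse_error(action, slot_error_prob, intent_error_prob, rng):
    """Return a noisy copy of a user action; the input is left untouched."""
    noisy = {
        "intent": action["intent"],
        "inform_slots": dict(action["inform_slots"]),
        "request_slots": dict(action["request_slots"]),
    }
    for slot, value in action["inform_slots"].items():
        candidates = [v for v in slot_values.get(slot, []) if v != value]
        if candidates and rng.random() < slot_error_prob:
            # Slot-value error: swap in a different value for the same key.
            noisy["inform_slots"][slot] = rng.choice(candidates)
    if rng.random() < intent_error_prob:
        # Intent error: replace the intent with a different one.
        others = [i for i in user_intents if i != action["intent"]]
        noisy["intent"] = rng.choice(others)
    return noisy
```

Passing an explicit `random.Random` instance as `rng` keeps training runs reproducible, and small probabilities (a few percent) are usually enough to expose the agent to misunderstandings without drowning out the signal.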

An overview of Deep-Q-Network

Deep Q-Network (DQN) is a reinforcement learning technique that combines Q-Learning with deep neural networks. DQN was proposed by researchers at Google DeepMind and it had a significant impact on the field of reinforcement learning, particularly in environments with high-dimensional raw inputs, such as the screen pixels of video games.

In traditional Q-Learning, a table called the Q-table stores the value of every possible state-action pair. However, this approach doesn’t scale well to problems with large state spaces or problems where states are not easily expressible in table form, such as image inputs.
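For reference, here is a small sketch of the tabular update that Q-Learning performs, using a dictionary as the Q-table. The state and action names, the learning rate `alpha`, and the discount factor `gamma` are illustrative values chosen for this example only.

```python
from collections import defaultdict

# Q-table: maps (state, action) pairs to estimated returns, defaulting to 0.
q_table = defaultdict(float)


def q_update(state, action, reward, next_state, actions, done, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (target - Q(s, a)),
    where target = r + gamma * max_a' Q(s', a') for a non-terminal s'.
    """
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


actions = ["request_city", "match_found"]
# Terminal step: announcing a match from state "has_city" earns reward 1.
q_update("has_city", "match_found", 1.0, None, actions, done=True)
# Earlier step: asking for the city leads to "has_city" with no reward.
q_update("start", "request_city", 0.0, "has_city", actions, done=False)
```

After these two updates the table holds 0.5 for the terminal pair and 0.225 for the earlier one, showing how reward propagates backwards through the bootstrapped target. The table needs one entry per state-action pair, which is exactly what stops scaling once states become large vectors.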

DQN addresses these challenges using a deep neural network to approximate the Q-function, which maps state-action pairs to expected future rewards. This way, a neural network can be trained to predict the Q-values for a given state instead of maintaining a table for each possible state-action pair.

A key innovation in DQN is using experience replay and target networks to stabilize training. Experience replay stores past experiences in a replay buffer and samples mini-batches from this buffer to train the network, which breaks the correlation between sequential experiences. The target network is a separate network used to compute the target Q-values during learning, which is periodically updated from the main network. This helps to avoid harmful feedback loops during learning.
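The two stabilizing ideas can be sketched without any deep learning library. In the example below, `ReplayBuffer` and `td_targets` are hypothetical names, and `target_q` is a plain function standing in for the target network's forward pass; a real implementation would batch these computations with a tensor library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest experiences drop out first

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_targets(batch, target_q, gamma=0.9):
    """Compute training targets with the frozen target network.

    `target_q(state)` stands in for the target network's forward pass and
    must return a list of Q-values, one per action.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

The online network is then trained towards these targets, and every so often its weights are copied into the target network so that the targets move in slow, discrete steps rather than after every update.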

Since the inception of DQN, many extensions have been proposed to improve its performance and stability, such as Double DQN, Dueling DQN, and Prioritized Experience Replay.

Training a transactional chatbot using Deep-Q-Network

Building a transactional chatbot using reinforcement learning involves several steps that should be executed sequentially. Here’s the sequence:

  1. Preparing the state: The initial step in developing a chatbot is preparing the state, which represents the current situation that the chatbot is in. This typically involves processing the raw input data (like text conversation history) into a format the model can understand. The state also includes the chatbot’s internal information about the conversation, like the identified intents or entities in the user’s utterances.
  2. Dialogue configuration for the agent: The next step is to set up the dialogue configuration for the agent. This includes defining the possible actions that the agent can take (like answering a question, asking for more information, or ending the conversation) and defining the reward structure that the agent will use to learn. This configuration guides the agent about the context of the conversation, its possible actions, and their consequences.
  3. Neural network model: Once the state and dialogue configuration have been set up, the next step is to build the neural network model that will be used to learn the dialogue policy. This model takes the current state as input and outputs the Q-values for each possible action. The Q-values represent the expected future reward for taking each action, which is used to decide the best action to take. This model could be a Deep Q-Network (DQN) or other types of network, depending on the complexity of the task and the available data.
  4. Policy: With the neural network model in place, a policy that dictates how the agent chooses its actions can be defined. A common policy is an epsilon-greedy policy, where the agent mostly chooses the action with the highest Q-value (as predicted by the model) but occasionally chooses a random action to explore the environment.
  5. Agent training: Finally, with the state, dialogue configuration, neural network model, and policy setup, the agent can be trained. During training, the agent interacts with the environment (in this case, the chatbot conversing with users or a user simulator), takes actions according to its policy, observes the results, and receives rewards. The agent then uses these experiences to update its neural network model, intending to maximize its total reward over time. The agent continually goes through this interaction and learning process until it reaches a satisfactory performance level.
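The epsilon-greedy policy mentioned in step 4 is short enough to show in full. This is a generic sketch; the function name `epsilon_greedy` and its parameters are chosen for illustration and do not come from the original code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return the index of the chosen action.

    With probability `epsilon` a random action is explored; otherwise the
    action with the highest predicted Q-value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `epsilon=0.0` the call `epsilon_greedy([0.1, 0.9, 0.3], 0.0)` always picks index 1. Many implementations decay epsilon over the course of training so that the agent explores early and exploits later, although a fixed value also works.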

The scenario

The main objective of our transactional chatbot is to engage in proficient interactions with real users, successfully accomplishing specific tasks such as locating suitable reservations or movie tickets within the users’ specified constraints. The chatbot, referred to as the agent, has a crucial role in processing an ongoing conversation’s state and generating an appropriate, near-optimal response. In essence, the agent takes a snapshot of the current dialogue history from the Dialogue State Tracker (ST) and uses it to decide on the most fitting dialogue response to offer next.

The supporting code for our system draws inspiration from a dialogue system developed by MiuLab, known as TC-Bot. The notable achievement of their research is the demonstration of a user simulation built from basic rules. This approach lets the chatbot agent be trained via reinforcement learning considerably faster than it could be with real people. While other studies have attempted similar methods, this research stands out because its training approach works well and is accompanied by accessible and comprehensive code.

The complete code is available here –


To fully comprehend the code, there are a few prerequisites that won’t be explicitly covered but are vital for a comprehensive understanding. Here they are:

  • Proficiency in Python programming – A solid grasp of Python programming language is a must.
  • Mastery of Python dictionaries – We will extensively utilize dictionaries in Python, so understanding their operation is crucial.
  • Understanding of the DQN (Deep Q-Network) – Familiarity with developing a simple DQN is necessary.
  • Experience with Keras for building neural networks – You should know how to construct a straightforward neural network model using Keras.

Please ensure you are familiar with these areas before proceeding.

You need to have the following dependencies ready before executing the code:

  • Python >= 3.5
  • Keras >= 2.2.4 (Earlier versions probably work)
  • numpy

Understanding the data (movie tickets) for the chatbot

The core objective here is to enable the chatbot agent to locate a ticket that aligns with the user’s specific requirements, which are defined by the goal for each episode. This is quite a challenging task considering each ticket’s uniqueness and variance in slots!

Understanding the anatomy of an action

Understanding the structure of an action is crucial in this dialogue system. Ignoring the natural language aspect for a moment, we can see that both the user simulator and the agent work with actions represented as semantic frames. An action consists of an intent, inform slots, and request slots. Here, a ‘slot’ signifies a key-value pair, typically referring to a single inform or request. For instance, in the dictionary {'starttime': 'tonight', 'theater': 'regal 16'}, both 'starttime': 'tonight' and 'theater': 'regal 16' are considered slots. Here you will get more example actions:

The intent indicates the kind of action it is. The remainder of the action is divided into inform slots, which contain constraints, and request slots, which carry information that needs completion. The potential keys are specified in the, and their values are provided in the aforementioned database dictionary.

An inform slot shares information that the sender wants the receiver to acknowledge. It comprises a key from the list of keys and a value from that key’s associated list of values. Conversely, a request slot contains a key for which the sender wishes to retrieve a value from the receiver. In essence, it is a key from the list of keys and ‘UNK’ (indicating “unknown”) as the value, as the sender doesn’t yet know the appropriate value for this slot.

The intents include:

  • Inform: Provides constraints in the form of inform slots.
  • Request: Asks for the completion of request slots with values.
  • Thanks: Used exclusively by the user, it signals to the agent that it has done something satisfactory, or that the user is prepared to conclude the conversation.
  • Match found: Used solely by the agent, it informs the user that a match fulfilling the user’s goal has been identified.
  • Reject: Utilized only by the user in response to the agent’s ‘match found’ intent, indicating that the suggested match doesn’t fit their constraints.
  • Done: The agent uses this to wrap up the conversation and verify if the current goal has been accomplished. The user action automatically adopts this intent if the conversation drags on too long.
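To make the frame structure concrete, the sketch below lists one illustrative action per intent and a small checker for the conventions described above. The slot values are made-up examples, and `is_well_formed` is a hypothetical helper, not part of the original code.

```python
# Illustrative actions, one per intent; slot values are made-up examples.
example_actions = [
    {"intent": "inform", "inform_slots": {"city": "seattle"}, "request_slots": {}},
    {"intent": "request", "inform_slots": {}, "request_slots": {"starttime": "UNK"}},
    {"intent": "thanks", "inform_slots": {}, "request_slots": {}},
    {"intent": "match_found", "inform_slots": {}, "request_slots": {}},
    {"intent": "reject", "inform_slots": {}, "request_slots": {}},
    {"intent": "done", "inform_slots": {}, "request_slots": {}},
]

VALID_INTENTS = {"inform", "request", "thanks", "match_found", "reject", "done"}
ACTION_KEYS = {"intent", "inform_slots", "request_slots"}


def is_well_formed(action):
    """Check the frame has the three expected keys, a known intent, and
    'UNK' as the value of every request slot."""
    if set(action) != ACTION_KEYS:
        return False
    if action["intent"] not in VALID_INTENTS:
        return False
    return all(value == "UNK" for value in action["request_slots"].values())
```

A checker like this is handy at the boundary between the user simulator and the agent, where a malformed frame would otherwise surface much later as a confusing training error.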

Preparing the state

The Dialogue State Tracker (ST) is essential in a transactional chatbot. Its primary function is to create a ‘state’ for the chatbot to work from. A ‘state’ is like a snapshot of the current situation in the chat, which the chatbot uses to decide its next action.

To do this, the ST maintains a record of the dialogue, capturing both the user’s and chatbot’s actions as they happen. It also keeps track of any information (known as ‘inform slots’) shared in the chat. For instance, if the user mentions they prefer Italian food, this information is saved in an ‘inform slot.’

The state prepared by the ST is essentially an array of data representing current dialogue history and all the information slots mentioned so far. It’s like a conversation summary to date, which helps the chatbot make informed decisions.

Also, whenever the chatbot needs to provide information to the user, the ST can fetch it from a database, using the inform slots collected so far as query constraints. For example, if the user asks for Italian restaurants, the ST can pull a list from the database matching this criterion.

One crucial aspect of the ST’s job is to compile a useful state that gives the chatbot an accurate view of the ongoing conversation. This state includes recent actions from both the user and the chatbot, letting the chatbot know where the dialogue is at. It also includes a count of the number of rounds or interactions that have occurred. This helps the chatbot gauge how much time it has left, especially in scenarios where the chat has a maximum number of rounds allowed.

Lastly, the state also includes details about the current inform slots and how many database entries match this information. This helps the chatbot know how much information it has to work with and how relevant it is to the user’s requirements.

The Dialogue State Tracker is like the chatbot’s memory and awareness, helping it understand the current conversation and make the best possible response.
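One way to picture the resulting state is as a flat numeric vector built by concatenating encodings of each piece of context. The sketch below is an assumption about how such an encoding could look, with shortened vocabularies and hypothetical names (`build_state`, `one_hot`); the original state tracker may include more features.

```python
USER_INTENTS = ["inform", "request", "thanks", "reject", "done"]
AGENT_INTENTS = ["inform", "request", "match_found", "done"]
SLOTS = ["moviename", "theater", "starttime", "date", "city"]  # shortened list


def one_hot(item, vocabulary):
    """Encode `item` as a 0/1 vector over `vocabulary` (all zeros if absent)."""
    return [1.0 if item == entry else 0.0 for entry in vocabulary]


def build_state(last_user_intent, last_agent_intent, current_informs,
                round_num, max_round, db_match_count, db_size):
    """Concatenate the pieces of dialogue context into one flat vector."""
    filled = [1.0 if slot in current_informs else 0.0 for slot in SLOTS]
    return (
        one_hot(last_user_intent, USER_INTENTS)
        + one_hot(last_agent_intent, AGENT_INTENTS)
        + filled
        + [round_num / max_round]     # how much of the round budget is used
        + [db_match_count / db_size]  # how selective the constraints are
    )


state = build_state(
    "inform", "request", {"city": "seattle", "date": "tomorrow"},
    round_num=3, max_round=20, db_match_count=40, db_size=400,
)
```

The vector length (16 here) becomes the input size of the neural network, so adding a feature to the state means changing the model's input dimension as well.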

Dialogue configuration for the agent

Dialogue configuration for the agent is a critical step in building a transactional chatbot. This process involves defining how the chatbot will interact with users, specifying the flow of conversation, and the range of responses it can deliver. Essentially, it is setting up the rules of engagement for the chatbot, ensuring that it can understand user inputs and provide relevant and meaningful responses. This configuration becomes the foundation upon which further layers of learning and adaptation are built, making it a vital part of any successful chatbot development.

Here are the dialogue config constants used by the agent:

# Possible inform and request slots for the agent
agent_inform_slots = ['moviename', 'theater', 'starttime', 'date', 'genre', 'state', 'city', 'zip', 'critic_rating',
                      'mpaa_rating', 'distanceconstraints', 'video_format', 'theater_chain', 'price', 'actor',
                      'description', 'other', 'numberofkids']
agent_request_slots = ['moviename', 'theater', 'starttime', 'date', 'numberofpeople', 'genre', 'state', 'city', 'zip',
                       'critic_rating', 'mpaa_rating', 'distanceconstraints', 'video_format', 'theater_chain', 'price',
                       'actor', 'description', 'other', 'numberofkids']

# Possible actions for agent
agent_actions = [
    {'intent': 'done', 'inform_slots': {}, 'request_slots': {}},  # Triggers closing of conversation
    {'intent': 'match_found', 'inform_slots': {}, 'request_slots': {}}
]
for slot in agent_inform_slots:
    agent_actions.append({'intent': 'inform', 'inform_slots': {slot: 'PLACEHOLDER'}, 'request_slots': {}})
for slot in agent_request_slots:
    agent_actions.append({'intent': 'request', 'inform_slots': {}, 'request_slots': {slot: 'UNK'}})

# Rule-based policy request list
rule_requests = ['moviename', 'starttime', 'city', 'date', 'theater', 'numberofpeople']

# These are possible inform slot keys that cannot be used to query
# (usersim_default_key, the user simulator's default slot key, must be defined before this line)
no_query_keys = ['numberofpeople', usersim_default_key]
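With the slot lists above, the configuration yields 2 + 18 + 19 = 39 possible agent actions, and the network later produces one Q-value per entry. The sketch below shows how an output index can be tied to an action and back. It uses shortened slot lists, and the helper names `map_index_to_action` and `map_action_to_index` are written here as standalone functions for illustration.

```python
import copy

inform_slots = ["moviename", "starttime"]                      # shortened for illustration
request_slots = ["moviename", "starttime", "numberofpeople"]   # shortened for illustration

actions = [
    {"intent": "done", "inform_slots": {}, "request_slots": {}},
    {"intent": "match_found", "inform_slots": {}, "request_slots": {}},
]
actions += [{"intent": "inform", "inform_slots": {s: "PLACEHOLDER"}, "request_slots": {}}
            for s in inform_slots]
actions += [{"intent": "request", "inform_slots": {}, "request_slots": {s: "UNK"}}
            for s in request_slots]


def map_index_to_action(index):
    """Return a copy so callers can fill in slot values without mutating the table."""
    return copy.deepcopy(actions[index])


def map_action_to_index(action):
    """Find the position of an (unfilled) action in the table."""
    return actions.index(action)
```

Returning a deep copy matters because an inform action's 'PLACEHOLDER' value is typically overwritten with a real value later; without the copy, that write would corrupt the shared action table.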

Building a neural network model

In the development of a transactional chatbot, constructing the neural network model is a pivotal step. Leveraging Keras, a popular deep learning framework, a model for the chatbot agent is designed. This model comprises a single hidden layer neural network, which, despite its simplicity, proves to be highly effective for the task at hand. The design of this model plays a crucial role in enabling the chatbot to comprehend and respond appropriately to the user’s input. Here is the code snippet:

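from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def _build_model(self):
    # Single hidden layer network: state vector in, one Q-value per action out
    model = Sequential()
    model.add(Dense(self.hidden_size, input_dim=self.state_size, activation='relu'))
    model.add(Dense(self.num_actions, activation='linear'))
    # Assumption: the learning rate is stored in the instance variable self.lr
    model.compile(loss='mse', optimizer=Adam(lr=self.lr))
    return model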

The instance variables are assigned in the constants.json file located here –
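To make the shape of this network concrete, the following is a minimal sketch of its forward pass in plain Python, without Keras. It is not the article's implementation: the helper names (`dense`, `q_values`, `init_params`), the layer sizes, and the random initialization are all assumptions chosen for illustration. It shows only how a state vector passes through one ReLU hidden layer and comes out as one linear Q-value per action.

```python
import random


def relu(x):
    """Rectified linear unit: negative inputs become zero."""
    return x if x > 0.0 else 0.0


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; weights[i][j] links input i to output j."""
    outputs = []
    for j in range(len(biases)):
        total = biases[j]
        for i, value in enumerate(inputs):
            total += value * weights[i][j]
        outputs.append(activation(total) if activation else total)
    return outputs


def init_params(state_size, hidden_size, num_actions, seed=0):
    """Small random weights and zero biases (illustrative initialization)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        'w1': matrix(state_size, hidden_size), 'b1': [0.0] * hidden_size,
        'w2': matrix(hidden_size, num_actions), 'b2': [0.0] * num_actions,
    }


def q_values(state, params):
    """State -> hidden layer (ReLU) -> one linear Q-value per action."""
    hidden = dense(state, params['w1'], params['b1'], activation=relu)
    return dense(hidden, params['w2'], params['b2'])


# Hypothetical sizes: a 6-dimensional state and 3 possible actions
params = init_params(state_size=6, hidden_size=4, num_actions=3)
qs = q_values([1.0, 0.0, 0.0, 1.0, 0.0, 0.5], params)
best_index = max(range(len(qs)), key=lambda a: qs[a])
```

The index of the largest Q-value is what a greedy policy would pick; in the real agent, the state vector and the number of actions are far larger.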

Implementing policy

The policy in a transactional chatbot guides the agent in selecting a suitable action for the current state. How it does so depends on whether the dialogue is in the warm-up or the training stage. The warm-up stage, which precedes training, fills the agent’s memory, and in most DQN applications it does so with a random policy. For our GO chatbot, however, a basic rule-based policy is used during the warm-up phase.

def get_action(self, state, use_rule=False):
    # self.eps is initialized to the starting epsilon and does NOT get annealed
    if self.eps > random.random():
        index = random.randint(0, self.num_actions - 1)
        # self._map_index_to_action(index) takes an index and maps the action from all possible agent actions
        action = self._map_index_to_action(index)
        return index, action
    else:
        if use_rule:
            return self._rule_action()
        else:
            return self._dqn_action(state)

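In the code above, the use_rule flag marks the warm-up stage: when it is set, the rule-based policy picks the action. Once the dialogue moves into the training stage, the behavior model takes over action selection. In either case, with probability self.eps the agent takes a random action instead. The method returns both the index of the chosen action and the action itself.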
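Because the policy returns an index alongside the action, it helps to see how the two relate. The sketch below builds a shortened action table in the same way as the configuration shown earlier and adds two standalone lookup helpers. The helper names `map_index_to_action` and `map_action_to_index`, and the shortened slot lists, are assumptions for illustration rather than the article's code.

```python
import copy

# Shortened slot lists, for illustration only
inform_slots = ['moviename', 'theater']
request_slots = ['moviename', 'numberofpeople']

# Action table built the same way as in the configuration above
agent_actions = [
    {'intent': 'done', 'inform_slots': {}, 'request_slots': {}},
    {'intent': 'match_found', 'inform_slots': {}, 'request_slots': {}},
]
for slot in inform_slots:
    agent_actions.append({'intent': 'inform', 'inform_slots': {slot: 'PLACEHOLDER'}, 'request_slots': {}})
for slot in request_slots:
    agent_actions.append({'intent': 'request', 'inform_slots': {}, 'request_slots': {slot: 'UNK'}})


def map_index_to_action(index):
    """Return a copy so callers can fill in slot values without altering the table."""
    return copy.deepcopy(agent_actions[index])


def map_action_to_index(action):
    """Find the position of an action in the table, or fail loudly."""
    for i, candidate in enumerate(agent_actions):
        if candidate == action:
            return i
    raise ValueError('Action not in the action table: {}'.format(action))


action = map_index_to_action(2)
action['inform_slots']['moviename'] = 'zootopia'  # hypothetical value; the table itself is unchanged
```

The index is what the network's output layer refers to, while the action dictionary is what the rest of the dialogue system consumes.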

The rule-based policy employed during the warm-up stage is a straightforward one. A noteworthy component of this rule-based policy is the reset method of the agent. This primarily serves to reset a couple of variables associated with the rule-based policy. Although simple, this policy is crucial for initiating the agent’s activity in a somewhat meaningful way, thus improving results over taking random actions.
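One way such a rule-based policy could look is sketched below. The class name `RulePolicy`, its attributes, and the exact ordering (request every slot in the rule list, then signal a match, then close the conversation) are assumptions made for illustration; the article only states that the policy is simple and that a reset method restores a couple of variables.

```python
# Same request list as in the configuration shown earlier
RULE_REQUESTS = ['moviename', 'starttime', 'city', 'date', 'theater', 'numberofpeople']


class RulePolicy:
    """Hypothetical warm-up policy that walks through a fixed list of requests."""

    def __init__(self):
        self.reset()

    def reset(self):
        # The two variables the policy tracks; reset at the start of every episode
        self.current_slot_index = 0
        self.phase = 'not done'

    def next_action(self):
        # First, request each slot in order
        if self.current_slot_index < len(RULE_REQUESTS):
            slot = RULE_REQUESTS[self.current_slot_index]
            self.current_slot_index += 1
            return {'intent': 'request', 'inform_slots': {}, 'request_slots': {slot: 'UNK'}}
        # Then report a match once
        if self.phase == 'not done':
            self.phase = 'done'
            return {'intent': 'match_found', 'inform_slots': {}, 'request_slots': {}}
        # Finally, close the conversation
        return {'intent': 'done', 'inform_slots': {}, 'request_slots': {}}


policy = RulePolicy()
intents = [policy.next_action()['intent'] for _ in range(8)]
policy.reset()
first_after_reset = policy.next_action()
```

Even a fixed script like this produces dialogues that occasionally succeed, which gives the agent's memory more useful experiences than purely random actions would.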

Training an agent

In a transactional chatbot, the agent’s role is much like a skilled conversation partner, adept at helping users achieve a specific target, such as booking a reservation or buying a movie ticket, while considering the user’s specific needs and limitations. This agent’s primary task is navigating through a conversation and making the best possible decision at each step.

The agent relies on a Dialogue State Tracker (ST) to do this. This tracker is like the memory of the conversation, keeping track of the discussion’s history. Using this information, the agent selects an appropriate response that moves the conversation forward, aiming to fulfill the user’s goal.

The agent chooses an action for a given state according to its policy. During the warm-up phase, this policy can be as simple as a fixed list of requests. During training, the policy is instead the behavior model, a neural network with a single hidden layer.

The training method is fairly straightforward and differs only slightly from other Deep Q-Network (DQN) training setups. It is often worth experimenting with the model’s structure, incorporating prioritized experience replay (a technique that selectively replays more important experiences), and developing a more sophisticated rule-based policy. This kind of tuning can make the agent more efficient and effective at accomplishing its goals.
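At the core of DQN training is the target value that the network is fitted to with the mean squared error loss. The sketch below shows a standard way to compute it for a batch of experiences. The function name `dqn_targets`, the discount factor `gamma`, and the use of a separate target network are assumptions based on common DQN practice, not details confirmed by the article.

```python
def dqn_targets(batch, q_behavior, q_target, gamma=0.9):
    """Build one target Q-vector per experience.

    batch:      iterable of (state, action_index, reward, next_state, done)
    q_behavior: function mapping a state to the behavior model's Q-values
    q_target:   function mapping a state to the target model's Q-values
    gamma:      discount factor (the value here is illustrative)
    """
    targets = []
    for state, action, reward, next_state, done in batch:
        # Start from the current predictions so only the taken action changes
        target = list(q_behavior(state))
        if done:
            # No future reward after a terminal state
            target[action] = reward
        else:
            # Bellman target: immediate reward plus discounted best future value
            target[action] = reward + gamma * max(q_target(next_state))
        targets.append(target)
    return targets


# Toy Q-function that returns the same values for every state
def toy_q(state):
    return [1.0, 2.0, 3.0]


batch = [
    ('s0', 1, -1.0, 's1', False),  # ongoing conversation, small penalty
    ('s1', 0, 10.0, 's2', True),   # final round with a success reward
]
targets = dqn_targets(batch, toy_q, toy_q, gamma=0.5)
```

With prioritized experience replay, the difference between the old prediction and this target would also decide how often an experience is sampled again.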

Here’s a simpler explanation of the flow of an agent’s action in a transactional chatbot, as shown in the above diagram:

A single round or loop in training involves four main components:

  • The agent (dqn_agent)
  • The dialogue state tracker (state_tracker)
  • The user (or user simulator)
  • The Error Model Controller (EMC)

The following steps outline the sequence of events:

  1. The round begins by acquiring the current state: either the initial state at the start of a conversation (episode) or the next state produced by the previous round. This state is then fed into the agent’s action determination method.
  2. The agent decides on an action based on the current state and passes it to the state tracker. The state tracker updates its record of the conversation and enriches the agent’s action with additional information retrieved from a database.
  3. The enriched agent action is then given to the user simulator. The user simulator generates a rule-based response and also returns the reward and a success indicator (though these aren’t shown in the diagram).
  4. The user’s response then goes through the error model controller, which introduces potential errors mimicking real-world scenarios.
  5. The possibly erroneous user response is then fed into the state tracker, which updates its conversation record. Unlike in step 2, the tracker does not enrich the user response with any additional information.
  6. Lastly, the state tracker produces the next state of the conversation, completing the current experience tuple (state, action, reward, next state). This tuple is then added to the agent’s memory, and the cycle continues with the next round.
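The six steps above can be sketched as a single function. Everything in this block is a stand-in: the stub classes, their method names, and the reward values are assumptions for illustration, and the components are passed in explicitly rather than accessed as globals. Only the order of operations follows the steps described above.

```python
class StubAgent:
    """Always requests the date; stores experiences in a plain list."""
    def __init__(self):
        self.memory = []

    def get_action(self, state):
        return 0, {'intent': 'request', 'inform_slots': {}, 'request_slots': {'date': 'UNK'}}

    def add_experience(self, state, action_index, reward, next_state, done):
        self.memory.append((state, action_index, reward, next_state, done))


class StubStateTracker:
    """Keeps the dialogue history; the 'state' is just its length and a done flag."""
    def __init__(self):
        self.history = []

    def get_state(self, done=False):
        return (len(self.history), done)

    def update_state_agent(self, agent_action):
        self.history.append(('agent', agent_action))

    def update_state_user(self, user_action):
        self.history.append(('user', user_action))


class StubUser:
    """Ends the conversation successfully after a fixed number of rounds."""
    def __init__(self, max_rounds=3):
        self.round = 0
        self.max_rounds = max_rounds

    def step(self, agent_action):
        self.round += 1
        done = self.round >= self.max_rounds
        user_action = {'intent': 'inform', 'inform_slots': {'date': 'tomorrow'}, 'request_slots': {}}
        return user_action, (10 if done else -1), done, (1 if done else 0)


class StubErrorModel:
    """No-op error model; a real one would corrupt slots or intents at random."""
    def infuse_error(self, user_action):
        return user_action


def run_round(state, agent, tracker, user, emc):
    action_index, action = agent.get_action(state)           # steps 1-2: choose an action
    tracker.update_state_agent(action)                       # step 2: record the agent action
    user_action, reward, done, success = user.step(action)   # step 3: user simulator responds
    user_action = emc.infuse_error(user_action)              # step 4: inject possible errors
    tracker.update_state_user(user_action)                   # step 5: record the user response
    next_state = tracker.get_state(done)                     # step 6: produce the next state
    agent.add_experience(state, action_index, reward, next_state, done)
    return next_state, reward, done, success


agent, tracker, user, emc = StubAgent(), StubStateTracker(), StubUser(), StubErrorModel()
state = tracker.get_state()
done = False
while not done:
    state, reward, done, success = run_round(state, agent, tracker, user, emc)
```

Each pass through `run_round` adds exactly one experience to the agent's memory, so an episode of three rounds leaves three tuples behind.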

Before the actual learning begins, a Deep Q-Network (DQN) agent like our chatbot undergoes a ‘warm-up’ phase. This phase fills the agent’s memory buffer with initial experiences. Unlike DQN applications in games, where the agent may perform random actions during warm-up, our chatbot uses the basic rule-based policy described under ‘Implementing policy’ above.
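A warm-up phase of this kind can be sketched with a bounded replay memory. The class `ReplayMemory`, the function `warm_up`, and the stopping rule (stop when the memory is full or an episode limit is reached) are assumptions for illustration; the article only says that warm-up fills the agent's memory.

```python
from collections import deque


class ReplayMemory:
    """Fixed-size experience buffer; the oldest entries drop out when it is full."""

    def __init__(self, max_size):
        self.buffer = deque(maxlen=max_size)

    def add(self, experience):
        self.buffer.append(experience)

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def clear(self):
        self.buffer.clear()


def warm_up(memory, run_episode, max_episodes):
    """Run episodes until the memory is full or the episode limit is hit."""
    episodes = 0
    while episodes < max_episodes and not memory.is_full():
        for experience in run_episode():
            memory.add(experience)
        episodes += 1
    return episodes


def fake_episode():
    """Stand-in for one rule-based dialogue; yields four dummy experiences."""
    return [('state', 0, -1, 'next_state', False)] * 3 + [('state', 0, 10, 'next_state', True)]


memory = ReplayMemory(max_size=10)
episodes_run = warm_up(memory, fake_episode, max_episodes=100)
```

In this toy run, three episodes of four experiences each are enough to fill a memory of size ten, at which point warm-up stops and training can begin.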

It’s also important to note that we are not using any Natural Language (NL) components in this training process. This means that all the actions of the chatbot and the user take the form of ‘semantic frames’, structured data representing meaning. The focus here is on training the Dialogue Manager (DM), which doesn’t require Natural Language Understanding (NLU) or Natural Language Generation (NLG). These NL components are usually pre-trained separately from the agent and are not crucial to understanding the reinforcement learning process.
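As a small illustration of what a semantic frame looks like, the sketch below builds one user frame and one agent frame using the same three keys as the agent actions shown earlier. The helper names `make_frame` and `describe`, and the slot values, are hypothetical.

```python
def make_frame(intent, inform_slots=None, request_slots=None):
    """Build a semantic frame: an intent plus informed and requested slots."""
    return {
        'intent': intent,
        'inform_slots': dict(inform_slots or {}),
        'request_slots': dict(request_slots or {}),
    }


def describe(frame):
    """Render a frame as a compact string, useful when logging dialogues."""
    parts = [frame['intent']]
    parts += ['{}={}'.format(k, v) for k, v in sorted(frame['inform_slots'].items())]
    parts += ['{}=?'.format(k) for k in sorted(frame['request_slots'])]
    return ' '.join(parts)


# Roughly: "I want two tickets for Zootopia. What time is it showing?"
user_frame = make_frame(
    'request',
    inform_slots={'moviename': 'zootopia', 'numberofpeople': '2'},
    request_slots={'starttime': 'UNK'},
)

# Roughly: "It starts at 7:30 pm."
agent_frame = make_frame('inform', inform_slots={'starttime': '7:30 pm'})

summary = describe(user_frame)
```

An NLU component would turn the user's sentence into `user_frame`, and an NLG component would turn `agent_frame` back into text; the dialogue manager only ever sees the frames.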

Here is the code snippet to train the agent:

print('Training Started...')
episode = 0
period_reward_total = 0
period_success_total = 0
success_rate_best = 0.0
while episode < NUM_EP_TRAIN:
    episode_reset()
    episode += 1
    done = False
    state = state_tracker.get_state()
    # Play one full conversation
    while not done:
        next_state, reward, done, success = run_round(state)
        period_reward_total += reward
        state = next_state

    period_success_total += success

    # Train
    if episode % TRAIN_FREQ == 0:
        # Check success rate
        success_rate = period_success_total / TRAIN_FREQ
        avg_reward = period_reward_total / TRAIN_FREQ
        # Flush
        if success_rate >= success_rate_best and success_rate >= SUCCESS_RATE_THRESHOLD:
            pass  # the call that flushes the agent's memory is not shown in the source
        # Update current best success rate
        if success_rate > success_rate_best:
            print('Episode: {} NEW BEST SUCCESS RATE: {} Avg Reward: {}'.format(episode, success_rate, avg_reward))
            success_rate_best = success_rate
        period_success_total = 0
        period_reward_total = 0
        # Copy (call not shown in the source)
        # Train (call not shown in the source)
print('...Training Ended')

The complete code is available here –

Use cases of transactional chatbots

Transactional chatbots hold great potential across a multitude of sectors, including but not limited to banking, insurance, e-commerce, healthcare, and hospitality. Here is how they can be leveraged in various contexts:

  • Banking: Transactional chatbots can enhance banking services by automating tasks traditionally handled by bank operators. For instance, they can authenticate user identities, block stolen credit cards, provide operational hours of nearby branches, or confirm outgoing transfers. Moreover, they can offer immediate assistance in case of account queries, balance checks, or recent transactions, providing users with real-time convenience.
  • Insurance: In the insurance sector, these chatbots can offer quotes to potential customers or distribute insurance certificates to existing ones. More advanced bots can even streamline the conversion process, allowing prospects to sign up directly if the quote matches their budget and needs. The bot gathers the necessary details and forwards the contract and supporting documents, reducing manual intervention and accelerating policy issuance.
  • E-commerce: For e-commerce platforms, transactional chatbots can assist users in product discovery based on their preferences. Additionally, they can facilitate the buying process and handle requests for order modifications or cancellations. These bots can also provide real-time order tracking, enhancing the shopping experience.
  • Healthcare: Transactional chatbots in the healthcare industry can help patients book appointments, send medication reminders, or offer guidance on common health issues. They can also gather patient data for health records, making the patient intake process more efficient.
  • Hospitality: In the hospitality sector, these bots can automate room bookings, provide information about facilities, offer personalized recommendations, and address common queries about the stay, check-in/check-out process, etc.
  • Energy companies or mobile service providers: Similar to insurance, these businesses can use transactional chatbots to provide quotes, facilitate service sign-ups, offer upgrades, or handle cancellation requests.

These few instances illustrate the versatility and utility of transactional chatbots. However, their use is not confined to these areas, and they can be tailored to address the unique needs of various other industries.


Transactional chatbots have opened a new channel of interaction between businesses and their customers. Companies that incorporate this technology into their communication strategies are better placed to stay adaptable and responsive to the shifting needs of their clientele. With careful planning and tactical execution, transactional chatbots can contribute to meaningful growth, so a company that wishes to stay competitive in a rapidly advancing digital world has good reason to include one in its strategic planning.

As consumer expectations continue to evolve, the prospects for transactional chatbots are looking brighter than ever. Future developments may involve more advanced levels of personalization, with chatbots becoming increasingly intelligent. This would offer a more enriched user experience, potentially featuring responses or suggestions specifically tailored to an individual user’s preferences or past interactions.

Security is another area poised for significant improvement, particularly given the sensitive transactional information these chatbots handle. Expect to see advancements in encryption, fraud detection, and even biometric authentication as a means to protect and secure user data.

Another promising direction for chatbots is their increasing integration with other sophisticated technologies. Chatbots are already deployed across a wide array of business sectors; in the future, we could see them combined with technologies such as voice assistants or augmented reality to offer even more engaging customer experiences.

In sum, transactional chatbots are fast becoming necessary for businesses wishing to thrive and grow in the digital age. Their potential future developments point to a world of more personalized, secure, and immersive customer experiences.

Author’s Bio


Akash Takyar

CEO LeewayHertz
Akash Takyar is the founder and CEO of LeewayHertz. With a proven track record of conceptualizing and architecting 100+ user-centric and scalable solutions for startups and enterprises, he brings a deep understanding of both technical and user experience aspects.
Akash's ability to build enterprise-grade technology solutions has garnered the trust of over 30 Fortune 500 companies, including Siemens, 3M, P&G, and Hershey's. Akash is an early adopter of new technology, a passionate technology enthusiast, and an investor in AI and IoT startups.
