IVR Customer Intent Recognition

IVR (Interactive Voice Response) Customer Service Intent Recognition refers to the application of AI voice processing technology that allows users to interact with customer service systems through voice commands, automatically recognizing customer intent to provide corresponding services. In Taiwan, such systems are widely used in banking, telecommunications, healthcare, and other sectors to improve service efficiency and customer experience.

Core Functions and Process

  1. Voice Input Reception After customers call in, the system plays an automated voice menu (e.g., "Please briefly state your needs, and we will provide assistance."). Customers don't need to press keys but can directly express their needs through voice.

  2. Speech Recognition (ASR) The system uses Automatic Speech Recognition technology to convert speech into text. For example, when a customer says "I want to check my bill," the system converts this content into text input.

  3. Natural Language Understanding (NLU) The system uses Natural Language Understanding technology to analyze semantics and determine the user's actual needs. For example:

    • Vocabulary analysis: Keywords like "check" and "bill" indicate billing-related needs.

    • Intent identification: Determining that the customer's purpose is "checking bills."

  4. Response and Routing The system provides corresponding services based on semantic analysis results, with options including:

    • Direct Response: If the request can be handled automatically, e.g., "Your bill amount is 1,200 dollars, due on December 15th."

    • Transfer to Agent: For complex requests, the system automatically transfers to customer service staff in the corresponding department, providing semantic summaries to reduce repetitive communication.

Technical Challenges and Limitations

  1. Difficulty in Understanding Diverse Expressions

    • Users' language expressions may not be standardized, e.g., "What's up with my bill?" or "What about payment issues?" Such semantic ambiguity sometimes makes BERT models unable to accurately determine specific needs.

    • In Taiwan, there's also the challenge of multiple languages and dialects (Mandarin, Taiwanese, Hakka), with language models having insufficient support for these language data.

  2. Intent Classification Limitations

    • While modern NLP models can process large amounts of text data, they cannot fully grasp certain industry-specific knowledge or special intents. For example: "I want to know the exact date of my last payment" might require connecting to different systems for a correct answer.

    • Although BERT performs well with short dialogue segments, long sentences or complex semantic expressions can confuse the model.

  3. Data Bias and Incomplete Language Corpus

    • Training language models requires extensive localized corpus. Insufficient data or bias towards single expression forms can lead to poor model adaptation to special contexts. For example, Taiwan-specific language habits like "stored value" or "skip number" may lack sufficient contextual corpus.

  4. Context and Memory Limitations

    • Customer conversations often have contextual relationships. For example, multi-turn dialogues like "About the payment I just mentioned, I have other questions" require the system to remember previous intents. Current NLP models have limited application performance in this aspect.

    • If intent determination fails, users may need to restate their needs, causing frustration.

  5. Low Error Tolerance

    • Customers have limited patience with customer service systems. If the voice system makes incorrect judgments, customers may feel frustrated and ultimately request to speak with a human.

Industry Status

Currently, many businesses' voice customer service systems still use traditional key-press selection processes. Taking SinoPac Bank's voice customer service system as an example, its design fully considers business diversity, providing multi-level menu options to guide users. However, there is still much room for optimization and improvement in user interaction experience.

The emergence of LLM (Large Language Models) and RAG (Retrieval-Augmented Generation) has brought revolutionary changes to semantic recognition and overall IVR systems, making voice customer service systems more intelligent, precise, and adaptive, overcoming many limitations of traditional NLP technology.

Solution

Below, we'll guide you step by step through creating a powerful intent recognition assistant using MaiAgent's robust and precise LLM and RAG capabilities.

Operation Steps

1. Data Preparation

Previously, BERT required large amounts of labeled data for NLP tasks (such as intent recognition, sentiment analysis, etc.) for training. Labeled data was typically completed manually, such as annotating sentences with intent categories or keywords, which was both time-consuming and expensive. Even with high-quality labeled data, model generalization ability was insufficient. When business requirements or usage habits changed, re-labeling and retraining models was required, resulting in longer cycles.

The advantage of LLM and RAG technology lies in their ability to fully combine generative language capabilities with dynamic retrieval, eliminating dependence on labeled data, improving semantic recognition accuracy, reducing development and maintenance costs, and greatly enhancing user experience. This technology combination sets new industry standards for intelligent customer service and voice interaction, serving as a key driver for future automated and personalized services.

The introduction of LLM and RAG has greatly simplified the labeled data preparation process. Now, you only need to organize data into an Excel spreadsheet, simply list intent classifications, and upload this Bank Customer Service List to the MaiAgent AI assistant's knowledge base to support the intelligent semantic recognition system's operation.

Bank Service List (for Intent Recognition)

2. Define Role Instructions

# Role
You are MaiAgent Bank's semantic understanding robot

# Output Format
Please understand the customer's service intent based on user dialogue, determine the user's desired service intent from the knowledge base and output

When the intent is clear, please output just one intent; if there are multiple similar intents that cannot be determined, please list up to 3 most similar intents

<example>
-<code>:<category> - <subcategory>
</example>

<example>
-<code>:<category> - <subcategory>
-<code>:<category> - <subcategory>
...
</example>

<example>
N/A
</example>

# Output Restrictions
- Please reply in Traditional Chinese
- Do not answer information not in the knowledge base
- Please directly output the text within <example> and </example>, do not include other descriptions
- Output does not include <example> and </example>
- Answer based on knowledge base data, if intent cannot be determined, please answer with text in <example> below

3. Start Using

Usage Examples

Single Dialogue

Intent Recognition Case

Multi-turn Dialogue

LLM and RAG technology solves the difficulty of recognizing intent in multi-turn dialogues on BERT.

Intent Recognition Multi-turn Dialogue Example

Last updated

Was this helpful?