IVR customer intent recognition

IVR (Interactive Voice Response) customer intent recognitionRefers to the application of AI voice processing technology that allows users to interact with customer service systems via voice commands and automatically recognizes customer intent to provide corresponding services. In Taiwan, such systems have been widely used in banking, telecommunications, healthcare, and other fields to improve service efficiency and customer experience.

Core functions and process

Receive voice input After the customer calls, the system plays an automated voice menu (for example: "Please briefly state your request, and we will assist you."). The customer does not need to press keys; they express their request directly by voice.
Speech recognition (ASR) The system uses Automatic Speech Recognition technology to convert speech into text. For example, if the customer says "I want to check my bill," the system converts that content into text input.
Semantic understanding (NLU) The system uses Natural Language Understanding technology to analyze semantics and determine the user's actual need. For example:
- Lexical analysis: keywords like "check" and "bill" indicate the request is related to billing.
- Intent identification: determine the customer's purpose is to "check the bill."
Response and routing Based on the semantic analysis results, the system provides corresponding services, which may include the following options:
- Direct response: If the request can be handled automatically, for example, "Your bill amount is 1200 NTD, and the payment deadline is December 15."
- Transfer to agent: If the request is more complex, the system automatically transfers to the appropriate department's customer service representative and simultaneously provides a semantic summary to reduce repetitive communication.

Technical challenges and limitations

Difficulty understanding diverse expressions
- Users' language expressions may be nonstandard, for example: "What’s going on with my bill?" or "What should I do about payment issues?" These meanings can be vague, and models like BERT sometimescannot accurately determine the specific need.
- In Taiwan, there is also the challenge of mixed use of multiple languages and dialects (Mandarin, Taiwanese, Hakka), and language models still lack sufficient support for these language data.
Limitations in intent granularity
- Although modern NLP models can handle large amounts of text data, they cannot fully grasp certain industry-specific knowledge or special intents. For example: "I want to know the exact date of my last payment" may require connecting to different systems to answer correctly.
- Even though BERT performs well on shorter dialogue segments, long sentences or complex semantic expressions can cause the model to become confused.
Data bias and incomplete corpora
- Training language modelsrequires large amounts of localized corpora, if data is insufficient or biased toward a single form of expression, the model's ability to adapt to special contexts will be inadequate. For example, Taiwan-specific language habits such as "top-up" or "skip number" may lack enough contextual corpus for the model.
Context and memory limitations
- Customer conversations usually have contextual relevance. For example, multi-turn dialogues like "About the payment you mentioned just now, I have other questions" require the system to remember previous intents. Current NLP models have limited performance in this area.
- If intent determination is wrong, users may have to restate their request, causing frustration.
Low error tolerance
- Customers have limited patience for customer service systems; if the voice system determines incorrectly, customers may feel frustrated and ultimately request to speak with a human.

Industry status

Currently, many companies still use traditional key-press menu processes for voice customer service systems. For example, Sinopac Bank's voice customer service system as an example, its design fully considers business diversity and provides multi-level menu options to guide users. However, there is still much room for optimization and improvement in the user interaction experience.

The emergence of LLM (Large Language Models) and RAG (Retrieval-Augmented Generation) has brought revolutionary changes to semantic recognition and overall IVR systems, making voice customer service systems smarter, more accurate, and more adaptable, overcoming many limitations of traditional NLP technologies.

永豐銀行 Bank SinoPac - 客服中心 – 連絡客服 – 語音服務

Solutions

Below, using MaiAgent's powerful and precise LLM and RAG capabilities, we will guide everyone step by step to create an extremely capable semantic recognition assistant.

Operation steps

1. Data preparation

In the past, BERT required a large amount of labeled data for NLP tasks (such as intent recognition, sentiment analysis, etc.) to train. Labeled data is usually created manually, for example by tagging sentences with intent categories or keywords, which is time-consuming and expensive. Even with high-quality labeled data, the model's generalization ability may be insufficient. When business needs or usage habits change, re-labeling and re-training the model is required, which takes a long cycle.

The advantage of LLM and RAG is that they can fully combine generative language capabilities with the dynamism of real-time retrieval, freeing themselves from dependence on labeled data, improving semantic recognition accuracy, reducing development and maintenance costs, and greatly improving user experience. This combination of technologies sets a new industry standard for intelligent customer service and voice interaction and is a key driver for future automation and personalized services.

Introducing LLM and RAG greatly simplifies the labeled data preparation process. Now, you only need to organize the data into an Excel sheet, simply list intent categories, and upload this bank customer service list to MaiAgent AI assistant's knowledge base to support the operation of an intelligent semantic recognition system.

2. Define role instructions

# Role
You are MaiAgent Bank's semantic understanding robot

# Output format
Please, based on the user's dialogue, understand the service intent the customer wants, determine the user's desired service intent from the knowledge base and output

When the intent is clear, please output only one intent; if multiple similar intents cannot be distinguished, please list up to 3 closest intents

<example>
-<code>:<category> - <subcategory>
</example>

<example>
-<code>:<category> - <subcategory>
-<code>:<category> - <subcategory>
...
</example>

<example>
N/A
</example>

# Output constraints
- Please reply in Traditional Chinese
- Do not answer with information that is outside the scope of the knowledge base
- Please output only the text inside <example> and </example>, do not include other descriptions
- The output should not include <example> and </example>
- Answer based on the knowledge base; if the intent cannot be determined, please respond using the text inside the <example> below

3. Start using

Usage examples

Single-turn dialogue

Multi-turn dialogue

LLM and RAG technology have solved the bottleneck of intent recognition in multi-turn dialogues that was difficult for BERT.

PreviousVoice customer service NextVoice call summarization

Last updated 4 months ago

Was this helpful?