IVR customer intent recognition
IVR (Interactive Voice Response) customer intent recognition refers to the application of AI voice processing technology that lets users interact with customer service systems by voice and automatically recognizes their intent in order to provide the corresponding service. In Taiwan, such systems are widely used in banking, telecommunications, healthcare, and other fields to improve service efficiency and customer experience.
Core functions and process
Receive voice input: After the customer calls, the system plays an automated voice prompt (for example: "Please briefly state your request, and we will assist you."). The customer does not need to press keys; they express their request directly by voice.
Speech recognition (ASR): The system uses Automatic Speech Recognition technology to convert speech into text. For example, if the customer says "I want to check my bill," the system converts that content into text input.
Semantic understanding (NLU): The system uses Natural Language Understanding technology to analyze semantics and determine the user's actual need. For example:
Lexical analysis: keywords like "check" and "bill" indicate the request is related to billing.
Intent identification: determine the customer's purpose is to "check the bill."
Response and routing: Based on the semantic analysis results, the system provides the corresponding service, which may be one of the following:
Direct response: If the request can be handled automatically, the system answers right away, for example, "Your bill amount is 1200 NTD, and the payment deadline is December 15."
Transfer to agent: If the request is more complex, the system automatically transfers to the appropriate department's customer service representative and simultaneously provides a semantic summary to reduce repetitive communication.
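The flow above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular IVR product works: it assumes ASR has already produced a text transcript, and the keyword table, the intent names (`check_bill`, `report_lost_card`), and the `RoutingDecision` type are all hypothetical stand-ins for a trained NLU component.

```python
from dataclasses import dataclass

# Hypothetical keyword table; a production NLU component would use a trained model.
INTENT_KEYWORDS = {
    "check_bill": ("check", "bill"),
    "report_lost_card": ("lost", "card"),
}

# Intents assumed (for illustration) to be answerable without an agent.
SELF_SERVICE_INTENTS = {"check_bill"}


@dataclass
class RoutingDecision:
    action: str   # "direct_response" or "transfer_to_agent"
    intent: str   # recognized intent, or "unknown"
    summary: str  # semantic summary handed to the agent on transfer


def identify_intent(transcript: str) -> str:
    """Return the first intent whose keywords all appear in the transcript."""
    cleaned = transcript.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if all(keyword in words for keyword in keywords):
            return intent
    return "unknown"


def route(transcript: str) -> RoutingDecision:
    """Answer directly when possible; otherwise transfer with a summary."""
    intent = identify_intent(transcript)
    if intent in SELF_SERVICE_INTENTS:
        return RoutingDecision("direct_response", intent, "")
    summary = f"Caller said: {transcript!r}; detected intent: {intent}"
    return RoutingDecision("transfer_to_agent", intent, summary)
```

Calling `route("I want to check my bill")` yields a direct response, while an unrecognized request is transferred together with a summary so the agent does not have to ask the caller to repeat it.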
Technical challenges and limitations
Difficulty understanding diverse expressions
Users often phrase requests in nonstandard ways, for example: "What’s going on with my bill?" or "What should I do about payment issues?" Such utterances can be vague, and models like BERT sometimes cannot accurately determine the specific need.
In Taiwan, there is the added challenge of mixed use of multiple languages and dialects (Mandarin, Taiwanese, Hakka), for which language models still lack sufficient training data.
Limitations in intent granularity
Although modern NLP models can handle large amounts of text data, they cannot fully grasp certain industry-specific knowledge or special intents. For example: "I want to know the exact date of my last payment" may require connecting to different systems to answer correctly.
Although BERT performs well on shorter dialogue segments, long sentences or complex semantic expressions can reduce its accuracy.
Data bias and incomplete corpora
Training language models requires large amounts of localized corpora. If the data is insufficient or biased toward a single form of expression, the model adapts poorly to special contexts. For example, Taiwan-specific usages such as "top-up" or "skip number" may not appear often enough in the training corpus for the model to learn them.
Context and memory limitations
Customer conversations usually carry context across turns. For example, a multi-turn follow-up like "About the payment you mentioned just now, I have other questions" requires the system to remember previously recognized intents. Traditional NLP models have limited performance in this area.
If intent determination is wrong, users may have to restate their request, causing frustration.
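To make "remembering previous intents" concrete, here is a minimal sketch of per-call dialogue state. It is only an illustration under stated assumptions: the `DialogueState` class, the cue phrases in `BACK_REFERENCES`, and the fallback rule are hypothetical, and a real system would rely on a model rather than phrase matching to detect references to earlier turns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogueState:
    """Keeps the intents recognized so far in one call (illustrative only)."""
    history: List[str] = field(default_factory=list)

    def remember(self, intent: str) -> None:
        self.history.append(intent)

    def last_intent(self) -> Optional[str]:
        return self.history[-1] if self.history else None


# Hypothetical cue phrases that signal a reference to an earlier turn.
BACK_REFERENCES = ("just now", "you mentioned", "earlier")


def resolve_intent(
    utterance: str, detected: Optional[str], state: DialogueState
) -> Optional[str]:
    """Fall back to the previous intent when the turn refers back to it."""
    text = utterance.lower()
    if detected is None and any(cue in text for cue in BACK_REFERENCES):
        detected = state.last_intent()
    if detected is not None:
        state.remember(detected)
    return detected
```

With this state in place, a follow-up turn for which no intent was detected on its own can inherit the intent of the preceding turn instead of forcing the caller to restate the request.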
Low error tolerance
Customers have limited patience for customer service systems; if the voice system determines incorrectly, customers may feel frustrated and ultimately request to speak with a human.
Industry status
Currently, many companies still use traditional key-press menus in their voice customer service systems. Take Sinopac Bank's voice customer service system as an example: its design accounts for the diversity of the bank's business and guides users through multi-level menu options. However, the user interaction experience still has considerable room for improvement.
The emergence of LLMs (large language models) and RAG (retrieval-augmented generation) has brought major changes to semantic recognition and to IVR systems as a whole, making voice customer service systems smarter, more accurate, and more adaptable, and overcoming many limitations of traditional NLP technologies.

Solutions
The steps below walk through building a semantic recognition assistant with MaiAgent's LLM and RAG capabilities.
Operation steps
1. Data preparation
Introducing LLM and RAG greatly simplifies the preparation of labeled data. You only need to organize the data into an Excel sheet that lists the intent categories, then upload this bank customer service intent list to the MaiAgent AI assistant's knowledge base, where it supports the intelligent semantic recognition system.
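Before uploading, it can help to sanity-check the intent list. The sketch below is one possible check, assuming the sheet has been exported to CSV and has three columns named `code`, `category`, and `subcategory`. Those column names and the sample rows are assumptions made for illustration, not a format required by MaiAgent.

```python
import csv
import io
from typing import Dict, List

# Hypothetical rows; the real intent list comes from your own customer service data.
SAMPLE_CSV = """code,category,subcategory
A01,Credit card,Check bill
A02,Credit card,Report lost card
B01,Account,Check balance
"""

REQUIRED_COLUMNS = ("code", "category", "subcategory")


def load_intents(csv_text: str) -> List[Dict[str, str]]:
    """Parse the intent list and reject missing fields or duplicate codes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    intents: List[Dict[str, str]] = []
    seen = set()
    for row in reader:
        record = {c: (row[c] or "").strip() for c in REQUIRED_COLUMNS}
        if not all(record.values()):
            raise ValueError(f"incomplete row: {row}")
        if record["code"] in seen:
            raise ValueError(f"duplicate code: {record['code']}")
        seen.add(record["code"])
        intents.append(record)
    return intents


intents = load_intents(SAMPLE_CSV)
```

Catching duplicate codes and empty cells at this stage keeps ambiguous entries out of the knowledge base, where they would be harder to trace.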

2. Define role instructions
# Role
You are MaiAgent Bank's semantic understanding robot
# Output format
Based on the user's dialogue, identify the service intent the customer wants, match it against the intents in the knowledge base, and output the result
When the intent is clear, please output only one intent; if multiple similar intents cannot be distinguished, please list up to 3 closest intents
<example>
-<code>:<category> - <subcategory>
</example>
<example>
-<code>:<category> - <subcategory>
-<code>:<category> - <subcategory>
...
</example>
<example>
N/A
</example>
# Output constraints
- Please reply in Traditional Chinese
- Do not answer with information that is outside the scope of the knowledge base
- Please output only the text inside <example> and </example>, do not include other descriptions
- The output should not include <example> and </example>
- Answer based on the knowledge base; if the intent cannot be determined, please respond using the text inside the last <example> above (N/A)
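Downstream IVR code needs to turn the assistant's reply back into structured data. The following is a minimal sketch of such a parser for the output format defined in the role instructions (`-<code>:<category> - <subcategory>`, up to three lines, or `N/A`). The `Intent` type and `parse_intents` function are hypothetical helpers, not part of MaiAgent, and the sketch assumes an ASCII colon and a space-delimited hyphen between category and subcategory.

```python
import re
from typing import List, NamedTuple


class Intent(NamedTuple):
    code: str
    category: str
    subcategory: str


# One line of the form "-<code>:<category> - <subcategory>".
LINE_PATTERN = re.compile(r"^-\s*([^:]+):(.+?)\s+-\s+(.+)$")


def parse_intents(reply: str) -> List[Intent]:
    """Return up to three intents; an empty list means the reply was N/A."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty reply")
    if lines == ["N/A"]:
        return []
    intents = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"unexpected line in reply: {line!r}")
        intents.append(Intent(*(part.strip() for part in match.groups())))
    if len(intents) > 3:
        raise ValueError("the role instructions allow at most 3 intents")
    return intents
```

A single returned intent can be routed directly, several candidates can trigger a clarifying question, and an empty list can fall back to a human agent. Raising on malformed lines makes it visible when the model drifts from the requested format.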
3. Start using
Usage examples
Single-turn dialogue

Multi-turn dialogue
LLM and RAG technologies address the multi-turn intent recognition bottleneck that BERT struggled with.
