AI Customer Service Quality Management Guide
Target Audience: Customer Service Managers, Quality Management Personnel, Customer Service Trainers
1. Quick Start: Three Quality Metrics for AI Customer Service
How to View Evaluation Report Scores?
Path: AgentOps (sidebar) → AI Assistant Monitoring
In the table, you can directly view the three major scoring metrics for each conversation. Click "View" to see complete details.
Why Do We Need Evaluation?
Just like reviewing customer service call recordings, we also need to check the quality of AI responses. The system automatically scores each conversation, helping you quickly identify issues.
Three Core Metrics
Faithfulness Score: Whether the information provided by the AI is correct and not fabricated. 85 points or above ✅, 60-84 points ⚠️, below 60 points ❌
Answer Relevancy Score: Whether the AI answers the customer's actual question. 85 points or above ✅, 60-84 points ⚠️, below 60 points ❌
Context Precision Score: Whether the AI finds the right reference materials for the question. 85 points or above ✅, 60-84 points ⚠️, below 60 points ❌
Simple Assessment Method
All three metrics > 80 points → ✅ This response is excellent
Any metric < 60 points → ❌ Requires immediate improvement
Two or more < 70 points → ⚠️ Systemic issue, needs comprehensive review
2. How to Understand Evaluation Reports
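The three rules of the Simple Assessment Method can be applied mechanically when reading a report. The following is a minimal Python sketch, assuming the rules are checked independently so that more than one can fire (the guide does not state an order of precedence). The function name `assess` and the metric keys are illustrative and not part of the system.

```python
from typing import Dict, List


def assess(scores: Dict[str, float]) -> List[str]:
    """Return every Simple Assessment verdict that applies to one conversation."""
    values = list(scores.values())
    verdicts = []
    if all(v > 80 for v in values):
        verdicts.append("excellent")
    if any(v < 60 for v in values):
        verdicts.append("requires immediate improvement")
    if sum(v < 70 for v in values) >= 2:
        verdicts.append("systemic issue, needs comprehensive review")
    if not verdicts:
        # Assumption: scores that trigger no rule are simply left unflagged.
        verdicts.append("no rule triggered")
    return verdicts


# Scores taken from the report example in Section 2.
report = {"faithfulness": 45, "answer_relevancy": 90, "context_precision": 70}
print(assess(report))
```

With the report example's scores, only the "any metric below 60" rule fires, because 70 is not below 70.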
Report Example
Conversation ID: #20240120-001
Customer Question: "Is the black trench coat in XL size still in stock?"
AI Response: "The black trench coat is currently in stock, XL size can be ordered."
Evaluation Results:
├─ Faithfulness Score: 45 points ❌ (Claims in stock, but actually out of stock)
├─ Answer Relevancy Score: 90 points ✅ (Indeed answered the stock question)
└─ Context Precision Score: 70 points ⚠️ (Found trench coat data, but size information is not precise enough)
Problem Diagnosis: AI provided incorrect inventory information
Three Common Problem Types
Problem A: Low Faithfulness Score (< 60 points)
Symptoms: Information provided by AI is incorrect or fabricated
Common Causes:
Reference materials are outdated (prices, inventory, policies have been updated)
Conflicting data (different documents say different things)
AI "guesses" answers without relying on database content
Impact: Customers may receive incorrect information, leading to complaints
Problem B: Low Answer Relevancy Score (< 60 points)
Symptoms: AI doesn't answer what the customer actually wants to know
Common Causes:
AI provides lengthy responses but misses the point
Answers are irrelevant, discussing unrelated content
Only explains background without providing actual answers
Impact: Customers need to ask again, reducing satisfaction
Problem C: Low Context Precision Score (< 60 points)
Symptoms: AI retrieves the wrong reference materials, or materials that are not specific enough
Common Causes:
Inaccurate keyword searches
Confusion between different products/categories of data
Unclear titles or categories in reference materials
Impact: Even a well-phrased answer will be wrong if it is built on the wrong data
3. Real-World Cases: Common Issues in Fashion Retail
Case 1: Incorrect Pricing (Low Faithfulness Score)
📊 Problem Discovery
Customer Question: "How much is this down jacket?"
AI Response: "This down jacket is priced at NT$ 3,990."
Evaluation Results:
Faithfulness Score: 38 points ❌
Answer Relevancy Score: 95 points ✅
Context Precision Score: 85 points ✅
Actual Situation: The product has been repriced to NT$ 2,990 (after discount), but AI still provided the original price.
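The gap between the quoted and the actual price comes down to which price applies on a given date. As a minimal Python sketch, assume each price entry records the original price, the discounted price, and the promotion period. The `PriceRecord` class and `quote_price` function are hypothetical names for illustration and not part of the system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PriceRecord:
    product: str
    original_price: int
    discounted_price: Optional[int] = None
    promo_start: Optional[date] = None
    promo_end: Optional[date] = None


def quote_price(record: PriceRecord, on: date) -> str:
    """Quote the price that applies on the given date, with the promotion period if any."""
    in_promo = (
        record.discounted_price is not None
        and record.promo_start is not None
        and record.promo_end is not None
        and record.promo_start <= on <= record.promo_end
    )
    if in_promo:
        return (
            f"{record.product}: NT$ {record.discounted_price:,} "
            f"(original NT$ {record.original_price:,}), "
            f"valid {record.promo_start:%Y/%m/%d} - {record.promo_end:%Y/%m/%d}"
        )
    return f"{record.product}: NT$ {record.original_price:,}"


classic = PriceRecord(
    product="Classic Down Jacket",
    original_price=3990,
    discounted_price=2990,
    promo_start=date(2024, 1, 1),
    promo_end=date(2024, 2, 28),
)
print(quote_price(classic, date(2024, 1, 20)))  # inside the promotion period
print(quote_price(classic, date(2024, 3, 1)))   # after the promotion ends
```

A record without promotion dates can only ever yield the original price, which is why the corrected data in this case lists the discount and its period explicitly.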
💡 Improvement Solution
Step 1: Check Data
Open "Product Price List" in knowledge base
Discovered prices were not updated
Step 2: Update Data
❌ Problem Data
Down Jacket Series
- Classic Down Jacket: NT$ 3,990
- Long Down Jacket: NT$ 4,990
✅ Corrected Data
Down Jacket Series (Winter 2024 Prices)
Product Name: Classic Down Jacket
- Original Price: NT$ 3,990
- Discounted Price: NT$ 2,990 (Starting 2024/1/1)
- Promotion Period: 2024/1/1 - 2024/2/28
Product Name: Long Down Jacket
- Original Price: NT$ 4,990
- Discounted Price: NT$ 3,990 (Starting 2024/1/1)
- Promotion Period: 2024/1/1 - 2024/2/28
Last Updated: 2024/1/15
Step 3: Remind AI
Add to "AI Response Principles":
"When answering about prices, always confirm whether it's the latest price and specify the applicable period. If there's a discount promotion, please inform both original price and discounted price."
Case 2: Irrelevant Answer (Low Answer Relevancy Score)
📊 Problem Discovery
Customer Question: "If it doesn't fit, can I return it?"
AI Response: "We provide multiple size options, including S, M, L, XL. Each garment's size chart is indicated on the product page. We recommend referring to the size chart for measurements before purchasing."
Evaluation Results:
Faithfulness Score: 90 points ✅
Answer Relevancy Score: 35 points ❌
Context Precision Score: 60 points ⚠️
Problem Diagnosis: Customer asked "can I return it," but AI is explaining "how to choose sizes," completely missing the core question.
💡 Improvement Solution
Step 1: Adjust Response Principles
Add to "AI Response Principles":
Core Principle: Answer the customer's core question first
Wrong Example:
Customer asks "Can I return it?"
AI answers "We have comprehensive after-sales service..." ❌ (Doesn't say yes or no)
Correct Example:
Customer asks "Can I return it?"
AI answers "Yes. Within 7 days of receiving the product, if unused, you can apply for return..." ✅ (Direct answer)
Step 2: Reorganize Data Structure
❌ Problem Data (Too scattered)
After-Sales Service Description
Our company values customer rights and provides comprehensive after-sales service.
Please read the size chart carefully when purchasing and choose the appropriate size.
If you have any questions, please contact customer service...
✅ Corrected Data
Return and Exchange Policy
Q: Can I return the product?
A: Yes. Within 7 days of receiving the product, if unused with tags intact, you can apply for return.
Return Conditions:
✅ Product unused, not washed, not tried on for more than 10 minutes
✅ Tags and packaging intact
✅ Application submitted within 7 days of receipt
❌ Sale items, underwear cannot be returned
Return Process:
1. Call customer service hotline 0800-XXX-XXX or fill out online form
2. Customer service provides return address and tracking number
3. Refund within 7-14 days to original payment method
---
Q: Can I exchange the product?
A: Yes. Within 7 days of receipt, you can exchange size or color for free (same product only).
Exchange Process:
1. Contact customer service to confirm stock availability
2. Return product (same conditions as returns)
3. New product shipped within 3 days after receiving returned item
Improvement Highlights:
✅ Directly answer "yes" or "no" at the beginning
✅ Use Q&A format for clarity
✅ List clear conditions to avoid disputes
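The first highlight can be spot-checked automatically for yes/no questions. The sketch below is a rough Python illustration that only tests whether a response opens with "Yes" or "No". The function name is an assumption, and a real check would also need to recognize which customer questions are yes/no questions in the first place.

```python
import re

# Matches a response whose first word is "yes" or "no", ignoring case.
DIRECT_OPENER = re.compile(r"^\s*(yes|no)\b", re.IGNORECASE)


def answers_directly(response: str) -> bool:
    """Return True if the response begins with a direct yes/no answer."""
    return bool(DIRECT_OPENER.match(response))


good = "Yes. Within 7 days of receiving the product, if unused, you can apply for return..."
poor = "We provide multiple size options, including S, M, L, XL."
print(answers_directly(good), answers_directly(poor))
```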
Case 3: Wrong Product Found (Low Context Precision Score)
📊 Problem Discovery
Customer Question: "What is the material of the black knit top?"
AI Response: "This knit top is made of 100% pure cotton, soft and comfortable, suitable for all seasons."
Evaluation Results:
Faithfulness Score: 88 points ✅
Answer Relevancy Score: 90 points ✅
Context Precision Score: 48 points ❌
Actual Situation: The black knit top is 70% wool + 30% polyester, but AI found data for the "white knit top" (100% pure cotton).
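A retrieval mismatch like this one can be spotted by comparing the color named in the question with the color in the title of the retrieved reference. The following Python sketch is illustrative only: the `KNOWN_COLORS` tuple and both function names are assumptions, and the system's actual retrieval logic is not described in this guide.

```python
import re
from typing import Optional

# Assumption: a small, hand-maintained list of colors used in product titles.
KNOWN_COLORS = ("black", "white", "gray")


def mentioned_color(text: str) -> Optional[str]:
    """Return the first known color that appears as a whole word in the text."""
    lowered = text.lower()
    for color in KNOWN_COLORS:
        if re.search(rf"\b{color}\b", lowered):
            return color
    return None


def reference_matches_question(question: str, reference_title: str) -> bool:
    """Return False when the question names a color the reference title does not share."""
    asked = mentioned_color(question)
    if asked is None:
        return True  # no color mentioned, so nothing to compare
    return mentioned_color(reference_title) == asked


question = "What is the material of the black knit top?"
print(reference_matches_question(question, "White Knit Top (Model: A002)"))
print(reference_matches_question(question, "Black Knit Top (Model: A001)"))
```

The check only works when the reference title actually states the color, which is one reason a single undifferentiated document for all colors is hard to verify.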
💡 Improvement Solution
Step 1: Check Data Labels
Problem data file name:
Knit_Tops.pdf
Issue: All knit tops are in one document, making it difficult for AI to distinguish between them.
Step 2: Improve Data Structure
✅ Solution A: Separate Files
Product_Data/
├─ Knit_Top_Black_Model_A001.pdf
├─ Knit_Top_White_Model_A002.pdf
├─ Knit_Top_Gray_Model_A003.pdf
✅ Solution B: Clear Titles
# Knit Top Product Information
## Black Knit Top (Model: A001)
- Color: Black
- Material: 70% wool + 30% polyester
- Suitable Season: Autumn/Winter
- Care Instructions: Hand wash, do not tumble dry
## White Knit Top (Model: A002)
- Color: White
- Material: 100% pure cotton
- Suitable Season: All seasons
- Care Instructions: Machine washable, low temperature dry
## Gray Knit Top (Model: A003)
- Color: Gray
- Material: 50% wool + 50% acrylic
- Suitable Season: Autumn/Winter
- Care Instructions: Dry clean only
Step 3: Remind AI
Add to "AI Response Principles":
"When customers mention product color or model number, always confirm that the reference material corresponds to that specific color or model. Different colors of the same product may have different materials and specifications."
4. Three-Step Improvement Plan
When problems are identified, follow this process:
Discover low scores
↓
Step 1: Update data content (most important)
↓
Step 2: Adjust AI response principles
↓
Step 3: Report to technical team (if needed)
Step 1: Update Data Content
Applicable Situations:
✅ Low faithfulness score (incorrect or outdated data)
✅ Low context precision score (disorganized data, unclear labels)
Checklist:
Data Quality Examples:
❌ Poor Data
Return Policy
Some products can be returned, but certain conditions must be met.
Some special products cannot be returned, please note before purchasing.
If you need to return, please contact customer service.
✅ Good Data
Return Policy
Returnable Products:
✅ General clothing (tops, pants, outerwear)
✅ Accessories (bags, hats, scarves)
Non-Returnable Products:
❌ Underwear, swimwear
❌ Sale items (50% off or more)
❌ Customized products
Return Conditions (all must be met):
1. Within 7 days of receipt
2. Product unused (tags intact, no signs of wear)
3. Packaging intact
Return Process:
1. Call customer service hotline 0800-XXX-XXX
2. Provide order number
3. Customer service provides return address
4. Return product (registered mail recommended)
5. Refund within 7-14 days after receiving returned item
Contact Methods:
- Customer Service Hotline: 0800-XXX-XXX (09:00-21:00)
- Online Chat: Chat box at bottom right of website
- Email: [email protected]
Step 2: Adjust AI Response Principles
Applicable Situations:
✅ Low answer relevancy score (irrelevant answers)
✅ Low faithfulness score (AI guessing, fabricating)
AI Response Principles Template:
# AI Customer Service Response Principles
## Core Rules
1. **Answer the core question first**
- Customer asks "can I/is it possible" → First answer "yes" or "no"
- Customer asks "how much" → State price first
- Customer asks "how to" → Provide steps first
2. **Only state what you're certain about**
- All information must come from reference materials
- If uncertain, say "This requires human customer service assistance"
- Never guess or assume
3. **Pay attention to details**
- Do not confuse colors, sizes, model numbers
- Confirm prices are current
- Clearly state promotion periods
## Response Format
### Policy Questions (returns/exchanges, membership, promotions)
First paragraph: Directly answer "yes" or "no"
Second paragraph: Explain conditions (in bullet points)
Third paragraph: Tell customer what to do (process or contact method)
### Product Questions (price, material, stock)
First paragraph: Directly answer the question (price/material/stock availability)
Second paragraph: Supplement product information (specifications, sizes, colors)
Third paragraph: Purchase link or next step
### Process Questions (how to buy, return, exchange)
First paragraph: Summarize process (3-5 steps)
Second paragraph: Detail each step
Third paragraph: Important notes or contact information
## Prohibited Actions
❌ Cannot say "usually," "generally," "approximately" (must be specific)
❌ Cannot confuse information from different products
❌ Cannot omit important conditions (price, size, period)
❌ Cannot speculate about information customer didn't mention
## Examples
✅ Good Response:
Customer: "Can I return this jacket?"
AI: "Yes. Within 7 days of receipt, if the following conditions are met, you can apply for return:
- Product unused, tags intact
- Packaging in good condition
- Not a sale item
Return Process:
Please call customer service hotline 0800-XXX-XXX, we will provide return address and instructions.
Refund will be processed within 7-14 business days to original payment method."
❌ Poor Response:
Customer: "Can I return this jacket?"
AI: "Our company values customer rights and provides comprehensive after-sales service.
We recommend reading product descriptions carefully before purchasing and choosing the appropriate size.
If you have any questions, please contact customer service..."
(Does not directly answer whether return is possible)
Step 3: Report to Technical Team
Applicable Situations:
Context precision score consistently low
Same problem recurring
No improvement after adjusting data and principles
Report Content:
Problem Type: Low Context Precision Score
Problem Description:
When customers inquire about "black" products, AI frequently finds data for "white" or other colored products.
Impact Scope:
Approximately 15% of product inquiries are affected
Attempted Improvements:
✅ Separated data files for different colored products
✅ Clearly labeled colors in titles
⚠️ Problem still not completely resolved
Recommended Technical Adjustments:
Request that the system identify "color" keywords more accurately when retrieving data
Attachments:
- test_cases_color_queries.csv (100 test questions)
- current_results.csv (current system retrieval results)
- expected_results.csv (expected correct results)
5. Daily Management Checklist
Daily Inspection
When problems are discovered:
If same type of problem occurs ≥ 3 times
→ Handle immediately (update data or adjust principles)
If involves pricing or policy errors
→ Emergency correction, complete same day
If isolated incident
→ Record for observation, add to discussion
Response Quality Tracking
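The daily triage rules above can be written as a small decision function. This is a sketch under two assumptions the guide does not spell out: pricing or policy errors take precedence over the repeat count, and anything seen fewer than 3 times is treated as an isolated incident. The function name `triage` is illustrative.

```python
def triage(occurrences: int, involves_pricing_or_policy: bool) -> str:
    """Map one problem type to the action listed in the daily inspection rules."""
    if involves_pricing_or_policy:
        # Assumption: pricing and policy errors outrank the repeat-count rule.
        return "emergency correction, complete same day"
    if occurrences >= 3:
        return "handle immediately (update data or adjust principles)"
    return "record for observation, add to discussion"


print(triage(occurrences=1, involves_pricing_or_policy=True))
print(triage(occurrences=4, involves_pricing_or_policy=False))
print(triage(occurrences=1, involves_pricing_or_policy=False))
```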
1. Data Review
Weekly Statistics:
- Total conversations: ___
- Average Faithfulness Score: ___ points
- Average Answer Relevancy Score: ___ points
- Average Context Precision Score: ___ points
- Abnormal conversations: ___ (____%)
2. Problem Analysis
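The weekly statistics and the high-frequency issue tally can be computed from a list of scored conversations. The following is a rough Python sketch, assuming the scores have been exported as one dictionary per conversation with a manually assigned `issue` label. The guide does not define "abnormal", so the sketch treats any conversation with at least one metric below 60 as abnormal. All names and sample values are illustrative.

```python
from collections import Counter
from statistics import mean

METRICS = ("faithfulness", "answer_relevancy", "context_precision")


def weekly_summary(conversations, abnormal_below=60):
    """Compute totals, per-metric averages, the abnormal rate, and the top 3 issues."""
    if not conversations:
        return {"total": 0, "averages": {}, "abnormal": 0,
                "abnormal_rate": 0.0, "top_issues": []}
    averages = {m: round(mean(c[m] for c in conversations), 1) for m in METRICS}
    abnormal = [c for c in conversations
                if any(c[m] < abnormal_below for m in METRICS)]
    top_issues = Counter(
        c["issue"] for c in abnormal if c.get("issue")
    ).most_common(3)
    return {
        "total": len(conversations),
        "averages": averages,
        "abnormal": len(abnormal),
        "abnormal_rate": round(100 * len(abnormal) / len(conversations), 1),
        "top_issues": top_issues,
    }


# Illustrative sample data, not real results.
week = [
    {"faithfulness": 45, "answer_relevancy": 90, "context_precision": 70, "issue": "wrong stock status"},
    {"faithfulness": 38, "answer_relevancy": 95, "context_precision": 85, "issue": "outdated price"},
    {"faithfulness": 40, "answer_relevancy": 92, "context_precision": 88, "issue": "outdated price"},
    {"faithfulness": 90, "answer_relevancy": 35, "context_precision": 60, "issue": "off-topic answer"},
    {"faithfulness": 88, "answer_relevancy": 90, "context_precision": 48, "issue": "wrong product color"},
    {"faithfulness": 92, "answer_relevancy": 95, "context_precision": 90, "issue": None},
]
summary = weekly_summary(week)
print(summary)
```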
Top 3 High-Frequency Issues:
1. ________ (__ times) - Which metric is low?
2. ________ (__ times) - Which metric is low?
3. ________ (__ times) - Which metric is low?
3. Improvement Actions
This Week's Tasks:
□ Update ___ data files (Responsible person: ___)
□ Adjust ___ response principles (Responsible person: ___)
□ Report ___ technical issues (Responsible person: ___)
Next Week's Goals:
- Reduce abnormal conversation rate to < ____%
- All metrics average > ___ points
Appendix A: Problem Diagnosis Quick Reference
Low Faithfulness Score: Outdated or incorrect data, AI fabrication → Step 1: Update data content
Low Answer Relevancy Score: AI provides irrelevant answers → Step 2: Adjust response principles
Low Context Precision Score: AI finds wrong or imprecise data → Step 1: Improve data labels
Multiple low metrics: Systemic issue → Step 1+2, Step 3 if necessary
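The quick reference can be read as a lookup from the low metric to the recommended step. The following is a minimal sketch in Python, assuming "low" means below 60 points, as in the problem type definitions in Section 2. The dictionary keys and the `recommend` function are illustrative names.

```python
from typing import Dict, List

# Likely cause and recommended step per metric, as listed in the quick reference.
QUICK_REFERENCE = {
    "faithfulness": ("Outdated or incorrect data, AI fabrication",
                     "Step 1: Update data content"),
    "answer_relevancy": ("AI provides irrelevant answers",
                         "Step 2: Adjust response principles"),
    "context_precision": ("AI finds wrong or imprecise data",
                          "Step 1: Improve data labels"),
}


def recommend(scores: Dict[str, float], threshold: float = 60) -> List[str]:
    """Return the recommended action(s) for one conversation's scores."""
    low = [m for m in QUICK_REFERENCE if scores[m] < threshold]
    if len(low) >= 2:
        return ["Systemic issue: Step 1+2, Step 3 if necessary"]
    return [f"{QUICK_REFERENCE[m][1]} ({QUICK_REFERENCE[m][0]})" for m in low]


# Scores from Case 2 (irrelevant answer).
case_2 = {"faithfulness": 90, "answer_relevancy": 35, "context_precision": 60}
print(recommend(case_2))
```

A score of exactly 60 is not counted as low here, so Case 2 maps to a single action rather than the systemic rule.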
Improvement Priority
First Priority: Faithfulness Score < 60 points
→ May provide customers with incorrect information or fabricated content, causing complaints
Second Priority: Answer Relevancy Score < 60 points
→ Poor customer experience, requires repeated inquiries
Third Priority: Context Precision Score < 60 points
→ The problem is less visible, but it degrades quality over the long term
Appendix B: System Evaluation Metrics Reference Table
Primary Metrics (No Standard Answer Required)
These three metrics are the core of this guide and can be directly applied to daily customer service conversation evaluation:
Faithfulness Score (Faithfulness): Evaluates whether AI responses align with database content or contain fabricated information
Answer Relevancy Score (Answer Relevancy): Evaluates whether AI responses are relevant to the customer's question or drift off-topic
Context Precision Score (Context Precision): Evaluates whether the correct reference materials are found for the question
Advanced Metrics (Standard Answer Required)
The following metrics require prepared "ground truth" standard answers, suitable for test case evaluation:
Answer Correctness: Compares the AI response with the standard answer and evaluates correctness
Answer Similarity: Evaluates semantic similarity between the AI response and the standard answer
Context Recall: Evaluates whether the system retrieves all necessary reference materials
Other Available Metrics (DeepEval)
The system also supports the following additional evaluation metrics for more comprehensive quality inspection:
Bias Detection (Bias): Detects whether responses contain biased or discriminatory content
Toxicity Detection (Toxicity): Detects whether responses contain inappropriate or offensive content
Hallucination Detection (Hallucination): Detects whether AI generates content inconsistent with facts
Contextual Relevancy: Evaluates whether retrieved reference materials are relevant to the question
Usage Recommendations
Daily Monitoring: Use three primary metrics (Faithfulness Score, Answer Relevancy Score, Context Precision Score)
Test Evaluation: Combine with advanced metrics, prepare standard answers for systematic evaluation
Quality Control: Enable bias and toxicity detection to ensure responses comply with corporate standards
FAQ
Q1: I'm not technical, can I manage AI customer service?
A: Yes! Just like managing customer service staff, you only need to:
Review evaluation reports daily, identify problem conversations
Check that data is correct and complete
Adjust AI "response principles" (like training service scripts)
Q2: How are scores generated? Does AI evaluate itself?
A: No. Scoring is automatically performed by a specialized "evaluation system," like having another AI acting as "quality control" to check the first AI's responses.
Q3: Are all three metrics important? Can I just look at one?
A: We recommend reviewing all three because they reflect different issues:
Faithfulness Score: Whether AI aligns with database content, whether it fabricates
Answer Relevancy Score: Whether AI understands the question, whether response is relevant
Context Precision Score: Whether AI finds the correct reference materials for the question
If you only look at one, you may miss important issues.
Q4: How soon will I see results after improvements?
A:
Data updates: Immediate effect (improvements visible same day)
Response principle adjustments: Immediate effect
Technical adjustments: Requires 2-4 weeks (depending on problem complexity)
Conclusion
Managing AI customer service is like managing a human customer service team:
✅ Regular quality checks (review evaluation reports)
✅ Continuous knowledge updates (update data content)
✅ Optimize service scripts (adjust response principles)
✅ Track improvement results (monitor score changes)
By following this guide, even without technical knowledge, you can make AI customer service better and better!
