AI Customer Service Quality Management Guide

Target Audience: Customer Service Managers, Quality Management Personnel, Customer Service Trainers

1. Quick Start: Three Quality Metrics for AI Customer Service

How to View Evaluation Report Scores?

Path: AgentOps (sidebar) → AI Assistant Monitoring

In the table, you can directly view the three major scoring metrics for each conversation. Click "View" to see complete details.

Why Do We Need Evaluation?

Just like reviewing customer service call recordings, we also need to check the quality of AI responses. The system automatically scores each conversation, helping you quickly identify issues.


Three Core Metrics

| Metric Name | Plain Language Explanation | Scoring Standards |
| --- | --- | --- |
| Faithfulness Score | Whether the information the AI provides is correct and not fabricated | 85 points or above ✅, 60-84 points ⚠️, below 60 points ❌ |
| Answer Relevancy Score | Whether the AI answers the customer's actual question | 85 points or above ✅, 60-84 points ⚠️, below 60 points ❌ |
| Context Precision Score | Whether the AI finds the right reference materials for the question | 85 points or above ✅, 60-84 points ⚠️, below 60 points ❌ |
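For teams that export scores to a spreadsheet or script, the three scoring bands can be sketched as a small Python function. The function name `score_band` and the band labels are illustrative assumptions, and treating a score of exactly 85 as a pass is also an assumption.

```python
def score_band(score: float) -> str:
    """Map a 0-100 metric score to one of the guide's three bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 85:   # assumption: exactly 85 counts as a pass
        return "pass"      # shown as ✅ in the report
    if score >= 60:
        return "warning"   # shown as ⚠️ in the report
    return "fail"          # shown as ❌ in the report


print(score_band(45))  # fail
```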


Simple Assessment Method

All three metrics > 80 points  → ✅ This response is excellent
Any metric < 60 points  → ❌ Requires immediate improvement
Two or more < 70 points → ⚠️ Systemic issue, needs comprehensive review
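The three rules above can be sketched in Python as follows. This is a minimal illustration, not part of the product: the function name and dictionary keys are hypothetical. Because the guide does not say which rule wins when several apply, the sketch reports every rule that matches, and the fallback label for scores that match no rule is an assumption.

```python
def assess_conversation(scores: dict[str, float]) -> list[str]:
    """Return every rule from the simple assessment method that the scores trigger."""
    values = list(scores.values())
    findings = []
    if all(v > 80 for v in values):
        findings.append("excellent")
    if any(v < 60 for v in values):
        findings.append("requires immediate improvement")
    if sum(v < 70 for v in values) >= 2:
        findings.append("systemic issue, needs comprehensive review")
    # Assumption: the guide does not name the in-between case.
    return findings or ["no rule triggered, keep monitoring"]


# Hypothetical scores for a single conversation
report = {"faithfulness": 45, "answer_relevancy": 90, "context_precision": 70}
print(assess_conversation(report))  # ['requires immediate improvement']
```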

2. How to Understand Evaluation Reports

Report Example

Conversation ID: #20240120-001
Customer Question: "Is the black trench coat in XL size still in stock?"
AI Response: "The black trench coat is currently in stock, XL size can be ordered."

Evaluation Results:
├─ Faithfulness Score: 45 points ❌ (Claims in stock, but actually out of stock)
├─ Answer Relevancy Score: 90 points ✅ (Indeed answered the stock question)
└─ Context Precision Score: 70 points ⚠️ (Found trench coat data, but size information is not precise enough)

Problem Diagnosis: AI provided incorrect inventory information

Three Common Problem Types

Problem A: Low Faithfulness Score (< 60 points)

Symptoms: Information provided by AI is incorrect or fabricated

Common Causes:

  • Reference materials are outdated (prices, inventory, policies have been updated)

  • Conflicting data (different documents say different things)

  • AI "guesses" answers without relying on database content

Impact: Customers may receive incorrect information, leading to complaints


Problem B: Low Answer Relevancy Score (< 60 points)

Symptoms: AI doesn't answer what the customer actually wants to know

Common Causes:

  • AI provides lengthy responses but misses the point

  • Answers are irrelevant, discussing unrelated content

  • Only explains background without providing actual answers

Impact: Customers need to ask again, reducing satisfaction


Problem C: Low Context Precision Score (< 60 points)

Symptoms: AI retrieves the wrong reference materials, or materials that are not specific enough

Common Causes:

  • Inaccurate keyword searches

  • Confusion between different products/categories of data

  • Unclear titles or categories in reference materials

Impact: Even when the AI addresses the right question, working from the wrong data leads to wrong answers


3. Real-World Cases: Common Issues in Fashion Retail

Case 1: Incorrect Pricing (Low Faithfulness Score)

📊 Problem Discovery

Customer Question: "How much is this down jacket?"

AI Response: "This down jacket is priced at NT$ 3,990."

Evaluation Results:

  • Faithfulness Score: 38 points ❌

  • Answer Relevancy Score: 95 points ✅

  • Context Precision Score: 85 points ✅

Actual Situation: The product has been repriced to NT$ 2,990 (after discount), but AI still provided the original price.


💡 Improvement Solution

Step 1: Check Data

  • Open the "Product Price List" in the knowledge base

  • Check whether the listed prices are current (in this case, they had not been updated)

Step 2: Update Data

Problem Data

Down Jacket Series
- Classic Down Jacket: NT$ 3,990
- Long Down Jacket: NT$ 4,990

Corrected Data

Down Jacket Series (Winter 2024 Prices)

Product Name: Classic Down Jacket
- Original Price: NT$ 3,990
- Discounted Price: NT$ 2,990 (Starting 2024/1/1)
- Promotion Period: 2024/1/1 - 2024/2/28

Product Name: Long Down Jacket
- Original Price: NT$ 4,990
- Discounted Price: NT$ 3,990 (Starting 2024/1/1)
- Promotion Period: 2024/1/1 - 2024/2/28

Last Updated: 2024/1/15
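To show why recording the promotion period matters, here is a minimal Python sketch of how a price entry with an explicit period can be resolved for a given date. The `PriceEntry` class and its field names are hypothetical and only mirror the corrected data above; how the platform actually stores or reads prices is not specified in this guide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PriceEntry:
    """One product's pricing record (hypothetical structure)."""
    product_name: str
    original_price: int
    discounted_price: Optional[int] = None
    promo_start: Optional[date] = None
    promo_end: Optional[date] = None

    def price_on(self, day: date) -> int:
        """Return the discounted price only while the promotion is active."""
        in_promo = (
            self.discounted_price is not None
            and self.promo_start is not None
            and self.promo_end is not None
            and self.promo_start <= day <= self.promo_end
        )
        return self.discounted_price if in_promo else self.original_price


classic = PriceEntry("Classic Down Jacket", 3990, 2990, date(2024, 1, 1), date(2024, 2, 28))
print(classic.price_on(date(2024, 1, 20)))  # 2990
print(classic.price_on(date(2024, 3, 1)))   # 3990
```

Without the start and end dates, neither a person nor the AI can tell which of the two prices applies on a given day.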

Step 3: Remind AI

Add to "AI Response Principles":

"When answering questions about prices, always confirm that the price is current and specify the period it applies to. If there is a discount promotion, state both the original price and the discounted price."


Case 2: Irrelevant Answer (Low Answer Relevancy Score)

📊 Problem Discovery

Customer Question: "If it doesn't fit, can I return it?"

AI Response: "We provide multiple size options, including S, M, L, XL. Each garment's size chart is indicated on the product page. We recommend referring to the size chart for measurements before purchasing."

Evaluation Results:

  • Faithfulness Score: 90 points ✅

  • Answer Relevancy Score: 35 points ❌

  • Context Precision Score: 60 points ⚠️

Problem Diagnosis: Customer asked "can I return it," but AI is explaining "how to choose sizes," completely missing the core question.


💡 Improvement Solution

Step 1: Adjust Response Principles

Add to "AI Response Principles":

Core Principle: Answer the customer's core question first

Wrong Example:

  • Customer asks "Can I return it?"

  • AI answers "We have comprehensive after-sales service..." ❌ (Doesn't say yes or no)

Correct Example:

  • Customer asks "Can I return it?"

  • AI answers "Yes. Within 7 days of receiving the product, if unused, you can apply for return..." ✅ (Direct answer)

Step 2: Reorganize Data Structure

Problem Data (Too scattered)

After-Sales Service Description

Our company values customer rights and provides comprehensive after-sales service.
Please read the size chart carefully when purchasing and choose the appropriate size.
If you have any questions, please contact customer service...

Corrected Data

Return and Exchange Policy

Q: Can I return the product?
A: Yes. Within 7 days of receiving the product, if unused with tags intact, you can apply for return.

Return Conditions:
✅ Product unused, not washed, not tried on for more than 10 minutes
✅ Tags and packaging intact
✅ Application submitted within 7 days of receipt
❌ Sale items, underwear cannot be returned

Return Process:
1. Call customer service hotline 0800-XXX-XXX or fill out online form
2. Customer service provides return address and tracking number
3. Refund within 7-14 days to original payment method

---

Q: Can I exchange the product?
A: Yes. Within 7 days of receipt, you can exchange size or color for free (same product only).

Exchange Process:
1. Contact customer service to confirm stock availability
2. Return product (same conditions as returns)
3. New product shipped within 3 days after receiving returned item

Improvement Highlights:

  • ✅ Directly answer "yes" or "no" at the beginning

  • ✅ Use Q&A format for clarity

  • ✅ List clear conditions to avoid disputes


Case 3: Wrong Product Found (Low Context Precision Score)

📊 Problem Discovery

Customer Question: "What is the material of the black knit top?"

AI Response: "This knit top is made of 100% pure cotton, soft and comfortable, suitable for all seasons."

Evaluation Results:

  • Faithfulness Score: 88 points ✅

  • Answer Relevancy Score: 90 points ✅

  • Context Precision Score: 48 points ❌

Actual Situation: The black knit top is 70% wool + 30% polyester, but AI found data for the "white knit top" (100% pure cotton).


💡 Improvement Solution

Step 1: Check Data Labels

Problem data file name:

Knit_Tops.pdf

Issue: All knit tops are in one document, making it difficult for AI to distinguish.

Step 2: Improve Data Structure

Solution A: Separate Files

Product_Data/
├─ Knit_Top_Black_Model_A001.pdf
├─ Knit_Top_White_Model_A002.pdf
├─ Knit_Top_Gray_Model_A003.pdf

Solution B: Clear Titles

# Knit Top Product Information

## Black Knit Top (Model: A001)
- Color: Black
- Material: 70% wool + 30% polyester
- Suitable Season: Autumn/Winter
- Care Instructions: Hand wash, do not tumble dry

## White Knit Top (Model: A002)
- Color: White
- Material: 100% pure cotton
- Suitable Season: All seasons
- Care Instructions: Machine washable, low temperature dry

## Gray Knit Top (Model: A003)
- Color: Gray
- Material: 50% wool + 50% acrylic
- Suitable Season: Autumn/Winter
- Care Instructions: Dry clean only
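The benefit of clear titles can be illustrated with a short Python sketch that splits such a sheet into one chunk per product heading, so each chunk carries its own color and model number. This is only an illustration under assumptions: the function `split_by_product` is hypothetical, and whether the platform splits documents by heading in this way is not stated in this guide.

```python
import re


def split_by_product(document: str) -> dict[str, str]:
    """Split a product sheet into one chunk per '## ' heading."""
    chunks: dict[str, list[str]] = {}
    current_title = None
    for line in document.splitlines():
        match = re.match(r"^##\s+(.*)", line)
        if match:
            # A new product section starts here.
            current_title = match.group(1).strip()
            chunks[current_title] = []
        elif current_title is not None:
            chunks[current_title].append(line)
    return {title: "\n".join(lines).strip() for title, lines in chunks.items()}


SHEET = """# Knit Top Product Information

## Black Knit Top (Model: A001)
- Color: Black
- Material: 70% wool + 30% polyester

## White Knit Top (Model: A002)
- Color: White
- Material: 100% pure cotton
"""

chunks = split_by_product(SHEET)
print(list(chunks))  # ['Black Knit Top (Model: A001)', 'White Knit Top (Model: A002)']
```

Because the black and white tops end up in separate, clearly titled chunks, a search for "black knit top" has much less chance of returning the cotton description.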

Step 3: Remind AI

Add to "AI Response Principles":

"When customers mention product color or model number, always confirm that the reference material corresponds to that specific color or model. Different colors of the same product may have different materials and specifications."


4. Three-Step Improvement Plan

When problems are identified, follow this process:

Discover low scores
↓
Step 1: Update data content (most important)
↓
Step 2: Adjust AI response principles
↓
Step 3: Report to technical team (if needed)

Step 1: Update Data Content

Applicable Situations:

  • ✅ Low faithfulness score (incorrect or outdated data)

  • ✅ Low context precision score (disorganized data, unclear labels)

Checklist:

Data Quality Examples:

Poor Data

Return Policy

Some products can be returned, but certain conditions must be met.
Some special products cannot be returned, please note before purchasing.
If you need to return, please contact customer service.

Good Data

Return Policy

Returnable Products:
✅ General clothing (tops, pants, outerwear)
✅ Accessories (bags, hats, scarves)

Non-Returnable Products:
❌ Underwear, swimwear
❌ Sale items (50% off or more)
❌ Customized products

Return Conditions (all must be met):
1. Within 7 days of receipt
2. Product unused (tags intact, no signs of wear)
3. Packaging intact

Return Process:
1. Call customer service hotline 0800-XXX-XXX
2. Provide order number
3. Customer service provides return address
4. Return product (registered mail recommended)
5. Refund within 7-14 days after receiving returned item

Contact Methods:
- Customer Service Hotline: 0800-XXX-XXX (09:00-21:00)
- Online Chat: Chat box at bottom right of website
- Email: [email protected]

Step 2: Adjust AI Response Principles

Applicable Situations:

  • ✅ Low answer relevancy score (irrelevant answers)

  • ✅ Low faithfulness score (AI guessing, fabricating)

AI Response Principles Template:

# AI Customer Service Response Principles

## Core Rules

1. **Answer the core question first**
   - Customer asks "can I/is it possible" → First answer "yes" or "no"
   - Customer asks "how much" → State price first
   - Customer asks "how to" → Provide steps first

2. **Only state what you're certain about**
   - All information must come from reference materials
   - If uncertain, say "This requires human customer service assistance"
   - Never guess or assume

3. **Pay attention to details**
   - Do not confuse colors, sizes, model numbers
   - Confirm prices are current
   - Clearly state promotion periods

## Response Format

### Policy Questions (returns/exchanges, membership, promotions)
First paragraph: Directly answer "yes" or "no"
Second paragraph: Explain conditions (in bullet points)
Third paragraph: Tell customer what to do (process or contact method)

### Product Questions (price, material, stock)
First paragraph: Directly answer the question (price/material/stock availability)
Second paragraph: Supplement product information (specifications, sizes, colors)
Third paragraph: Purchase link or next step

### Process Questions (how to buy, return, exchange)
First paragraph: Summarize process (3-5 steps)
Second paragraph: Detail each step
Third paragraph: Important notes or contact information

## Prohibited Actions

❌ Cannot say "usually," "generally," "approximately" (must be specific)
❌ Cannot confuse information from different products
❌ Cannot omit important conditions (price, size, period)
❌ Cannot speculate about information customer didn't mention

## Examples

✅ Good Response:
Customer: "Can I return this jacket?"
AI: "Yes. Within 7 days of receipt, if the following conditions are met, you can apply for return:
- Product unused, tags intact
- Packaging in good condition
- Not a sale item

Return Process:
Please call customer service hotline 0800-XXX-XXX, we will provide return address and instructions.
Refund will be processed within 7-14 business days to original payment method."

❌ Poor Response:
Customer: "Can I return this jacket?"
AI: "Our company values customer rights and provides comprehensive after-sales service.
We recommend reading product descriptions carefully before purchasing and choosing the appropriate size.
If you have any questions, please contact customer service..."
(Does not directly answer whether return is possible)
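The "Prohibited Actions" rule about vague wording lends itself to a simple spot check when reviewing transcripts. The Python sketch below is a hypothetical helper, not a platform feature: it only looks for the three words named in the template and does not judge whether the rest of the answer is specific.

```python
import re

# The three vague words named in the "Prohibited Actions" section.
VAGUE_WORDS = ("usually", "generally", "approximately")


def find_vague_words(response: str) -> list[str]:
    """Return the prohibited vague words that appear in an AI response."""
    lowered = response.lower()
    return [w for w in VAGUE_WORDS if re.search(rf"\b{w}\b", lowered)]


print(find_vague_words("Refunds usually take approximately 10 days."))
# ['usually', 'approximately']
print(find_vague_words("Refund within 7-14 days to original payment method."))
# []
```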

Step 3: Report to Technical Team

Applicable Situations:

  • Context precision score consistently low

  • Same problem recurring

  • No improvement after adjusting data and principles

Report Content:

Problem Type: Low Context Precision Score

Problem Description:
When customers inquire about "black" products, AI frequently finds data for "white" or other colored products.

Impact Scope:
Approximately 15% of product inquiries are affected

Attempted Improvements:
✅ Separated data files for different colored products
✅ Clearly labeled colors in titles
⚠️ Problem still not completely resolved

Recommended Technical Adjustments:
Request that the system identify "color" keywords more accurately during retrieval

Attachments:
- test_cases_color_queries.csv (100 test questions)
- current_results.csv (current system retrieval results)
- expected_results.csv (expected correct results)
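Before sending the report, it can help to quantify the gap between the two result files. The following Python sketch computes the share of test questions where the retrieved document differs from the expected one. The column names `question_id` and `document` are assumptions, since this guide does not define the layout of the attachment files, and the in-memory sample rows stand in for the real CSV files.

```python
import csv
import io


def mismatch_rate(current_file, expected_file) -> float:
    """Share of test questions whose retrieved document differs from the expected one."""
    # Assumed columns: question_id, document
    current = {row["question_id"]: row["document"] for row in csv.DictReader(current_file)}
    expected = {row["question_id"]: row["document"] for row in csv.DictReader(expected_file)}
    if not expected:
        raise ValueError("expected results are empty")
    wrong = sum(1 for qid, doc in expected.items() if current.get(qid) != doc)
    return wrong / len(expected)


# Two illustrative rows instead of the real files
current_csv = io.StringIO(
    "question_id,document\n"
    "1,Knit_Top_White_Model_A002.pdf\n"
    "2,Knit_Top_Black_Model_A001.pdf\n"
)
expected_csv = io.StringIO(
    "question_id,document\n"
    "1,Knit_Top_Black_Model_A001.pdf\n"
    "2,Knit_Top_Black_Model_A001.pdf\n"
)
rate = mismatch_rate(current_csv, expected_csv)
print(f"{rate:.0%}")  # 50%
```

With real files, the same function can be called on objects returned by `open("current_results.csv", newline="")` and `open("expected_results.csv", newline="")`.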

5. Daily Management Checklist

Daily Inspection

When problems are discovered:

If same type of problem occurs ≥ 3 times
→ Handle immediately (update data or adjust principles)

If involves pricing or policy errors
→ Emergency correction, complete same day

If isolated incident
→ Record for observation, add to discussion
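The handling rules above can be written as a small decision function. This Python sketch is illustrative only: the function name is hypothetical, checking pricing or policy errors first is an assumed precedence, and treating a problem seen twice the same way as an isolated incident is also an assumption, since the guide does not cover that case.

```python
def triage(occurrences: int, involves_pricing_or_policy: bool) -> str:
    """Pick the handling rule for a problem found during daily inspection."""
    if involves_pricing_or_policy:
        # Assumed precedence: pricing and policy errors are handled first.
        return "emergency correction, complete same day"
    if occurrences >= 3:
        return "handle immediately (update data or adjust principles)"
    # Assumption: anything seen fewer than 3 times is treated as isolated.
    return "record for observation, add to discussion"


print(triage(occurrences=1, involves_pricing_or_policy=True))
# emergency correction, complete same day
print(triage(occurrences=4, involves_pricing_or_policy=False))
# handle immediately (update data or adjust principles)
```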

Response Quality Tracking

1. Data Review

Weekly Statistics:
- Total conversations: ___ 
- Average Faithfulness Score: ___ points
- Average Answer Relevancy Score: ___ points
- Average Context Precision Score: ___ points
- Abnormal conversations: ___ (____%)
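If conversation scores can be exported, the weekly statistics can be filled in with a short script. The Python sketch below is a minimal example under assumptions: the dictionary keys are hypothetical, and a conversation is counted as "abnormal" when any metric is below 60, which is one possible definition rather than one given by this guide. The sample scores are the three cases from Section 3.

```python
from statistics import mean

METRICS = ("faithfulness", "answer_relevancy", "context_precision")


def weekly_summary(conversations: list[dict[str, float]], abnormal_below: float = 60) -> dict[str, float]:
    """Compute the figures for the weekly statistics template."""
    if not conversations:
        raise ValueError("no conversations to summarize")
    total = len(conversations)
    # Assumption: "abnormal" means at least one metric below the threshold.
    abnormal = sum(
        1 for c in conversations if any(c[m] < abnormal_below for m in METRICS)
    )
    summary = {f"avg_{m}": round(mean(c[m] for c in conversations), 1) for m in METRICS}
    summary["total"] = total
    summary["abnormal"] = abnormal
    summary["abnormal_pct"] = round(100 * abnormal / total, 1)
    return summary


week = [
    {"faithfulness": 38, "answer_relevancy": 95, "context_precision": 85},  # Case 1
    {"faithfulness": 90, "answer_relevancy": 35, "context_precision": 60},  # Case 2
    {"faithfulness": 88, "answer_relevancy": 90, "context_precision": 48},  # Case 3
]
summary = weekly_summary(week)
print(summary["total"], summary["abnormal"], summary["abnormal_pct"])  # 3 3 100.0
```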

2. Problem Analysis

Top 3 High-Frequency Issues:
1. ________ (__ times) - Which metric is low?
2. ________ (__ times) - Which metric is low?
3. ________ (__ times) - Which metric is low?

3. Improvement Actions

This Week's Tasks:
□ Update ___ data files (Responsible person: ___)
□ Adjust ___ response principles (Responsible person: ___)
□ Report ___ technical issues (Responsible person: ___)

Next Week's Goals:
- Reduce abnormal conversation rate to < ____%
- All metrics average > ___ points

Appendix A: Problem Diagnosis Quick Reference

| Score Situation | Possible Cause | Improvement Method |
| --- | --- | --- |
| Low Faithfulness Score | Outdated or incorrect data, AI fabrication | Step 1: Update data content |
| Low Answer Relevancy Score | AI provides irrelevant answers | Step 2: Adjust response principles |
| Low Context Precision Score | AI finds wrong or imprecise data | Step 1: Improve data labels |
| Multiple low metrics | Systemic issue | Step 1+2, Step 3 if necessary |
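The quick reference can also be expressed as a lookup, which is convenient when annotating an exported list of conversations. This Python sketch is a hypothetical helper: the dictionary keys and the default cutoff of 60 for "low" are assumptions taken from the thresholds used elsewhere in this guide.

```python
RECOMMENDATIONS = {
    "faithfulness": "Step 1: Update data content",
    "answer_relevancy": "Step 2: Adjust response principles",
    "context_precision": "Step 1: Improve data labels",
}


def recommend(scores: dict[str, float], low_below: float = 60) -> list[str]:
    """Return the improvement method(s) suggested by the quick reference."""
    low = [m for m in RECOMMENDATIONS if scores[m] < low_below]
    if len(low) > 1:
        # Multiple low metrics are treated as a systemic issue.
        return ["Step 1+2, Step 3 if necessary"]
    return [RECOMMENDATIONS[m] for m in low]


case_1 = {"faithfulness": 38, "answer_relevancy": 95, "context_precision": 85}
print(recommend(case_1))  # ['Step 1: Update data content']
```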


Improvement Priority

First Priority: Faithfulness Score < 60 points
→ May provide customers with incorrect information or fabricated content, causing complaints

Second Priority: Answer Relevancy Score < 60 points
→ Poor customer experience, requires repeated inquiries

Third Priority: Context Precision Score < 60 points
→ Less visible in the short term, but degrades response quality over time
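When many conversations are flagged at once, the priority order above can be used to sort the work queue. The Python sketch below is one possible way to do that; the function name, dictionary keys, and conversation IDs are hypothetical.

```python
# Metrics in the guide's priority order: first, second, third.
PRIORITY_ORDER = ("faithfulness", "answer_relevancy", "context_precision")


def priority(scores: dict[str, float], low_below: float = 60) -> int:
    """Return 1, 2, or 3 for the highest-priority low metric, or 4 if none is low."""
    for rank, metric in enumerate(PRIORITY_ORDER, start=1):
        if scores[metric] < low_below:
            return rank
    return len(PRIORITY_ORDER) + 1


flagged = [
    {"id": "case-3", "scores": {"faithfulness": 88, "answer_relevancy": 90, "context_precision": 48}},
    {"id": "case-1", "scores": {"faithfulness": 38, "answer_relevancy": 95, "context_precision": 85}},
    {"id": "case-2", "scores": {"faithfulness": 90, "answer_relevancy": 35, "context_precision": 60}},
]
queue = sorted(flagged, key=lambda c: priority(c["scores"]))
print([c["id"] for c in queue])  # ['case-1', 'case-2', 'case-3']
```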

Appendix B: System Evaluation Metrics Reference Table

Primary Metrics (No Standard Answer Required)

These three metrics are the core of this guide and can be directly applied to daily customer service conversation evaluation:

| Metric Name | English Full Name | Description |
| --- | --- | --- |
| Faithfulness Score | Faithfulness | Evaluates whether AI responses align with database content and contain no fabricated information |
| Answer Relevancy Score | Answer Relevancy | Evaluates whether AI responses are relevant to the customer's question and stay on topic |
| Context Precision Score | Context Precision | Evaluates whether the AI retrieves the correct reference materials for the question |

Advanced Metrics (Standard Answer Required)

The following metrics require prepared "ground truth" standard answers, suitable for test case evaluation:

| Metric Name | English Full Name | Description |
| --- | --- | --- |
| Answer Correctness | Answer Correctness | Compares the AI response with the standard answer and evaluates its correctness |
| Answer Similarity | Answer Similarity | Evaluates the semantic similarity between the AI response and the standard answer |
| Context Recall | Context Recall | Evaluates whether the system retrieves all necessary reference materials |

Other Available Metrics (DeepEval)

The system also supports the following additional evaluation metrics for more comprehensive quality inspection:

| Metric Name | English Name | Description |
| --- | --- | --- |
| Bias Detection | Bias | Detects whether responses contain biased or discriminatory content |
| Toxicity Detection | Toxicity | Detects whether responses contain inappropriate or offensive content |
| Hallucination Detection | Hallucination | Detects whether the AI generates content inconsistent with the facts |
| Contextual Relevancy | Contextual Relevancy | Evaluates whether the retrieved reference materials are relevant to the question |

Usage Recommendations

  1. Daily Monitoring: Use three primary metrics (Faithfulness Score, Answer Relevancy Score, Context Precision Score)

  2. Test Evaluation: Combine with advanced metrics, prepare standard answers for systematic evaluation

  3. Quality Control: Enable bias and toxicity detection to ensure responses comply with corporate standards


FAQ

Q1: I'm not technical, can I manage AI customer service?
A: Yes! Just like managing customer service staff, you only need to:

  • Review evaluation reports daily, identify problem conversations

  • Check that data is correct and complete

  • Adjust AI "response principles" (like training service scripts)


Q2: How are scores generated? Does AI evaluate itself?
A: No. Scoring is automatically performed by a specialized "evaluation system," like having another AI acting as "quality control" to check the first AI's responses.


Q3: Are all three metrics important? Can I just look at one?
A: We recommend reviewing all three because they reflect different issues:

  • Faithfulness Score: Whether the AI's answer agrees with the database content, without fabrication

  • Answer Relevancy Score: Whether the AI understands the question and responds to it

  • Context Precision Score: Whether the AI retrieves the correct reference materials

If you only look at one, you may miss important issues.


Q4: How soon will I see results after improvements?
A: It depends on the type of change:

  • Data updates: Immediate effect (improvements visible same day)

  • Response principle adjustments: Immediate effect

  • Technical adjustments: Requires 2-4 weeks (depending on problem complexity)


Conclusion

Managing AI customer service is like managing a human customer service team:

✅ Regular quality checks (review evaluation reports)
✅ Continuous knowledge updates (update data content)
✅ Optimize service scripts (adjust response principles)
✅ Track improvement results (monitor score changes)

By following this guide, you can steadily improve your AI customer service, even without technical knowledge.
