Evaluations
Define what "success" looks like for your AI application. Create specific, measurable criteria that guide optimization and measure performance objectively.
What Are Evaluations?
Evaluations are specific, measurable criteria that define good performance for your AI application. Instead of relying on subjective judgment, they give you objective benchmarks to measure and optimize against.
How Evaluations Work
When your AI application processes an input:
1. Output Generated: Your AI creates a response.
2. Evaluations Applied: Each active evaluation scores the output.
3. Individual Scores Calculated: Each evaluation receives a score from 0 to 10.
4. Overall Score Computed: The overall score is the average of all evaluation scores.
5. Results Logged: Scores and reasoning are saved to the Event Log.
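Put concretely, the five steps above reduce to a simple scoring loop. The sketch below is a hypothetical model for illustration only, not the platform's actual API; names such as `Evaluation`, `score_fn`, and `score_output` are invented. Each active evaluation returns a 0-10 score for the output, and the overall score is their average.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical names for illustration only; the platform defines its own objects.
@dataclass
class Evaluation:
    name: str
    active: bool
    score_fn: Callable[[str], float]  # returns a 0-10 score for one output

def score_output(output: str, evaluations: list[Evaluation]) -> dict:
    """Apply every active evaluation to one output and average the results."""
    scores = {e.name: e.score_fn(output) for e in evaluations if e.active}
    overall = round(mean(scores.values()), 2) if scores else 0.0
    # The platform also logs scores and reasoning to the Event Log;
    # this sketch simply returns them to the caller.
    return {"overall": overall, "scores": scores}

# Toy example: two evaluations applied to one output.
evals = [
    Evaluation("under_100_words", True, lambda out: 10.0 if len(out.split()) < 100 else 3.0),
    Evaluation("mentions_order", True, lambda out: 10.0 if "order" in out.lower() else 0.0),
]
print(score_output("Your order #1234 has shipped.", evals))
# {'overall': 10.0, 'scores': {'under_100_words': 10.0, 'mentions_order': 10.0}}
```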
Creating Evaluations
You have two options for creating evaluations: let the system generate them automatically or create them manually for specific requirements.
⚡ Automatic Generation
Best for: Getting started quickly with proven evaluation criteria
How it works: The system analyzes your application and generates criteria based on what has worked for similar use cases.
Benefits:
- Proven criteria based on similar use cases
- Good starting point for further customization
- Saves time on initial setup
✍️ Manual Creation
Best for: Specific requirements or fine-tuned control over success criteria
How it works: You define each evaluation's criteria yourself, tailored to your task and business rules.
Benefits:
- Complete control over criteria
- Task-specific requirements
- Custom business logic
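Whichever route you take, each evaluation ultimately comes down to a name plus the criteria a scorer applies. The sketch below shows what manually defined evaluations might look like as data; the field names and wording are illustrative, not the platform's actual schema.

```python
# Hypothetical structure for manually defined evaluations; field names and
# criteria text are illustrative only.
manual_evaluations = [
    {
        "name": "includes_refund_policy",
        "criteria": "The response states the 30-day refund policy when the customer asks about returns.",
        "active": True,
    },
    {
        "name": "valid_json_output",
        "criteria": "The output is valid JSON containing the keys 'name', 'date', and 'amount'.",
        "active": True,
    },
]
```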
Writing Effective Evaluation Criteria
Be Specific and Measurable
Focus on Observable Outcomes
Use Clear, Objective Language
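To make these three guidelines concrete, here are illustrative before/after pairs; the thresholds and field names are invented for the example, not recommended values.

```
Vague:     "The response should be helpful and professional."
Specific:  "The response answers the customer's question, uses a polite tone,
            and stays under 150 words."

Vague:     "Extract the important information."
Specific:  "The output contains the invoice number, total amount, and due date
            exactly as they appear in the source document."
```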
Common Evaluation Categories
📊 Accuracy-Focused
Ensure factual correctness and completeness
📝 Format-Focused
Ensure consistent structure and presentation
⭐ Quality-Focused
Measure overall usefulness and appropriateness
Use Case Examples
📄 Data Extraction Applications
🎧 Customer Support Applications
✍️ Content Generation Applications
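The specifics differ by application, but as a rough illustration (these criteria are invented examples, not built-in templates), evaluation sets for the three use cases above might look like this:

```python
# Illustrative evaluation sets per use case; wording is hypothetical.
example_evaluations = {
    "data_extraction": [
        "Every field defined in the schema is present in the output.",
        "Extracted values match the source document exactly, with no paraphrasing.",
        "The output is valid JSON with no extra keys.",
    ],
    "customer_support": [
        "The response directly answers the customer's question.",
        "The tone is polite and matches the brand voice.",
        "The response is under 150 words.",
    ],
    "content_generation": [
        "The content covers every point in the brief.",
        "The reading level suits the target audience.",
        "No factual claims are made without support from the provided sources.",
    ],
}
```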
Managing Your Evaluations
Active vs Inactive Evaluations
🟢 Active Evaluations
- Used in optimization scoring
- Contribute to overall accuracy metrics
- Guide automatic optimization decisions
⚫ Inactive Evaluations
- Don't affect current scoring
- Can be reactivated when needed
- Useful for testing different criteria
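One practical use of the inactive state is previewing a candidate criterion without letting it affect the official score. A small hypothetical sketch (the data and names are invented):

```python
# Hypothetical sketch: an inactive evaluation can still be scored for comparison
# without changing the official overall score.
def average(scores):
    return round(sum(scores) / len(scores), 2)

scores = {"accuracy": 7.0, "format": 9.0, "new_tone_check": 3.0}
active = {"accuracy", "format"}                      # "new_tone_check" is inactive

official = average([s for name, s in scores.items() if name in active])
with_candidate = average(list(scores.values()))

print(official)        # 8.0  <- what optimization currently sees
print(with_candidate)  # 6.33 <- preview if the candidate were activated
```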
Evaluation Actions
For each evaluation, you can activate or deactivate it, edit its criteria, or remove it entirely.
How Evaluations Impact Optimization
Evaluations guide both automatic and manual optimization.
Automatic Optimization
During automatic optimization, the system:
- Focuses on improving the lowest-scoring evaluations
- Creates Prompt Family variations to handle different criteria
- Prioritizes changes that improve overall evaluation performance
Manual Optimization
Evaluations provide clear direction for manual improvements. They:
- Show which specific evaluations need attention
- Help you target optimization efforts effectively
- Provide clear metrics for measuring improvement
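Either way, the starting point is the same: find the weakest evaluations. One way to picture this is to average each evaluation's scores across recent Event Log entries and rank them. A rough sketch under those assumptions (the event data and names are hypothetical):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Event Log entries: each maps evaluation name -> 0-10 score.
recent_events = [
    {"completeness": 5.0, "valid_format": 9.0, "tone": 8.0},
    {"completeness": 4.0, "valid_format": 10.0, "tone": 7.0},
    {"completeness": 6.0, "valid_format": 9.0, "tone": 9.0},
]

# Average each evaluation across events, then rank lowest first.
totals = defaultdict(list)
for event in recent_events:
    for name, score in event.items():
        totals[name].append(score)

ranked = sorted((mean(scores), name) for name, scores in totals.items())
print(ranked[0])  # (5.0, 'completeness') -> the evaluation to target first
```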
Individual Evaluation Scores
Each evaluation is scored on the same 0-10 scale; the overall score is the average of these individual scores.
Best Practices
Getting Started
- Begin with 3-5 core evaluations
- Test with sample inputs
- Run initial optimization
- Add specific criteria gradually
Balance & Focus
- Cover accuracy, format, and quality
- Avoid redundant evaluations
- Focus on user impact
- Keep the scope manageable
Ongoing Management
- Monitor performance in the Event Log
- Revise low-scoring criteria
- Add evaluations for edge cases
- Remove evaluations that no longer add value
Testing & Validation
- Use diverse test inputs
- Check reliability on edge cases
- Verify scores match expectations
- Get team feedback