Online Evaluation
Evaluate LLM generations on the fly!
Introduction
Online evaluation lets you create and delete Rules which automatically evaluate your LLM application.
Online Evaluation currently focuses on LLM Generations: it lets you classify each generation into one of the categories defined by a Score Template.
To get started, open the Online Evaluation page from the navigation sidebar, where you can manage Rules and Tasks.
![Online Evaluation page](https://mintlify.s3-us-west-1.amazonaws.com/chainlit-5/images/online-evaluation-page.png)
Online Evaluation
Rules
To create a Rule, fill in the form shown below. The specified provider acts as an LLM judge that scores a percentage of your ingested LLM generations, classifying each one according to the specified Score Template.
Evaluations run when a Generation is ingested, against the LLM provider of your choice. Make sure to create and link Shared Credentials, and set the sample rate to a reasonable value to avoid incurring high costs from your LLM provider.
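Conceptually, the sample rate acts as a probabilistic gate on each ingested Generation. The sketch below is purely illustrative (the function name and logic are assumptions, not the platform's actual implementation), but it shows why a low sample rate keeps judge costs bounded:

```python
import random

def should_evaluate(sample_rate: float) -> bool:
    """Return True for roughly `sample_rate` of ingested Generations.

    A sample rate of 0.1 means ~10% of Generations are sent to the
    LLM judge, capping the extra provider cost at ~1 judge call per
    10 Generations.
    """
    return random.random() < sample_rate

# With a 10% sample rate, roughly 1 in 10 Generations triggers a judge call.
random.seed(0)  # seeded only to make this illustration reproducible
evaluated = sum(should_evaluate(0.1) for _ in range(10_000))
```

Doubling the sample rate roughly doubles your judge spend, so start low and increase it only if the sampled scores look too noisy.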
![Create Rule Dialog](https://mintlify.s3-us-west-1.amazonaws.com/chainlit-5/images/create-rule-dialog.png)
Create Rule Dialog
Here are two examples of correct vs. incorrect Score Templates:
![Correct Score Template](https://mintlify.s3-us-west-1.amazonaws.com/chainlit-5/images/correct-rule-score-template.png)
Correct Score Template
![Incorrect Score Template](https://mintlify.s3-us-west-1.amazonaws.com/chainlit-5/images/incorrect-rule-score-template.png)
Incorrect Score Template
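The screenshots above show what the platform accepts; as a rough illustration (the structure and validation rule below are assumptions, not the actual schema), a usable categorical Score Template enumerates a few mutually exclusive categories for the judge to pick from:

```python
# Illustrative structure only -- not the platform's actual schema.
sentiment_template = {
    "name": "Sentiment",
    "categories": [
        {"label": "Positive", "value": 1},
        {"label": "Neutral", "value": 0},
        {"label": "Negative", "value": -1},
    ],
}

def looks_usable(template: dict) -> bool:
    """Heuristic check: at least two categories, with no duplicate
    labels or values, so the judge's choice is unambiguous."""
    labels = [c["label"] for c in template["categories"]]
    values = [c["value"] for c in template["categories"]]
    return (
        len(labels) >= 2
        and len(set(labels)) == len(labels)
        and len(set(values)) == len(values)
    )
```

Overlapping or duplicated categories leave the judge with no unambiguous choice, which is the kind of template the "incorrect" example warns against.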
Tasks
When your LLM-based application calls an LLM, a Generation object is ingested and a Rule may trigger to evaluate that Generation, depending on its sample rate.
Each time a Rule is triggered to evaluate a Generation, the LLM judge is triggered in an asynchronous Task.
From the Tasks view, you can review which Rules triggered and the Generations they ran against, and jump directly to an evaluated Generation to view its score.
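Because the judge runs in an asynchronous Task, ingestion is never blocked waiting for the provider. A minimal sketch of that pattern with `asyncio` (all names here are hypothetical; the stubbed judge just returns a fixed score to keep the example self-contained):

```python
import asyncio

async def judge(generation_id: str) -> dict:
    """Stand-in for the LLM-judge call; a real judge would call the
    configured provider and classify against the Score Template."""
    await asyncio.sleep(0)  # placeholder for the provider round trip
    return {
        "generation_id": generation_id,
        "category": "Positive",
        "reason": "Friendly, helpful tone",
    }

async def on_generation_ingested(generation_id: str) -> asyncio.Task:
    # Schedule the evaluation as a background Task so ingestion
    # returns immediately instead of waiting for the judge.
    return asyncio.create_task(judge(generation_id))

async def main() -> dict:
    task = await on_generation_ingested("g1")
    return await task  # the Task resolves to the judge's score

result = asyncio.run(main())
```

The point of the Task indirection is that the score arrives later and is attached to the Generation after the fact, which is why you review results in the Tasks view rather than inline.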
![Evaluation Tasks](https://mintlify.s3-us-west-1.amazonaws.com/chainlit-5/images/evaluation-tasks.png)
Evaluation Tasks
The LLM-generated AI score shows the chosen category value along with the reason behind the evaluation:
![AI score generated via Rules](https://mintlify.s3-us-west-1.amazonaws.com/chainlit-5/images/ai-score-via-rule.png)
AI score generated via Rules
With Rules properly set up, you can easily perform semantic sentiment analysis on your LLM generations, using categories such as Positive/Neutral/Negative. In the Generations table, filters give you quick access to negative generations for review and action.
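Once scores are attached, the Generations-table filter is effectively a predicate over the score category. A small illustrative sketch (the `Generation` class and field names are stand-ins, not the platform's data model):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Generation:
    """Illustrative stand-in for an ingested Generation and its AI score."""
    id: str
    output: str
    score_category: Optional[str] = None  # e.g. "Positive", "Neutral", "Negative"

def negative_generations(generations: List[Generation]) -> List[Generation]:
    """Mirror of the table filter: keep only Negative-scored items."""
    return [g for g in generations if g.score_category == "Negative"]

sample = [
    Generation("g1", "Happy to help!", "Positive"),
    Generation("g2", "I cannot answer that.", "Negative"),
    Generation("g3", "The answer is 42.", "Neutral"),
]
flagged = negative_generations(sample)  # only "g2" survives the filter
```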