Introduction

Online Evaluation lets you create and delete Rules that automatically evaluate your LLM application.

Online Evaluation currently focuses on LLM Generations: it classifies each generation into one of the categories of a Score Template.

To get started, open the Online Evaluation menu in the navigation sidebar. The Online Evaluation page is where you manage Rules and Tasks.

Online Evaluation page

Rules

To create a Rule, fill in the form shown below. The LLM provider you specify acts as an LLM judge: it scores a percentage of your ingested LLM generations by classifying each one into a category of the specified Score Template.

Evaluations run when a Generation is ingested, and each one calls the LLM provider of your choice. Make sure to create and link Shared Credentials for that provider, and set the sample rate carefully: every sampled Generation triggers a call to the LLM judge, so a high sample rate can incur significant costs from your LLM provider.
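
To make the cost trade-off concrete, here is a minimal Python sketch of per-Generation sampling. It assumes the sample rate acts as an independent probability for each ingested Generation; the `Rule` class, `should_evaluate`, and `estimated_judge_cost` are hypothetical illustrations rather than the product's API, and the per-call price is a placeholder for your provider's actual pricing.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for an Online Evaluation Rule (not the product's API)."""
    name: str
    sample_rate: float  # fraction of ingested Generations to evaluate, in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0 and 1")

    def should_evaluate(self, rng: random.Random) -> bool:
        # Assumption: each Generation is sampled independently with p = sample_rate.
        return rng.random() < self.sample_rate


def estimated_judge_cost(generations: int, sample_rate: float, cost_per_call: float) -> float:
    """Expected judge spend: each sampled Generation costs one judge call."""
    return generations * sample_rate * cost_per_call


rng = random.Random(42)
rule = Rule(name="sentiment", sample_rate=0.05)
sampled = sum(rule.should_evaluate(rng) for _ in range(100_000))
print(f"{sampled} of 100000 Generations sampled")  # roughly 5000; exact count depends on the seed
print(estimated_judge_cost(100_000, 0.05, 0.002))   # 0.002 per call is a placeholder price
```

With an assumed 100,000 Generations and 0.002 per judge call, a 5% sample rate means about 5,000 judge calls costing roughly 10; a 100% sample rate would cost 20 times as much.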

Create Rule Dialog

Score Template categories must have explicit, self-describing names, because the prompt used to evaluate generations relies solely on the category names, not on the Score Template name.
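
The sketch below illustrates why the category names carry all the meaning. It assumes a judge prompt assembled from category names alone; `build_judge_prompt` and `vague_categories` are hypothetical helpers, and the wording is not the actual prompt used by Online Evaluation.

```python
def build_judge_prompt(generation_output: str, category_names: list[str]) -> str:
    """Hypothetical judge prompt. Only category names are included; the
    Score Template name is never passed in, so it cannot guide the judge."""
    options = "\n".join(f"- {name}" for name in category_names)
    return (
        "Classify the following LLM output into exactly one of these categories:\n"
        f"{options}\n\n"
        f"LLM output:\n{generation_output}\n\n"
        "Reply with the category name and a short reason."
    )


def vague_categories(category_names: list[str]) -> list[str]:
    """Hypothetical lint: flag names with no letters (e.g. "1", "2", "3"),
    which give the judge nothing to reason about."""
    return [name for name in category_names if not any(ch.isalpha() for ch in name)]


# A template named "Sentiment" with categories 1/2/3: the judge never sees "Sentiment".
print(build_judge_prompt("Thanks, that fixed it!", ["1", "2", "3"]))
print(vague_categories(["1", "2", "3"]))                      # ['1', '2', '3']
print(vague_categories(["Positive", "Neutral", "Negative"]))  # []
```

Renaming the categories to Positive, Neutral, and Negative gives the judge the same information a human reader would get from the template name.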

Here are examples of a correct and an incorrect Score Template:

Correct Score Template

Incorrect Score Template

Tasks

When your LLM-based application calls an LLM, a Generation object is ingested, and a Rule may trigger to evaluate that Generation depending on the Rule's sample rate.

Each time a Rule is triggered on a Generation, the LLM judge runs in an asynchronous Task.
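
This flow can be modeled with Python's `asyncio`: each sampled Generation gets its own background Task, and the ingestion loop never waits on a judge call. This is a simplified sketch under stated assumptions; `Generation`, `judge`, and `ingest` are hypothetical names, and the keyword-based `judge` is a stand-in for a real call to your LLM provider.

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass
class Generation:
    id: str
    output: str


async def judge(generation: Generation) -> str:
    """Stand-in for the LLM judge. A real judge would call your LLM provider."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return "Negative" if "error" in generation.output.lower() else "Positive"


async def ingest(generations: list[Generation], sample_rate: float,
                 rng: random.Random) -> dict[str, str]:
    """Ingest Generations and schedule a judge Task for each sampled one."""
    tasks: dict[str, asyncio.Task] = {}
    for gen in generations:
        # The ingestion loop never awaits the judge: it only schedules a Task.
        if rng.random() < sample_rate:
            tasks[gen.id] = asyncio.create_task(judge(gen))
    # Results are collected here only so the sketch can return them; in the
    # product, finished Tasks are what you review under Tasks.
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))


if __name__ == "__main__":
    gens = [Generation("g1", "All good"), Generation("g2", "An error occurred")]
    print(asyncio.run(ingest(gens, sample_rate=1.0, rng=random.Random(0))))
    # {'g1': 'Positive', 'g2': 'Negative'}
```

Creating Tasks rather than awaiting each judge call inline keeps ingestion latency independent of how long the LLM provider takes to respond.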

From Tasks, you can review the triggered Rules and the Generations they ran against, and jump directly to an evaluated Generation to view its score.

Evaluation Tasks

The LLM-generated AI score shows the category value along with the reason behind the evaluation:

AI score generated via Rules

With Rules set up, you can, for example, run sentiment analysis on your LLM generations using categories such as Positive, Neutral, and Negative. Filters in the Generations table then give you quick access to the Negative generations for review and action.
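
As a rough model of that workflow, the Python sketch below represents each Rule-generated score as a category plus the judge's reason, and filters scored Generations by category much as the Generations table filter does. `AIScore`, `ScoredGeneration`, and `filter_by_category` are hypothetical names for illustration and do not correspond to any API described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIScore:
    """Hypothetical shape of a Rule-generated score: category plus reason."""
    category: str
    reason: str


@dataclass(frozen=True)
class ScoredGeneration:
    generation_id: str
    score: AIScore


def filter_by_category(items: list[ScoredGeneration], category: str) -> list[ScoredGeneration]:
    """Keep Generations whose score matches `category` (case-insensitive)."""
    wanted = category.casefold()
    return [item for item in items if item.score.category.casefold() == wanted]


scored = [
    ScoredGeneration("g1", AIScore("Positive", "The user thanks the assistant.")),
    ScoredGeneration("g2", AIScore("Negative", "The user reports the answer is wrong.")),
    ScoredGeneration("g3", AIScore("Neutral", "A factual follow-up question.")),
]

# Print only the Negative generations with the judge's reason, for review.
for item in filter_by_category(scored, "negative"):
    print(item.generation_id, "-", item.score.reason)
```

Keeping the reason next to the category is what makes the filtered list actionable: a reviewer sees why each generation was flagged without reopening it.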