The source code of this notebook is available in the Literal AI GitHub Cookbooks.

This notebook shows how to validate changes to your RAG application against the context relevancy metric.
We use Ragas to compute that metric and then visualize our iterative experiments in Literal AI.

First, we create a dataset from an example RAG application. Second, we evaluate how changing a retrieval parameter (the number of retrieved contexts) affects context relevancy.

Run a RAG application

Create a Chroma vector database
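The vector database's job in this pipeline is to return the `n_results` documents closest to a question, and that count is the parameter the two experiments vary. The following is a minimal, dependency-free sketch of that behavior. It is not Chroma's API: `ToyVectorStore` is hypothetical, it ranks by bag-of-words cosine similarity instead of the learned embeddings Chroma would use, and the example documents are made up. Only the `n_results` name echoes the parameter of Chroma's `query` method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector collection."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[Counter] = []

    def add(self, documents: list[str]) -> None:
        for doc in documents:
            self.docs.append(doc)
            self.vectors.append(Counter(tokenize(doc)))

    def query(self, question: str, n_results: int) -> list[str]:
        """Return the n_results documents most similar to the question."""
        q = Counter(tokenize(question))
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in ranked[:n_results]]


store = ToyVectorStore()
store.add([
    "Paris is the capital of France.",
    "The dog is named Rex. The dog likes to run.",
    "Bananas are rich in potassium.",
])
print(store.query("What is the capital of France?", n_results=2))
# ['Paris is the capital of France.', 'The dog is named Rex. The dog likes to run.']
```

With `n_results=2`, an unrelated document is returned alongside the relevant one simply because it is the second-closest match, which is the kind of noise context relevancy penalizes.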

Import the Literal AI SDK

Create a prompt
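A RAG prompt typically inserts the retrieved contexts and the user's question into a fixed template. The sketch below uses plain `str.format`; the template wording and `build_prompt` are hypothetical, not the notebook's actual prompt or Literal AI's prompt management API.

```python
# Hypothetical RAG prompt template; wording is illustrative only.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, contexts: list[str]) -> str:
    """Render each retrieved context as a bullet line and fill the template."""
    context = "\n".join(f"- {c}" for c in contexts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


print(build_prompt("What is the capital of France?", ["Paris is the capital of France."]))
```

Keeping the contexts as separate bullet lines makes it easy to see, in a trace, exactly which retrieved chunks the model was given.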

Ask questions to the RAG application

Create a Dataset
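Context relevancy is computed from a question and the contexts retrieved for it, so each dataset item needs at least those, plus the generated answer. The dataclass below is a hypothetical in-memory shape for such items, with made-up values; it does not reproduce Literal AI's dataset schema.

```python
from dataclasses import dataclass


@dataclass
class DatasetItem:
    """Hypothetical dataset item; field names are illustrative only."""

    question: str        # question asked to the RAG application
    contexts: list[str]  # contexts retrieved for that question, in rank order
    answer: str          # answer generated by the application


dataset = [
    DatasetItem(
        question="What is the capital of France?",
        contexts=[
            "Paris is the capital of France.",
            "The dog is named Rex. The dog likes to run.",
        ],
        answer="Paris is the capital of France.",
    ),
]

for item in dataset:
    print(item.question, "->", len(item.contexts), "contexts")
```

Storing contexts in rank order matters later: keeping only the first context then means keeping the best-ranked one.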

Experiment A

With # contexts = 2

Evaluate with Ragas

Prepare Ragas data samples
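Ragas evaluates column-oriented data: one list per field, one entry per question. This sketch assumes the `question`, `answer` and `contexts` column names used by older Ragas releases, where the columns were typically wrapped in a Hugging Face `Dataset` before evaluation; check them against your installed version. `to_ragas_columns` and the sample item are hypothetical.

```python
# Assumption: column names follow older Ragas releases
# ("question", "answer", "contexts"); verify against your version.


def to_ragas_columns(items: list[dict]) -> dict[str, list]:
    """Turn row-oriented items into the column-oriented layout Ragas evaluates."""
    return {
        "question": [item["question"] for item in items],
        "answer": [item["answer"] for item in items],
        # One list of context strings per question.
        "contexts": [list(item["contexts"]) for item in items],
    }


items = [
    {
        "question": "What is the capital of France?",
        "answer": "Paris is the capital of France.",
        "contexts": [
            "Paris is the capital of France.",
            "The dog is named Rex. The dog likes to run.",
        ],
    },
]

samples_a = to_ragas_columns(items)  # Experiment A: both retrieved contexts
print(samples_a["contexts"][0])
```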

Run the evaluation

We evaluate context relevancy, which measures how relevant the retrieved contexts are to answering the user’s question.

The score ranges from 0 to 1, where 0 is least relevant: the more unneeded details the contexts contain, the lower the score.
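In the Ragas releases that shipped it, context relevancy was defined at the sentence level: the number of context sentences needed to answer the question, divided by the total number of sentences in the retrieved contexts. Ragas delegates the relevance judgment to an LLM. The sketch below replaces that with a hypothetical, hand-labeled `relevant` set and a naive regex sentence splitter, so it only illustrates the arithmetic, not Ragas' implementation.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def context_relevancy(contexts: list[str], relevant: set[str]) -> float:
    """Fraction of context sentences that appear in the `relevant` set.

    Hypothetical stand-in: Ragas asks an LLM which sentences are relevant,
    whereas here the judgment is passed in by hand.
    """
    sentences = [s for context in contexts for s in split_sentences(context)]
    if not sentences:
        return 0.0
    return sum(s in relevant for s in sentences) / len(sentences)


contexts = [
    "Paris is the capital of France.",
    "The dog is named Rex. The dog likes to run.",
]
relevant = {"Paris is the capital of France."}
print(context_relevancy(contexts, relevant))  # 0.3333333333333333
```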

Persist experiment to Literal AI

Experiment B

With # contexts = 1

Evaluate with Ragas

We evaluate with the first context only, to see how this affects context relevancy.
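Experiment B keeps the same questions and evaluates only the first retrieved context of each. A sketch of that truncation on column-oriented samples follows; `keep_first_contexts`, `samples_a` and the sample values are hypothetical.

```python
import copy


def keep_first_contexts(samples: dict[str, list], k: int = 1) -> dict[str, list]:
    """Return a copy of the samples where each question keeps only its top-k contexts."""
    truncated = copy.deepcopy(samples)
    truncated["contexts"] = [contexts[:k] for contexts in samples["contexts"]]
    return truncated


samples_a = {
    "question": ["What is the capital of France?"],
    "contexts": [[
        "Paris is the capital of France.",
        "The dog is named Rex. The dog likes to run.",
    ]],
}
samples_b = keep_first_contexts(samples_a)  # Experiment B: first context only
print(samples_b["contexts"])  # [['Paris is the capital of France.']]
```

Copying instead of mutating leaves experiment A's samples intact, so both experiments can be scored and persisted side by side.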

Persist experiment to Literal AI

Visualize in Literal AI Experiments!

Comparing both experiments in Literal AI shows the difference in retrieved contexts: two for experiment A versus one for experiment B.

Context relevancy captures the proportion of facts in the retrieved contexts that are relevant to the question.

In experiment A, the retrieved contexts include irrelevant information (the two facts about the dog do not help answer the question), so context relevancy is 1/3.
Once we limit retrieval to a single context, we retrieve exactly the one useful fact, which yields the maximum context relevancy of 1.
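Assuming each fact counts as one sentence, as in Ragas' sentence-level formulation, both scores follow from dividing the number of question-relevant sentences $|S|$ by the total number of retrieved sentences $|C|$:

$$
\mathrm{CR} = \frac{|S|}{|C|}, \qquad
\mathrm{CR}_A = \frac{1}{1 + 2} = \frac{1}{3}, \qquad
\mathrm{CR}_B = \frac{1}{1} = 1
$$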

Comparison of Experiments