In Literal AI, you can add Tags, Scores and Metadata to units like Threads, Steps, Runs and Generations. What does each one mean, and when should you use it?

Tags

Tags can be assigned to Threads, Steps and Generations. You label these units with Tags so that you can filter your data. Tags are shared across units, application-wide. For example, Tags could relate to data management, such as “to review” or “reviewed”. Tags could also describe the nature of the question: for a customer support bot, you could have “billing”, “return policy”, etc.
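As a rough illustration of how shared Tags act as a filter across different kinds of units, here is a small in-memory sketch. It does not use the Literal AI SDK: the `Unit` class, the `filter_by_tag` helper and the sample names are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for a Thread, Step or Generation."""
    kind: str
    name: str
    tags: set[str] = field(default_factory=set)


def filter_by_tag(units: list[Unit], tag: str) -> list[Unit]:
    """Return every unit that carries the given tag, whatever its kind."""
    return [u for u in units if tag in u.tags]


# The same tag vocabulary is reused across all kinds of units.
units = [
    Unit("thread", "Refund request", {"billing", "to review"}),
    Unit("step", "User question", {"billing"}),
    Unit("generation", "Assistant answer", {"return policy", "reviewed"}),
]

billing_units = filter_by_tag(units, "billing")
print([u.name for u in billing_units])
```

Because the tag is just a shared label, one filter call returns both the Thread and the Step tagged “billing”.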

Scores

Scores allow you to evaluate the performance of your LLM system at three levels: LLM Generations, Agent Runs and conversation Threads. You can use them to track the performance and accuracy of your system. Scores can be human-generated (human feedback, like a thumbs up or down) or AI-generated (a hallucination evaluation, for instance). A score can be assigned to a Generation, Step or Thread. Scores can be visualized in charts on the dashboard, and data can be filtered by score.
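To make the distinction between human and AI scores concrete, the sketch below models a score as a small record and averages values per score name, which is roughly what a dashboard chart would summarize. The `Score` class, its field names, the `average_by_name` helper and the sample values are assumptions for illustration, not the Literal AI data model or SDK.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Score:
    """Hypothetical score record; field names are illustrative only."""
    name: str         # what is measured, e.g. "user-feedback"
    source: str       # "HUMAN" or "AI"
    value: float      # e.g. 1.0 for thumbs up, 0.0 for thumbs down
    target_kind: str  # "generation", "step" or "thread"
    target_id: str


def average_by_name(scores: list[Score]) -> dict[str, float]:
    """Group score values by name and return the mean of each group."""
    grouped: dict[str, list[float]] = {}
    for s in scores:
        grouped.setdefault(s.name, []).append(s.value)
    return {name: mean(values) for name, values in grouped.items()}


scores = [
    Score("user-feedback", "HUMAN", 1.0, "step", "step-1"),
    Score("user-feedback", "HUMAN", 0.0, "step", "step-2"),
    Score("hallucination", "AI", 0.25, "generation", "gen-1"),
]

averages = average_by_name(scores)
print(averages)
```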

Metadata

Metadata enriches an object with additional context, details, or configuration. It can be added to a Thread, Step, User, or Generation. Unlike Tags, which are shared across units, Metadata holds parameters that are unique to that particular Thread, Step, User or Generation.
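One way to picture Metadata is as a free-form dictionary attached to a single object. The sketch below is a plain-Python model under that assumption; the `Thread` and `Step` classes and the metadata keys (`channel`, `order_id`) are hypothetical and are not taken from the Literal AI SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    """Hypothetical step with its own metadata dictionary."""
    name: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Thread:
    """Hypothetical thread holding an ordered list of steps."""
    name: str
    steps: list[Step] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


thread = Thread(name="Support chat")

# The user query carries metadata that applies to this step only.
thread.steps.append(
    Step(
        "user",
        "Where is my refund?",
        metadata={"channel": "web", "order_id": "A-1001"},
    )
)

# The assistant reply has no metadata of its own.
thread.steps.append(Step("assistant", "Refunds take 5 business days."))

with_metadata = [s.name for s in thread.steps if s.metadata]
print(with_metadata)
```

Each object owns its dictionary, so adding a key to one Step leaves every other Step and the Thread itself unchanged.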

Example

Let’s review this in an example.

1. Create a Thread with Metadata on a Step

2. View Step Metadata in UI

If you run the code above, you’ve created a thread with two steps. One step, the user query, has metadata attached. Let’s view this in the Literal AI UI.

Metadata view in Literal AI

3. Add Tags to Thread

Tags can be added to the Thread and to individual Steps, either in the UI or in code.

Add a Tag to a Thread

4. Add Score

Finally, we can add a Score to a Step. Let’s evaluate how good the assistant’s response was. We can do this in the UI or in code.

Add Score to Step

5. View Scores and Filter by Tags

You can review human or AI evaluations in the Scores tab of the UI, and filter the scores by Step Tags.

View Scores
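Filtering scores by Step Tags can be thought of as a join: find the Steps that carry the tag, then keep the scores attached to those Steps. The sketch below shows that logic in memory. The `Step` and `Score` classes, the `scores_for_tag` helper and the ids are hypothetical, and the UI may implement the filter differently.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical step identified by an id and labeled with tags."""
    id: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Score:
    """Hypothetical score attached to a step by id."""
    step_id: str
    name: str
    value: float


def scores_for_tag(steps: list[Step], scores: list[Score], tag: str) -> list[Score]:
    """Keep only the scores whose step carries the given tag."""
    tagged_ids = {s.id for s in steps if tag in s.tags}
    return [sc for sc in scores if sc.step_id in tagged_ids]


steps = [
    Step("step-1", {"billing"}),
    Step("step-2", {"return policy"}),
    Step("step-3", {"billing", "reviewed"}),
]
scores = [
    Score("step-1", "user-feedback", 1.0),
    Score("step-2", "user-feedback", 0.0),
    Score("step-3", "hallucination", 0.25),
]

billing_scores = scores_for_tag(steps, scores, "billing")
print([(sc.step_id, sc.name) for sc in billing_scores])
```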

You can also filter Threads by the Tags assigned to them, as we did in this example.

Filter Threads by Tags