A dataset is an essential component in refining your LLM application’s performance. It comprises a collection of input/output samples for conducting tests and validations.

A dataset consists of Dataset Items. A Dataset Item has an input, an expected output, and can also contain metadata. Within a dataset, the input, expected output and metadata of all items should follow the same schema.
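For example, a single item could look like the sketch below; the values and the metadata keys are purely illustrative.


input = { "content": "What is the capital of France?" }

expected_output = { "content": "The capital of France is Paris." }

metadata = { "category": "geography" }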

Dataset Types

There are two types of datasets in Literal AI: Key Value and Generation.

Key Value datasets can have any key-value pairs for input and expected output. This type of dataset can be used, for example, to store Runs of agents, with an input (the user query) and an expected output (the LLM answer). An example of a Key Value Dataset Item:


input = {
  "content": "Can you name a movie about space travel?"
}

expected_output = { 
  "content": "A movie about space travel is \"Interstellar\"." 
}

Generation datasets store Generation items. A Generation item's input is a chat history, with messages that each have a role and content. The expected output consists of a role and content. An example of a Generation Dataset Item:


input = {
  "messages": [{
    "role": "system",
    "content": "You are a helpful assistant."
  }, {
    "role": "user",
    "content": "Can you name a movie about space travel?"
  }]
}

expected_output = { 
  "role": "assistant",
  "content": "A movie about space travel is \"Interstellar\"." 
}

Creating Datasets and Dataset Items

Via Literal AI UI

There are two ways you can create a new Dataset in the UI.

  1. Go to the Datasets tab and create a new Dataset from there. You need to provide a name and the Dataset Type.
  2. Go to an individual Run, Step or Generation that you want to add to a new dataset. You can create a new Dataset from here.

To add items to your dataset, go to individual Runs, Steps or Generations, click on "Add to Dataset", and choose your preferred Dataset. You can edit the expected output, so that the ground truth is saved with the item.

Screen of adding a Step to a Dataset

To view your Datasets and their items, go to the Datasets tab.

Via the SDKs

Our SDKs facilitate dataset management, enabling both manual handling of items and the transformation of existing Steps into samples. This is designed for iterative development and fine-tuning of your application. For the complete API reference, check the Python and TypeScript SDK pages. Note: the dataset functions of the Python client are only available synchronously.

Create a dataset

from literalai import LiteralClient
import os

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

# create a new dataset
dataset = literal_client.api.create_dataset(
  name="Foo", 
  description="A dataset to store samples.", 
  metadata={"isDemo": True},
  type="key_value" # optional: "key_value" (default) or "generation"
)

Create a dataset item

Now that we have a dataset, we can create dataset items. There are two ways to add items via the SDK: adding items manually in code, or adding existing Steps, Runs or Generations to the dataset.

from literalai import LiteralClient
import os

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

# option 1: add local items
dataset_item = literal_client.api.create_dataset_item(
  dataset_id = "<DATASET_ID>", # dataset.id
  input = { "content": "What is Literal AI?" },
  expected_output = { "content": "Literal AI is an observability solution." }
)

# option 2: add existing steps, runs, generations
dataset_item = literal_client.api.add_step_to_dataset(
  dataset_id = "<DATASET_ID>", # dataset.id
  step_id = "step_id" # reference to a step or run
)
dataset_item = literal_client.api.add_generation_to_dataset(
  dataset_id = "<DATASET_ID>", # dataset.id
  generation_id = "generation_id" # reference to a generation
)

Get a dataset

from literalai import LiteralClient
import os

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

dataset = literal_client.api.get_dataset(id="<DATASET_ID>")

for item in dataset.items:
  # each item holds the input, expected output and metadata stored in the dataset
  pass

Run experiments

Once you have created and populated a Dataset, you can start running evaluations against it.
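
For example, a minimal evaluation loop could look like the sketch below. The my_app and exact_match functions are hypothetical placeholders for your own application and scoring logic, and the item.input / item.expected_output attributes are assumed to match the fields stored in the examples above.


from literalai import LiteralClient
import os

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

dataset = literal_client.api.get_dataset(id="<DATASET_ID>")

def my_app(item_input):
  # hypothetical placeholder: call your LLM application with the item's input
  return { "content": "A movie about space travel is \"Interstellar\"." }

def exact_match(output, expected_output):
  # hypothetical scoring function: compare the application output to the ground truth
  return output.get("content") == expected_output.get("content")

scores = []
for item in dataset.items:
  # assumed attribute names, matching the input / expected output stored on each item
  output = my_app(item.input)
  scores.append(exact_match(output, item.expected_output))

print(f"Exact matches: {sum(scores)} / {len(scores)}")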