The Llama Index integration enables to monitor your RAG pipelines with a single line of code.

You should create a new instance of the callback handler for each invocation.
import os
from literalai import LiteralClient
from llama_index.core.callbacks import CallbackManager

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))
Settings.callback_manager = CallbackManager([literal_client.llama_index_callback()])

# Assuming you have an index object
query_engine = index.as_query_engine(similarity_top_k=2)

# Optional: Reset the context for each request to avoid mixing up the context
literal_client.reset_context()

Multiple calls in a single thread

You can combine the Llama Index callback handler with the concept of Thread to monitor multiple calls in a single thread.

import os
from literalai import LiteralClient
from llama_index.core.callbacks import CallbackManager

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

# Optional: Reset the context for each request to avoid mixing up the context
literal_client.reset_context()

with literal_client.thread(name="Llama Index example") as thread:
    Settings.callback_manager = CallbackManager([literal_client.llama_index_callback()])
    # Call here