The Llama Index integration enables you to monitor your RAG pipelines with a single line of code.
You should create a new instance of the callback handler for each invocation.
```python
import os

from literalai import LiteralClient
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))
Settings.callback_manager = CallbackManager([literal_client.llama_index_callback()])

# Assuming you have an index object
query_engine = index.as_query_engine(similarity_top_k=2)

# Optional: Reset the context for each request to avoid mixing up the context
literal_client.reset_context()
```
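The "new handler per invocation" advice above can be sketched in plain Python, without any Literal AI or Llama Index calls. The names `make_handler` and `handle_request` are hypothetical stand-ins, not library APIs; the point is only that each request builds its own handler instead of sharing one:

```python
# Sketch of the per-invocation pattern: build a fresh callback handler for
# every request so events from concurrent requests don't interleave.

def make_handler() -> dict:
    # Hypothetical stand-in for literal_client.llama_index_callback():
    # a fresh, empty event buffer per call.
    return {"events": []}

def handle_request(question: str) -> str:
    handler = make_handler()  # new handler scoped to this request only
    handler["events"].append(f"query: {question}")
    return f"answered: {question}"

print(handle_request("What is RAG?"))
```

Because the handler is created inside `handle_request`, its event buffer can never accumulate spans from a previous or concurrent call.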
You can combine the Llama Index callback handler with Threads to monitor multiple calls within a single thread.
```python
import os

from literalai import LiteralClient
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

# Optional: Reset the context for each request to avoid mixing up the context
literal_client.reset_context()

with literal_client.thread(name="Llama Index example") as thread:
    Settings.callback_manager = CallbackManager([literal_client.llama_index_callback()])
    # Call here
```