Let’s say we have a simple LLM application that takes a user question, performs a retrieval step, and generates the final response with an LLM.

The code for this application would look like this:

import asyncio

async def semantic_search(question: str):
  # Fake semantic search
  await asyncio.sleep(1)

  # Return fake results
  return ["chunk 1", "chunk 2", "chunk 3"]

async def generate_response(question: str, search_results: list):
  # Fake response generation
  await asyncio.sleep(2)

  return "Fake answer"

async def main():
  question = input("What is your question? ")
  search_results = await semantic_search(question)
  answer = await generate_response(question, search_results)
  print(answer)

if __name__ == "__main__":
  asyncio.run(main())

Logging the conversation with Literal AI

First, we initialize the Literal AI client.

import os
from literalai import LiteralClient

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

Logging the steps

In this example, we have two steps: semantic_search and generate_response. We can use the step decorator to log them, specifying the step type for each.

@literal_client.step(type="retrieval")
async def semantic_search(question: str):
  ...

@literal_client.step(type="llm")
async def generate_response(question: str, search_results: list):
  ...
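
In a real application, generate_response would call an actual LLM. If that call goes through the OpenAI client, the SDK can also instrument it directly so that the model, settings, and token usage are captured on the logged generation. Here is a minimal sketch, assuming the openai package is installed, OPENAI_API_KEY is set, and gpt-4o-mini stands in for whichever model you use:

from openai import AsyncOpenAI

openai_client = AsyncOpenAI()
literal_client.instrument_openai()  # subsequent OpenAI calls are logged as generations

async def generate_response(question: str, search_results: list):
  # This call is logged as a generation, attached to the surrounding step and thread
  completion = await openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
      {"role": "system", "content": f"Answer using this context: {search_results}"},
      {"role": "user", "content": question},
    ],
  )
  return completion.choices[0].message.content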

Logging the run

We then wrap the two steps in a parent run step. Because semantic_search and generate_response are awaited inside run_rag, their steps are nested under the run.

@literal_client.step(type="run")
async def run_rag(question: str):
  results = await semantic_search(question)
  answer = await generate_response(question, results)
  return answer

Logging the thread

A thread is a sequence of related steps, typically one conversation. In our example, we have a single thread, which we create with the thread decorator.

@literal_client.thread
async def main():
  ...
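
The decorator can also be called with arguments, for instance to give the thread a readable name in the UI. A minimal sketch; the name parameter here is an assumption to verify against your SDK version:

@literal_client.thread(name="RAG conversation")
async def main():
  ...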

Logging the user question and final answer

Finally, we can log the user question and the final answer using literal_client.message.

@literal_client.thread
async def main():
  question = input("What is your question? ")
  literal_client.message(content=question, type="user_message", name="User")
  answer = await run_rag(question)
  literal_client.message(content=answer, type="assistant_message", name="Assistant")
  print(answer)

Full code

import asyncio
import os
from literalai import LiteralClient

literal_client = LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))

@literal_client.step(type="retrieval")
async def semantic_search(question: str):
  await asyncio.sleep(1)
  return ["chunk 1", "chunk 2", "chunk 3"]

@literal_client.step(type="llm")
async def generate_response(question: str, search_results: list):
  await asyncio.sleep(2)
  return "Fake answer"

@literal_client.step(type="run")
async def run_rag(question: str):
  results = await semantic_search(question)
  answer = await generate_response(question, results)
  return answer

@literal_client.thread
async def main():
  question = input("What is your question? ")
  literal_client.message(content=question, type="user_message", name="User")
  answer = await run_rag(question)
  literal_client.message(content=answer, type="assistant_message", name="Assistant")
  print(answer)

if __name__ == "__main__":
  asyncio.run(main())

  # Network requests by the SDK are performed asynchronously.
  # Call flush_and_stop() to guarantee all requests complete before the process exits.
  # WARNING: if you run a continuous server, you should not use this method.
  literal_client.flush_and_stop()

Running the example in Python

To run the example, install the Literal AI Python SDK:

pip install literalai
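
The client reads its key from the LITERAL_API_KEY environment variable (see the initialization above), so export it first; the value below is a placeholder:

export LITERAL_API_KEY="your-api-key"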

Then, you can run the example:

python example.py

On the Literal AI platform, you will see the following thread being logged:

[Screenshot: rendering of the thread on the Literal AI platform]