We are continuously developing Literal AI. Our focus is on rolling out new features and improving the Developer Experience.

We release a new guide, tutorial or cookbook for every new feature in Literal AI.

Upcoming new features:

A/B testing

With A/B testing, you can compare the performance of two different prompts or LLM settings. You can already run an A/B test in your code (docs), but with this development you will be able to:

  • Create A/B tests with different prompt, model or application versions. You can define populations with either Tags or Metadata.
  • Monitor A/B test charts in a dashboard, grouped by population over a time range.
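To make the idea concrete, here is a minimal sketch of splitting traffic into two populations and comparing an aggregate score per population. This is an illustration only, not the Literal AI API: the function names, the placeholder metric, and the way the population label is stored (as a Tag or Metadata field) are all assumptions.

```python
import zlib
from collections import defaultdict

def assign_population(user_id: str) -> str:
    # Deterministic split so the same user always lands in the same
    # population; the label would be stored as a Tag or Metadata field.
    return "A" if zlib.crc32(user_id.encode()) % 2 == 0 else "B"

def score_generation(output: str) -> float:
    # Placeholder metric for illustration: reward shorter answers.
    return 1.0 / (1 + len(output.split()))

def run_ab_test(requests):
    # requests: iterable of (user_id, generation_output) pairs.
    scores = defaultdict(list)
    for user_id, output in requests:
        scores[assign_population(user_id)].append(score_generation(output))
    # Average score per population, ready to chart over a time range.
    return {pop: sum(s) / len(s) for pop, s in scores.items()}
```

In a real deployment the metric would be a genuine quality signal (user feedback, an LLM-as-judge score) rather than answer length.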

Roles and Orgs

Currently in Literal AI, you can define Roles and Projects for collaboration. Project management will be improved with more granular control over permissions:

  • Customization: You will be able to define custom roles with specific permissions, so that diverse users and organizational structures can manage their projects in Literal AI.
  • Security: More granular permission settings reduce the risk of unauthorized access or actions.
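As a rough illustration of what custom roles with granular permissions could look like, here is a minimal sketch. The permission strings and the Role/`can` names are hypothetical, not Literal AI's actual permission model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset  # granular permission strings

# Hypothetical roles; permission names are illustrative only.
ANNOTATOR = Role("annotator", frozenset({"threads:read", "scores:write"}))
ADMIN = Role("admin", frozenset({
    "threads:read", "threads:delete", "scores:write", "projects:manage",
}))

def can(role: Role, permission: str) -> bool:
    # A single granular check replaces coarse all-or-nothing roles.
    return permission in role.permissions
```

The point of the sketch is the granularity: an annotator can write scores without being able to delete threads, which is the kind of distinction coarse built-in roles cannot express.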


Dashboard Improvements

The User Experience of the Dashboard in Literal AI will be improved. You will be able to visualize and analyze your data more efficiently using new chart features. The main focus points of this improvement are:

  • Enhanced data interaction: visually interact with your data to intuitively extract insights.
  • Scalability: manage different scales of data and user counts without performance degradation.
  • Smooth transition: as little change to the code base and database as possible.

Online Evaluation on Literal AI

Right now, you can run evaluations from your code and view and compare the results in Literal AI. With this new feature, you will be able to run online evaluation on the Literal AI platform itself (server-side instead of client-side). Server-side online evaluation means scoring Generation objects as Literal AI ingests them: as a new Generation in a Thread comes in, an automatic evaluation of its content is triggered. You can then view the performance of your LLM Generations directly in the platform. You enable this by setting an evaluation rule based on a metric that you define.
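The flow described above can be sketched as follows. Assume an evaluation rule pairs a name with a metric function, and a hypothetical `on_ingest` hook fires for each new Generation in a Thread; the class, function, and metric names here are illustrative, not the actual Literal AI interface.

```python
from typing import Callable

class EvaluationRule:
    """A user-defined rule: a name plus a metric applied to generation text."""

    def __init__(self, name: str, metric: Callable[[str], float]):
        self.name = name
        self.metric = metric

    def evaluate(self, generation_text: str) -> dict:
        return {"rule": self.name, "score": self.metric(generation_text)}

def no_apology(text: str) -> float:
    # Toy metric: penalize generations that apologize instead of answering.
    return 0.0 if "sorry" in text.lower() else 1.0

def on_ingest(generation_text: str, rules: list) -> list:
    # Hypothetical server-side hook: triggered automatically for each
    # new Generation as it is ingested, scoring it against every rule.
    return [rule.evaluate(generation_text) for rule in rules]

rules = [EvaluationRule("no-apology", no_apology)]
```

Because the evaluation happens at ingestion time, scores are attached to Generations as they arrive, with no round-trip through client code.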


Continuous Data Streams

This feature is about supporting continuous data streams in your LLM applications, for example live voice conversations and video streams.

Contact us if you are interested in the functionality.