Literal AI can log LLM applications that handle multimodal input as well as output. For example:

  • Images: Vision and Generation
  • Video: Vision and Generation
  • Audio: Speech-to-Text and Text-to-Speech
  • Other file types, like PDF files, are also supported

You can leverage multimodal capabilities on Literal AI in two ways:

  • Simple logging of API calls to multimodal LLM APIs, like GPT-4 Vision
  • Save multimodal files as Attachments. Images, videos, audio and other files are shown as Attachments in the Literal AI platform, and can be accessed and downloaded via a Step.

Simple logging of multimodal LLM APIs

Use one of the integrations and multimodal logging is automatic. You can also log the API call yourself with the ChatGeneration API.
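For instance, with the Python SDK and the OpenAI integration, a vision call is captured without any extra logging code. Below is a minimal sketch; the image path, prompt, and model name (here gpt-4o) are illustrative, and exact SDK method names should be checked against your installed literalai version.

```python
import base64

from literalai import LiteralClient
from openai import OpenAI

literalai_client = LiteralClient(api_key="your-literal-ai-api-key")
literalai_client.instrument_openai()  # OpenAI calls are now logged automatically

openai_client = OpenAI()

# Encode a local image so it can be sent inline to a vision-capable model
with open("receipt.png", "rb") as f:  # illustrative file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = openai_client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is the total amount on this receipt?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

literalai_client.flush_and_stop()  # make sure pending logs are sent before exiting
```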

Example of a logged multimodal LLM call

Attachments

Attachments serve the purpose of saving files that are important to your LLM application but are not sent as-is to LLM APIs. Attachments are displayed this way:

Example of attachments on a step
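A minimal sketch of saving a file as an Attachment on a step with the Python SDK follows. The create_attachment call and its parameter names are assumptions based on the Attachments API; check the reference below for the exact signature, and note that the step decorator, file name, and MIME type here are illustrative.

```python
from literalai import LiteralClient

literalai_client = LiteralClient(api_key="your-literal-ai-api-key")


@literalai_client.step(type="run", name="process_invoice")
def process_invoice(pdf_path: str) -> str:
    # Grab the currently active step so the attachment is linked to it
    step = literalai_client.get_current_step()

    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()

    # Attach the raw PDF to the step; it will appear as an Attachment
    # on the Literal AI platform, where it can be viewed and downloaded.
    literalai_client.api.create_attachment(  # assumed method and parameters
        thread_id=step.thread_id,
        step_id=step.id,
        name="invoice.pdf",
        mime="application/pdf",
        content=pdf_bytes,
    )
    return "processed"


process_invoice("invoice.pdf")  # illustrative file
literalai_client.flush_and_stop()
```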

Attachments API

Check this guide for an example of multimodal conversation logging in Python with OpenAI and Literal AI.