Literal AI can log LLM applications that handle multimodal input as well as output. For example:

  • Images: Vision and Generation
  • Video: Vision and Generation
  • Audio: Speech-to-Text and Text-to-Speech
  • Other file types, like PDF files, are also supported

You can leverage multimodal capabilities on Literal AI in two ways:

  • Simple logging of API calls to multimodal LLM APIs, like GPT-4 Vision
  • Save multimodal files as Attachments. Images, videos, audio and other files are shown as Attachments in the Literal AI platform, and can be accessed and downloaded via a Step.

Simple logging of multimodal LLM APIs

Use one of the integrations and multimodal logging is automatic. You can also log the API call yourself with the ChatGeneration API.
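For instance, with the Python SDK and the OpenAI integration, a vision call is captured without any extra logging code. Below is a minimal sketch; the image path, prompt, and model name (here gpt-4o) are illustrative, and exact SDK method names should be checked against your installed literalai version.

```python
import base64

from literalai import LiteralClient
from openai import OpenAI

literalai_client = LiteralClient(api_key="your-literal-ai-api-key")
literalai_client.instrument_openai()  # OpenAI calls are now logged automatically

openai_client = OpenAI()

# Encode a local image so it can be sent inline to a vision-capable model
with open("receipt.png", "rb") as f:  # illustrative file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = openai_client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is the total amount on this receipt?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

literalai_client.flush_and_stop()  # make sure pending logs are sent before exiting
```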

Example of a logged multimodal LLM call

Attachments

Attachments serve the purpose of saving files that are important to your LLM application but are not sent as-is to LLM APIs. Attachments are displayed this way:

Example of attachments on a step
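A minimal sketch of saving a file as an Attachment on a step with the Python SDK follows. The create_attachment call and its parameter names are assumptions based on the Attachments API; check the reference below for the exact signature, and note that the step decorator, file name, and MIME type here are illustrative.

```python
from literalai import LiteralClient

literalai_client = LiteralClient(api_key="your-literal-ai-api-key")


@literalai_client.step(type="run", name="process_invoice")
def process_invoice(pdf_path: str) -> str:
    # Grab the currently active step so the attachment is linked to it
    step = literalai_client.get_current_step()

    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()

    # Attach the raw PDF to the step; it will appear as an Attachment
    # on the Literal AI platform, where it can be viewed and downloaded.
    literalai_client.api.create_attachment(  # assumed method and parameters
        thread_id=step.thread_id,
        step_id=step.id,
        name="invoice.pdf",
        mime="application/pdf",
        content=pdf_bytes,
    )
    return "processed"


process_invoice("invoice.pdf")  # illustrative file
literalai_client.flush_and_stop()
```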

Attachments API

Check this guide for an example of multimodal conversation logging in Python with OpenAI and Literal AI.