
Demystifying OpenAI Assistants - Runs, Threads, Messages, Files and Tools

As I mentioned in the previous post, OpenAI dropped a ton of functionality recently, with the shiny new Assistants API taking center stage. In this release, OpenAI introduced the concepts of Threads, Messages, Runs, Files and Tools - all higher-level concepts that make it a little easier to reason about long-running discussions involving multiple human and AI users.

Prior to this, most of what we did with OpenAI's API was call the chat completions API (setting all the non-text modalities aside for now), but to do so we had to keep passing the full context of the conversation to OpenAI on each API call. That meant persisting conversation state on our end, which is fine, but the Assistants API and related functionality make it easier for developers to get started without reinventing the wheel.

OpenAI Assistants

An OpenAI Assistant is defined as an entity with a name, description, instructions, default model, default tools and default files. It looks like this:
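As a rough sketch of that shape in Python: the field names follow the list above, but the class itself is a hypothetical local model, not the type exposed by OpenAI's SDK, and the model name is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Assistant:
    """Hypothetical local model of an Assistant (not the OpenAI SDK type)."""
    name: str
    description: str
    instructions: str                                    # default, overridable per Run
    model: str                                           # default, overridable per Run
    tools: list[dict] = field(default_factory=list)      # default Tools
    file_ids: list[str] = field(default_factory=list)    # default Files


helper = Assistant(
    name="Assistant 1",
    description="Answers questions about calendars and products",
    instructions="You are a helpful scheduling assistant.",
    model="gpt-4-1106-preview",  # placeholder model name
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
)
print(helper.name, helper.model, [t["type"] for t in helper.tools])
```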

Let's break this down a little. The name and description are self-explanatory - you can change them later via the modify Assistant API, but they're otherwise static from Run to Run. The model and instructions fields should also be familiar to you, but in this case they act as defaults and can be easily overridden for a given Run, as we'll see in a moment.

Tools needs a little more explanation. Tools are the set of optional capabilities that can be enabled for the Assistant, though they can also be overridden for a particular Run. There are 2 broad types of Tool - OpenAI-hosted and self-hosted. At the moment there are 2 OpenAI-hosted tools - Code Interpreter and Retrieval. To allow your Assistant to write and run code to solve problems, you must enable Code Interpreter; to allow it to look at files you give it, you must enable Retrieval. I suspect this category of tools will just be switched on by default in the future, but for now you have to do it yourself.

The second, self-hosted set of tools are your Custom Functions. I discussed these a little in the last post - basically it's just a way to tell the Assistant about functions you have in your codebase that you would like it to be able to invoke (albeit not directly - read the previous post for more). These are just JSON definitions of the names and shapes of your functions - there's no actual code being sent or run there.

Tools, therefore, means zero or more of your own Custom Functions, plus Retrieval and/or Code Interpreter, if you want to enable them. Tools can be defined at Assistant creation-time, but can be overridden at Run creation-time.
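To make that concrete, here is a minimal sketch of what a Tools list might look like as plain Python data. The structure mirrors the function-definition format as I understand it at the time of writing, so treat it as an assumption rather than a reference. The function name get_calendar_events is hypothetical and stands in for something in your own codebase.

```python
import json

# Hosted tools are switched on by type alone.
hosted_tools = [{"type": "code_interpreter"}, {"type": "retrieval"}]

# A Custom Function is only a description: a name, a summary, and a
# JSON Schema for its parameters. No code is sent anywhere.
get_calendar_tool = {
    "type": "function",
    "function": {
        "name": "get_calendar_events",  # hypothetical function in your codebase
        "description": "List calendar events for a user on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "user": {"type": "string"},
                "date": {"type": "string", "description": "ISO 8601 date"},
            },
            "required": ["user", "date"],
        },
    },
}

# Zero or more Custom Functions, plus whichever hosted tools you enable.
tools = hosted_tools + [get_calendar_tool]
payload = json.dumps(tools)
print(payload)
```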

Finally, let's examine Files. Files are actually their own top-level concept; once you upload a File you can then link it to Assistants or Messages - under the covers there are AssistantFile and MessageFile objects that allow there to be a many-to-many relationship between Assistants and Files. Again, Files you make available to your Assistant at creation-time can be overridden at Run-time.
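The many-to-many relationship can be sketched with two small link records. This is a local illustration only: the class names echo the AssistantFile and MessageFile objects mentioned above, but the fields and all IDs are assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantFile:
    assistant_id: str
    file_id: str


@dataclass(frozen=True)
class MessageFile:
    message_id: str
    file_id: str


# Hypothetical IDs: one File can be linked to several Assistants and Messages.
assistant_files = {
    AssistantFile("asst_1", "file_calendar"),
    AssistantFile("asst_1", "file_products"),
    AssistantFile("asst_2", "file_products"),
}
message_files = {MessageFile("msg_2", "file_calendar")}


def files_for_assistant(assistant_id: str) -> list[str]:
    """All File IDs linked to one Assistant."""
    return sorted(link.file_id for link in assistant_files
                  if link.assistant_id == assistant_id)


def assistants_for_file(file_id: str) -> list[str]:
    """All Assistant IDs that one File is linked to."""
    return sorted(link.assistant_id for link in assistant_files
                  if link.file_id == file_id)


print(files_for_assistant("asst_1"))
print(assistants_for_file("file_products"))
```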

Threads and Messages

A Thread is just an ordered array of Messages. A Message has a role (either "user" or "assistant" - human or machine), some content (what the user or Assistant said) and an optional set of Files. As before, the Files are linked to the message via an underlying MessageFile, so Files can be reused between Assistants and Messages.

In this example we have a Thread with 4 Messages. The first two are from human participants in the Thread: perhaps Bob is asking for some calendar and product data, so Fred (another human) sends it, along with whatever message content he wrote. But there are also 2 Assistants in the Thread - imaginatively named Assistant 1 and Assistant 2, who wrote Message 3 and Message 4 respectively. In order for these two Messages to be created and added to the Thread, the Assistants will need to be invoked via a Run.
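Here is the same four-Message Thread as a small Python sketch. The Thread and Message classes are hypothetical local models, and the author field is something I've added purely to label who wrote what.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    role: str                                   # "user" or "assistant"
    content: str
    file_ids: list[str] = field(default_factory=list)
    author: Optional[str] = None                # hypothetical, for illustration only


@dataclass
class Thread:
    id: str
    messages: list[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        """Append a Message, keeping the Thread ordered."""
        if message.role not in ("user", "assistant"):
            raise ValueError(f"unknown role: {message.role}")
        self.messages.append(message)


thread = Thread(id="thread_123")
thread.add(Message("user", "Can someone send me the calendar and product data?",
                   author="Bob"))
thread.add(Message("user", "Here you go.",
                   file_ids=["file_calendar", "file_products"], author="Fred"))
# Messages 3 and 4 only exist once a Run has invoked each Assistant.
thread.add(Message("assistant", "(reply produced by a Run)", author="Assistant 1"))
thread.add(Message("assistant", "(reply produced by a Run)", author="Assistant 2"))
print([m.role for m in thread.messages])
```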

So What's a Run?

A Run is an entity that represents the process of invoking an Assistant on a Thread. Only one Run can be executing at a time for a given Thread. The Run configuration declares which Assistant should be invoked, what Thread ID to use, and then a bunch of familiar-looking optional parameters. For example, you can define the instructions for the Assistant when you create the Assistant itself, but you can also override them for the specific Run:
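A minimal sketch of that override behaviour, assuming a simple rule where a value set on the Run wins and anything left unset falls back to the Assistant's default. The classes and the effective() helper are hypothetical, and the model names are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Assistant:
    id: str
    model: str
    instructions: str
    tools: list = field(default_factory=list)


@dataclass
class Run:
    assistant: Assistant
    thread_id: str
    model: Optional[str] = None          # None means "use the Assistant default"
    instructions: Optional[str] = None
    tools: Optional[list] = None

    def effective(self) -> dict:
        """Resolve each setting: the Run's override wins, else the default."""
        def pick(override, default):
            return default if override is None else override

        return {
            "model": pick(self.model, self.assistant.model),
            "instructions": pick(self.instructions, self.assistant.instructions),
            "tools": pick(self.tools, self.assistant.tools),
        }


assistant_2 = Assistant(id="asst_2", model="gpt-4",
                        instructions="Be concise.",
                        tools=[{"type": "retrieval"}])

plain_run = Run(assistant=assistant_2, thread_id="thread_123")
custom_run = Run(assistant=assistant_2, thread_id="thread_123",
                 model="gpt-3.5-turbo",
                 instructions="Answer in French.",
                 tools=[])  # an explicit empty list overrides the default Tools

print(plain_run.effective())
print(custom_run.effective())
```

Checking for None rather than truthiness is deliberate: it lets a Run pass an empty Tools list to switch everything off.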

You probably noticed that the Run diagram looks a lot like the Assistant diagram. Most of the stuff you can define on the Assistant can be overridden at Run creation-time. You can even change which model the Assistant uses during the Run, which feels a little odd and probably isn't something you'd do too often. At the end of the day, though, it's just swapping one text-in-text-out function call for another, so why not? See the final paragraph of this post for why this works.

Although you can set your Assistant up with Tools and Files, you can also override those at Run creation-time. It's nice to have that flexibility, though I think it's easier to reason about Assistant capabilities than Run-specific Assistant capabilities, so I suspect most use cases will not involve overriding Tools and Files at Run creation-time. You are currently limited to 20 Files per Assistant, with some size limits too, so the Run-specific overriding of Files would be a way to have your Assistants operate on more than 20 Files during the Thread lifetime. That's a slightly hacky way around what is probably a short-term limitation though.

Tracing Runs across a Thread

Returning to our Thread 123 example a couple of pictures up, let's take a look at the Runs that were invoked against our Thread. In the image below we have 3 Runs - the last one is a bonus Run against a hypothetical Message 5 in our Thread, showing that you can override basically everything an Assistant is on the Run itself.

Run 1 was created against our Thread 123 at some point after Bob and Fred had sent their Messages (Message 1 and Message 2). Run 1 is super basic - it just defines the Assistant to use (Assistant 1) and the Thread to execute on (Thread 123 - the same for all of these Runs). Its execution yields Message 3, which is added to the Thread.

We then triggered Run 2, this time asking Assistant 2 to provide its input, as well as overriding both the model and instructions for Assistant 2, and providing a custom set of Tools. This yields Message 4, which completes the Thread example above.

Run 3 is just to show what a next Run invocation might look like, customizing Files, Tools, model and instructions. At this point, you're arguably not using the Assistants API at all as everything in your Assistant has been overridden.

Bear in mind that each Run has to be triggered by something - it won't happen automatically by Messages being appended to a Thread, so you need something that actually kicks this off. One challenge in Threads that involve multiple human and Assistant users is figuring out when to invoke which Assistant - I'll have some more thoughts on that in an upcoming post.
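Two of the rules above, that appending a Message never starts a Run by itself and that a Thread can only have one Run executing at a time, can be sketched locally like this. Everything here (the Thread class, start_run, complete_run, RunInProgress) is hypothetical bookkeeping of my own, not something the API hands you.

```python
from dataclasses import dataclass, field
from typing import Optional


class RunInProgress(Exception):
    """Raised when a Thread already has an executing Run."""


@dataclass
class Thread:
    id: str
    messages: list = field(default_factory=list)   # (role, content) pairs
    active_run_id: Optional[str] = None


def add_message(thread: Thread, role: str, content: str) -> None:
    # Appending a Message does not start a Run.
    thread.messages.append((role, content))


def start_run(thread: Thread, run_id: str) -> None:
    # Something in your application has to call this explicitly.
    if thread.active_run_id is not None:
        raise RunInProgress(
            f"{thread.active_run_id} is still executing on {thread.id}")
    thread.active_run_id = run_id


def complete_run(thread: Thread, run_id: str, reply: str) -> None:
    if thread.active_run_id != run_id:
        raise ValueError(f"{run_id} is not the active Run on {thread.id}")
    thread.messages.append(("assistant", reply))
    thread.active_run_id = None


thread = Thread(id="thread_123")
add_message(thread, "user", "What's on my calendar?")
idle_after_message = thread.active_run_id is None

start_run(thread, "run_1")
try:
    start_run(thread, "run_2")      # rejected while run_1 is executing
    blocked = False
except RunInProgress:
    blocked = True

complete_run(thread, "run_1", "You have two meetings today.")
print(idle_after_message, blocked, len(thread.messages))
```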

A Simplified Conceptual Model

Let's close out with a simplified diagram of the relationships between the actors in this play. On the right we find the Assistant, configured with its default Files and Tools. It is also tied to a set of Runs, as each Run is executed against a single Assistant. There's a one-to-many relationship between the Assistant and its Runs, though these Runs could be against more than one Thread.

On the other side of the diagram, we see that a Thread is composed of multiple Messages, which can be added to later, and that a Thread also has multiple Runs associated with it. Messages can have message-specific Files attached in addition to their content.

Finally, the glue holding it all together in the center is the Run, which executes on a specific Thread using a specific Assistant, but can also provide Run-specific Files and Tools to make available to the Assistant during the invocation. Usually a new Message will be appended to the Thread as a result of the invocation, but the Run lifecycle is a little deeper than that and worthy of further examination in another post.

Although there are implied one-to-many relationships between Assistant and Run, and between Thread and Run, there is currently no way to get all of the Runs for a given Thread [UPDATE: listRuns API now does this] or for a given Assistant. If you want to track the state of a Run currently executing on a Thread, you need to keep track of both the Thread ID and the Run ID so that you can use the getRun API to get the Run status. I imagine this will change in the near future.
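In practice, tracking a Run tends to mean polling with that pair of IDs. Below is a minimal sketch of such a loop. The get_status callable is a stand-in for whatever wraps the getRun call in your code, and the status names are my assumption of the states a Run can stop in, so check them against the API reference before relying on them.

```python
import time
from typing import Callable

# Assumed statuses at which polling should stop. "requires_action" is
# included because the caller then has work to do (e.g. run a function).
STOP_STATES = {"completed", "failed", "cancelled", "expired", "requires_action"}


def wait_for_run(get_status: Callable[[str, str], str],
                 thread_id: str, run_id: str,
                 interval: float = 0.5, max_polls: int = 20) -> str:
    """Poll a Run's status until it reaches a stopping state."""
    for _ in range(max_polls):
        status = get_status(thread_id, run_id)   # needs BOTH IDs
        if status in STOP_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"{run_id} on {thread_id} did not finish in time")


# A fake status source so the sketch runs without any network access.
_fake_statuses = iter(["queued", "in_progress", "completed"])


def fake_get_status(thread_id: str, run_id: str) -> str:
    return next(_fake_statuses)


result = wait_for_run(fake_get_status, "thread_123", "run_1", interval=0)
print(result)
```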

This is definitely progress in terms of making it easier for developers to build persistent generative AI applications with a chat component, though it looks like this is all just an abstraction placed over the same old underlying LLM text-in-text-out function. That's not to say that abstractions like this are not a very welcome thing, just bear in mind what's really happening under the covers.

Looking at the picture above it's fairly easy to see how the set of Messages, Files and Tools (the Custom Function definitions at least) in a Thread could be smushed together into a big ole blob of text and fed to the LLM, probably stitched inside some other prompt text. This is why it's reasonable (though probably not all that useful) to swap out the model between Runs - at the end of the day we're just passing a bunch of text into a function called an LLM and getting some text out of it.
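To illustrate the idea, here is a toy version of that smushing. It is purely speculative: I have no visibility into how OpenAI actually assembles its prompts, and the build_prompt function, section labels and sample data are all made up for the example.

```python
import json


def build_prompt(instructions, messages, tools, file_texts):
    """Flatten instructions, Tools, Files and Messages into one text blob."""
    parts = [f"SYSTEM: {instructions}"]
    if tools:
        # Custom Function definitions are already just JSON text.
        parts.append("AVAILABLE FUNCTIONS:\n" + json.dumps(tools, indent=2))
    for name, text in file_texts.items():
        parts.append(f"FILE {name}:\n{text}")
    for role, content in messages:
        parts.append(f"{role.upper()}: {content}")
    parts.append("ASSISTANT:")   # where the model's text-out would begin
    return "\n\n".join(parts)


prompt = build_prompt(
    instructions="You are a helpful scheduling assistant.",
    messages=[("user", "When is the product launch?")],
    tools=[{"name": "get_calendar_events", "parameters": {"type": "object"}}],
    file_texts={"calendar.txt": "Launch: 12 March"},
)
print(prompt)
```

Nothing in the resulting string is tied to a particular model, which is the point: any text-in-text-out function could consume it.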
