
Demystifying OpenAI Assistants - Runs, Threads, Messages, Files and Tools

As I mentioned in the previous post, OpenAI dropped a ton of functionality recently, with the shiny new Assistants API taking center stage. In this release, OpenAI introduced the concepts of Threads, Messages, Runs, Files and Tools - all higher-level concepts that make it a little easier to reason about long-running discussions involving multiple human and AI users.

Prior to this, most of what we did with OpenAI's API was call the chat completions API (setting all the non-text modalities aside for now), but to do so we had to keep passing all of the context of the conversation to OpenAI on each API call. This means persisting conversation state on our end, which is fine, but the Assistants API and related functionality makes it easier for developers to get started without reinventing the wheel.

OpenAI Assistants

An OpenAI Assistant is defined as an entity with a name, description, instructions, a default model, default Tools and default Files.
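Here's a minimal sketch of creating one with the Node SDK - the name, description, instructions, model and tools below are just placeholder values:

import OpenAI from 'openai';

const openai = new OpenAI();

// Create an Assistant with a name, description, instructions,
// a default model and default Tools (all values are illustrative)
const assistant = await openai.beta.assistants.create({
  name: 'Task Planner',
  description: 'Helps plan and track tasks',
  instructions: 'You are a helpful task planning assistant.',
  model: 'gpt-4-1106-preview',
  tools: [{ type: 'code_interpreter' }],
});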

Let's break this down a little. The name and description are self-explanatory - you can change them later via the modify Assistant API, but they're otherwise static from Run to Run. The model and instructions fields should also be familiar to you, but in this case they act as defaults and can be easily overridden for a given Run, as we'll see in a moment.

Tools needs a little more explanation. Tools refers to the set of optional capabilities that can be enabled for the Assistant, but they can also be overridden for a particular run. There are 2 broad types of Tool - OpenAI-hosted and self-hosted. At the moment there are 2 OpenAI-hosted tools - Code Interpreter and Retrieval. To allow your Assistant to write and run code to solve problems, you must enable Code Interpreter; to allow it to look at files you give it, you must enable Retrieval. I suspect this category of tools will just be switched on by default in the future, but for now you have to do it yourself.

The second set of tools are your Custom Functions. I discussed these a little in the last post - basically it's just a way to tell the Assistant about functions you have in your codebase that you would like it to be able to invoke (albeit not directly - read the previous post for more). These are just JSON definitions of the names and shapes of your functions - there's no actual code being sent or run there.

Tools, therefore, means zero or more of your own Custom Functions, plus Retrieval and/or Code Interpreter, if you want to enable them. Tools can be defined at Assistant creation-time, but can be overridden at Run creation-time.

Finally, let's examine Files. Files are actually their own top-level concept; once you upload a File you can then link it to Assistants or Messages - under the covers there are AssistantFile and MessageFile objects that allow there to be a many-to-many relationship between Assistants and Files. Again, Files you make available to your Assistant at creation-time can be overridden at Run-time.
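For illustration, here's a sketch of that flow with the Node SDK (the file path and Assistant ID are placeholders):

import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI();

// Upload a File as its own top-level object...
const file = await openai.files.create({
  file: fs.createReadStream('./product-data.csv'),
  purpose: 'assistants',
});

// ...then link it to an Assistant, creating the underlying AssistantFile
await openai.beta.assistants.files.create('asst_abc123', {
  file_id: file.id,
});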

Threads and Messages

A Thread is just an ordered array of Messages. A Message has a role (either "user" or "assistant" - human or machine), some content (what the user said) and an optional set of Files. As before, the Files are linked to the message via an underlying MessageFile, so Files can be reused between Assistants and Messages.
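A quick sketch of building that up with the Node SDK (the File ID is a placeholder; at the time of writing, only "user" Messages can be created directly via the API):

import OpenAI from 'openai';

const openai = new OpenAI();

// Create a Thread, then append a Message with an attached File
// (linking the File creates the underlying MessageFile)
const thread = await openai.beta.threads.create();

await openai.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'Please take a look at the attached product data.',
  file_ids: ['file-abc123'],
});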

As an example, imagine a Thread with 4 Messages. The first two are from human participants in the Thread: perhaps Bob is asking for some calendar and product data, so Fred (another human) sends it, along with whatever message content he wrote. But there are also 2 Assistants in the Thread - imaginatively named Assistant 1 and Assistant 2 - who wrote Message 3 and Message 4 respectively. In order for these two Messages to be created and added to the Thread, the Assistants need to be invoked via a Run.

So What's a Run?

A Run is an entity that represents the process of invoking an Assistant on a Thread. Only one Run can be executing at a time for a given Thread. The Run configuration declares which Assistant should be invoked, what Thread ID to use, and then a bunch of familiar-looking optional parameters. For example, you can define the instructions for the Assistant when you create the Assistant itself, but you can also override them for the specific Run:
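A minimal sketch with the Node SDK (the Assistant and Thread IDs and the override values are placeholders):

import OpenAI from 'openai';

const openai = new OpenAI();

// Invoke the Assistant on a Thread, overriding its default
// instructions just for this Run
const run = await openai.beta.threads.runs.create('thread_abc123', {
  assistant_id: 'asst_abc123',
  instructions: 'Answer tersely, and address the user as Bob.',
});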

You've probably noticed that the Run's shape looks a lot like the Assistant's. Most of the stuff you can define on the Assistant can be overridden at Run creation-time. You can even change which model the Assistant uses during the Run, which feels a little odd and probably isn't something you'd do too often - but at the end of the day it's just swapping one text-in-text-out function call for another, so why not? See the final paragraph of this post for why this might be.

Although you can set your Assistant up with Tools and Files, you can also override those at Run creation-time. It's nice to have that flexibility, though I think it's easier to reason about Assistant capabilities than Run-specific Assistant capabilities, so I suspect most use cases will not involve overriding Tools and Files at Run-creation time. You are currently limited to 20 Files per Assistant, with some size limits too, so the Run-specific overriding of Files would be a way to have your Assistants operate on more than 20 files during the Thread lifetime. That's a slightly hacky way around what is probably a short-term limitation though.

Tracing Runs across a Thread

Returning to our Thread 123 example, let's take a look at the Runs that were invoked against it. There are 3 Runs here - the last one is a bonus Run against a hypothetical Message 5 in our Thread, showing that you can override basically everything an Assistant is on the Run itself.

Run 1 was created against our Thread 123 at some point after Bob and Fred had sent their Messages (Message 1 and Message 2). Run 1 is super basic - it just defines the Assistant to use (Assistant 1) and the Thread to execute on (Thread 123 - the same for all of these Runs). Its execution yields Message 3, which is added to the Thread.

We then triggered Run 2, this time asking Assistant 2 to provide its input, as well as overriding both the model and instructions for Assistant 2, and providing a custom set of Tools. This yields Message 4, which completes the Thread example above.

Run 3 is just to show what a next Run invocation might look like, customizing Files, Tools, model and instructions. At this point, you're arguably not using the Assistants API at all as everything in your Assistant has been overridden.

Bear in mind that each Run has to be triggered by something - it won't happen automatically when Messages are appended to a Thread, so you need something that actually kicks it off. One challenge in Threads that involve multiple human and Assistant users is figuring out when to invoke which Assistant - I'll have some more thoughts on that in an upcoming post.

A Simplified Conceptual Model

Let's close out with a simplified model of the relationships between the actors in this play. First, the Assistant, configured with its default Files and Tools. It is also tied to a set of Runs, as each Run is executed against a single Assistant. There's a one-to-many relationship between the Assistant and its Runs, though those Runs could be against more than one Thread.

Next, a Thread is composed of multiple Messages, which can be added to later, and a Thread also has multiple Runs associated with it. Messages can have message-specific Files attached in addition to their content.

Finally, the glue holding it all together is the Run, which executes on a specific Thread using a specific Assistant, but can also provide Run-specific Files and Tools to make available to the Assistant during the invocation. Usually a new Message will be appended to the Thread as a result of the invocation, but the Run lifecycle is a little deeper than that and worthy of further examination in another post.

Although there are implied one-to-many relationships between Assistant and Run, and between Thread and Run, there is currently no way to get all of the Runs for a given Thread [UPDATE: the listRuns API now does this] or for a given Assistant. If you want to track the state of a Run currently executing on a Thread, you need to keep track of both the Thread ID and the Run ID, then use the getRun API to fetch the Run status. I imagine this will change in the near future.
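A quick sketch of those lookups with the Node SDK (the IDs are placeholders):

import OpenAI from 'openai';

const openai = new OpenAI();

// getRun: needs both the Thread ID and the Run ID
const run = await openai.beta.threads.runs.retrieve('thread_abc123', 'run_abc123');
console.log(run.status); // e.g. 'queued', 'in_progress', 'requires_action', 'completed'

// listRuns: fetches the Runs for a given Thread
const runs = await openai.beta.threads.runs.list('thread_abc123');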

This is definitely progress in terms of making it easier for developers to build persistent generative AI applications with a chat component, though it looks like this is all just an abstraction placed over the same old underlying LLM text-in-text-out function. That's not to say that abstractions like this are not a very welcome thing, just bear in mind what's really happening under the covers.

With that model in mind, it's fairly easy to see how the set of Messages, Files and Tools (the Custom Function definitions at least) in a Thread could be smushed together into a big ole blob of text and fed to the LLM, probably stitched inside some other prompt text. This is why it's reasonable (though probably not all that useful) to swap out the model between Runs - at the end of the day we're just passing a bunch of text into a function called an LLM and getting some text out of it.


Using ChatGPT to generate ChatGPT Assistants

OpenAI dropped a ton of cool stuff in their Dev Day presentations, including some updates to function calling. There are a few function-call-like things that currently exist within the OpenAI ecosystem, so let's take a moment to disambiguate:

  • Plugins: introduced in March 2023, allowed GPT to understand and call your HTTP APIs
  • Actions: an evolution of Plugins, makes it easier but still calls your HTTP APIs
  • Function Calling: ChatGPT understands your functions, tells you how to call them, but does not actually call them

It seems like Plugins are likely to be superseded by Actions, so we end up with 2 ways to have GPT call your functions - Actions for automatically calling HTTP APIs, and Function Calling for indirectly calling anything else. We could call the latter Guided Invocation - despite the name, it doesn't actually call the function, it just tells you how to.

That second category is going to include anything that isn't an HTTP endpoint, so it gives you a lot of flexibility to call internal APIs that never learned how to speak HTTP. Think legacy systems, private APIs that you don't want to expose to the internet, and other places where this can act as a highly adaptable glue.

I've put all the source code for this article up at https://github.com/edspencer/gpt-functions-example, so check that out if you want to follow along. It should just be a matter of following the steps in the README, but YMMV. We are, of course, going to use a task management app as a playground.

Creating Function definitions

In order for OpenAI Assistants to be able to call your code, you need to provide them with signatures for all of your functions, in the format OpenAI expects, which looks like this:

{
  "type": "function",
  "function": {
    "name": "addTask",
    "description": "Adds a new task to the database.",
    "parameters": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the task."
        },
        "priority": {
          "type": "number",
          "description": "The priority of the task, lower numbers indicating higher priority."
        },
        "completed": {
          "type": "boolean",
          "description": "Whether the task is marked as completed."
        }
      },
      "required": ["name"]
    }
  }
}

That's pretty self-explanatory. It's also a pain in the ass to keep tweaking and updating as you evolve your app, so let's use the OpenAI Chat Completions API with the json_object response format enabled and see if we can have this done for us.

Our Internal API

Let's build a basic Task management app. We'll just use a super-naive implementation of Todos written in TypeScript. My little API.ts has functions like addTask, updateTask, removeTask, getTasks, etc. All the stuff you'd expect. Some of them take a bunch of different inputs.

Here's a snippet of our API.ts file. It's very basic but functional, using a SQLite database driven by Prisma:

import { PrismaClient, Task } from '@prisma/client';

const prisma = new PrismaClient();

interface TaskInput {
  name: string;
  priority?: number;
  completed?: boolean;
  deleted?: boolean;
}

/**
 * Adds a new task to the database.
 * @param taskInput - An object containing the details of the task to be added.
 * @param taskInput.name - The name of the task.
 * @param taskInput.priority - The priority of the task.
 * @returns A Promise that resolves when the task has been added to the database.
 */
async function addTask(taskInput: TaskInput): Promise<Task | void> {
  try {
    const task = await prisma.task.create({
      data: taskInput,
    });
    console.log(`Task ${task.id} created with name ${task.name} and priority ${task.priority}.`);

    return task;
  } catch (e) {
    console.error(e);
  }
}

/**
 * Updates a task in the database.
 * @param id - The ID of the task to update.
 * @param updates - An object containing the updates to apply to the task.
 * @param updates.name - The updated name of the task.
 * @param updates.priority - The updated priority of the task.
 * @param updates.completed - The updated completed status of the task.
 * @returns A Promise that resolves when the task has been updated in the database.
 */
async function updateTask(id: string, updates: Partial<TaskInput>): Promise<void> {
  try {
    const task = await prisma.task.update({
      where: { id },
      data: updates,
    });
    console.log(`Task ${task.id} updated with name ${task.name} and priority ${task.priority}.`);
  } catch (e) {
    console.error(e);
  }
}

It goes on from there. You get the picture. No it's not production-grade code - don't use this as a launchpad for your Todo list manager app. GitHub Copilot actually wrote most of that code (and most of the documentation) for me.

Side note on documentation: it took me more years than I care to admit to figure out that the primary consumer of source code is humans, not machines. The machine doesn't care about your language, formatting, awfulness of your algorithms, weird variable names, etc; algorithmic complexity aside it'll do exactly the same thing regardless of how you craft your code. Humans are a different matter though, and benefit enormously from a little context written in a human language.

Ironically, that same documentation that benefitted human code consumers all this time is now what enables these new machine consumers to grok and invoke your code, saving you the work of coming up with a translation layer to integrate with AI agents. So writing documentation really does help you after all. Also, write tests and eat your vegetables.

Generating the OpenAI translation layer

The code to translate our internal API into something OpenAI can use is fairly simple and reusable. All we do is read in a file as text, stuff the contents of that file into a GPT prompt, send that off to OpenAI, stream the results back to the terminal and save it to a file when done:

/**
 * This file uses the OpenAI Chat Completions API to automatically generate OpenAI Function Call
 * JSON objects for an arbitrary code file. It takes a source file, reads it and passes it into
 * OpenAI with a simple prompt, then writes the output to another file. Extend as needed.
 */

import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';

import { OptionValues, program } from 'commander';

//takes an input file, and generates a new tools.json file based on the input file
program.option('-s, --sourceFile <sourceFile>', 'The source file to use for the prompt', './API.ts');
program.option('-o, --outputFile <outputFile>', 'The output file to write the tools.json to (defaults to your input + .tools.json)');

const openai = new OpenAI();

/**
 * Takes an input file, and generates a new tools.json file based on the input file.
 * @param sourceFile - The source file to use for the prompt.
 * @param outputFile - The output file to write the tools.json to. Defaults to `${sourceFile}.tools.json`.
 * @returns Promise<void>
 */
async function build({ sourceFile, outputFile = `${sourceFile}.tools.json` }: OptionValues) {
  console.log(`Reading ${sourceFile}...`);
  const sourceFileText = fs.readFileSync(path.join(__dirname, sourceFile), 'utf-8');

  const prompt = `
This is the implementation of my ${sourceFile} file:

${sourceFileText}

Please give me a JSON object that contains a single key called "tools", which is an array of the functions in this file.
This is an example of what I expect (one element of the array):

{
  "type": "function",
  "function": {
    "name": "addTask",
    "description": "Adds a new task to the database.",
    "parameters": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the task."
        },
        "priority": {
          "type": "number",
          "description": "The priority of the task, with lower numbers indicating higher priority."
        },
        "completed": {
          "type": "boolean",
          "description": "Whether the task is marked as completed."
        }
      },
      "required": ["name"]
    }
  }
},
`

  //Call the OpenAI API to generate the function definitions, and stream the results back
  const stream = await openai.chat.completions.create({
    model: 'gpt-4-1106-preview',
    response_format: { type: 'json_object' },
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  //Keep the new tools.json in memory until we have it all
  let newToolsJson = '';

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
    newToolsJson += content;
  }

  console.log(`Updating ${outputFile}...`);

  // Write the tools JSON to the output file
  fs.writeFileSync(path.join(__dirname, outputFile), newToolsJson);
}

build(program.parse(process.argv).opts());

I've made a simple little repo with this file, the API.ts file, and a little demo that shows it all integrated. Run it like this:

ts-node rebuildTools.ts -s API.ts

Which will give you some output like this, and then update your API.ts.tools.json file:

ts-node rebuildTools.ts -s API.ts
Reading API.ts...
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "addTask",
        "description": "Adds a new task to the database.",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {

..........truncated...
full output at https://github.com/edspencer/gpt-functions-example/blob/main/API.ts.tools.json
.............................

        "returns": {
          "type": "Promise<void>",
          "description": "A Promise that resolves when all tasks have been deleted from the database."
        }
      }
    }
  ]
}
Updating ./API.ts.tools.json...
Done

Creating an OpenAI Assistant and talking to it

We've had OpenAI generate our Tools JSON file, now let's see if an Assistant can use it via a simple demo.ts, which:

  • Creates an Assistant (passing in our generated Tools JSON) and a Thread
  • Retrieves any existing Tasks from the database, to send along as extra context
  • Adds the user's message to the Thread, creates a Run and polls it until it completes or requires action
  • Invokes the requested functions against our internal API.ts

The code is all up on GitHub, and I won't do a blow-by-blow here, but let's have a look at the output when we run it:

ts-node ./demo.ts -m "I need to go buy bread from the store, then go to \
the gym. I also need to do my taxes, which is a P1."

And the output:

Creating assistant...
Created assistant asst_hkT3BFQsNf3HSmJpE8KytiX9 with name Task Planner.
Created thread thread_AigYi0oFrytu3aO5k0mRacIV
Retrieved 0 tasks from the database.
Created message
msg_uLpR3UpQB3pX62wVIA7TcqIl
Polling thread
Current status: queued
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: requires_action
Actions:
[
  {
    id: 'call_8JX5ffKFpxIhYmJeZYYilpv3',
    type: 'function',
    function: {
      name: 'addTask',
      arguments: '{"name": "Buy bread from the store", "priority": 2}'
    }
  },
  {
    id: 'call_GC4axxSB6Oso0tiolDLr900X',
    type: 'function',
    function: {
      name: 'addTask',
      arguments: '{"name": "Go to the gym", "priority": 2}'
    }
  },
  {
    id: 'call_7c5mWt1I5Ff3h5Lvb0Hfw2L7',
    type: 'function',
    function: {
      name: 'addTask',
      arguments: '{"name": "Do taxes", "priority": 1}'
    }
  }
]
Adding task
Task cloyl2gxs0000c3a7hxe6hupc created with name Buy bread from the store and priority 2.
Adding task
Task cloyl2gxv0001c3a7zi4hqt8z created with name Go to the gym and priority 2.
Adding task
Task cloyl2gxx0002c3a7l0gv7f07 created with name Do taxes and priority 1.

You can see all of the steps it takes in the console output. First came the creation of the Assistant and the Thread; then we looked to see if our SQLite database had any existing Tasks, in which case we would send those along as input too. We passed those along with the user's message and got back OpenAI's function invocations (3 in this case). Finally, we iterated over them all and called our internal addTask function, and at the bottom of the output we see that our tasks were created successfully.
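The polling loop is the heart of demo.ts. Here's a simplified sketch of the requires_action handling, assuming API.ts exports addTask, updateTask and completeTask (error handling and submitting tool outputs back are omitted):

import OpenAI from 'openai';
import { addTask, updateTask, completeTask } from './API';

const openai = new OpenAI();

// Poll the Run until it asks us to act, then dispatch each requested
// function call to the matching function in our internal API
async function pollRun(threadId: string, runId: string) {
  while (true) {
    const run = await openai.beta.threads.runs.retrieve(threadId, runId);
    console.log(`Current status: ${run.status}`);

    if (run.status === 'requires_action') {
      const calls = run.required_action?.submit_tool_outputs.tool_calls ?? [];

      for (const call of calls) {
        const args = JSON.parse(call.function.arguments);

        switch (call.function.name) {
          case 'addTask':
            await addTask(args);
            break;
          case 'updateTask':
            await updateTask(args.id, args.updates);
            break;
          case 'completeTask':
            await completeTask(args.id);
            break;
        }
      }
      return;
    }

    if (run.status === 'completed' || run.status === 'failed') return;

    console.log('Trying again in 2 seconds...');
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}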

Let's go call it again, updating the tasks that we just made:

ts-node demo.ts -m "I finished the laundry, please mark it complete. Also the gym is a P1"

Output:

Creating assistant...
Created assistant asst_WbTXKoXWL1yTWs4zvcVkDIDT with name Task Planner.
Created thread thread_mLvr7acahXbnmoe217f0gMRF
Retrieved 3 tasks from the database.
Created message
msg_iYYkAeuxRPNmJZ5vAKwiI8S7
Polling thread
Current status: queued
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: requires_action
Actions:
[
  {
    id: 'call_W4UKGadROhaJJFZym7vQocP7',
    type: 'function',
    function: {
      name: 'completeTask',
      arguments: '{"id": "cloyl2gxs0000c3a7hxe6hupc"}'
    }
  },
  {
    id: 'call_KzaYk1x4sIRFWeKlvgOk37qf',
    type: 'function',
    function: {
      name: 'updateTask',
      arguments: '{"id": "cloyl2gxv0001c3a7zi4hqt8z", "updates": {"priority": 1}}'
    }
  }
]
Completing task
Task cloyl2gxs0000c3a7hxe6hupc marked as completed.
Updating task
Task cloyl2gxv0001c3a7zi4hqt8z updated with name Go to the gym and priority 1.

That's kinda amazing. All that any of this really does is assemble blobs of text and send them to the OpenAI API, which is able to figure it all out, even with the context of the data, and correctly call both create and update APIs that exist only internally within your system, without exposing anything to the internet at large.

Here it correctly figured out the IDs of the Tasks to update (because I passed that data in with the prompt - it's tiny), which functions to call, and that they should be run in parallel. That means your user can speak/type as much as they like, making a lot of demands in a single submission, and the Assistant will batch it all up into a set of functions that, from its perspective at least, it wants you to run in parallel.

After executing the functions, you can send another request to tell the Assistant the outcome - this article is long enough already, but you can see how to close that loop in the OpenAI Function Calling docs.
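Closing the loop looks something like this with the Node SDK (the IDs are placeholders, and output is whatever your function returned, serialized to a string):

import OpenAI from 'openai';

const openai = new OpenAI();

// Report each function call's result back to the Run so the Assistant
// can continue (the Thread, Run and tool call IDs are placeholders)
await openai.beta.threads.runs.submitToolOutputs('thread_abc123', 'run_abc123', {
  tool_outputs: [
    {
      tool_call_id: 'call_8JX5ffKFpxIhYmJeZYYilpv3',
      output: JSON.stringify({ success: true }),
    },
  ],
});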

Closing Thoughts

This stuff is all very new, and there are some pros and cons here. While it all looks rosy in the end, it did take a few iterations to get GPT to reliably and consistently output the JSON format expected in the translation stage - occasionally it would innovate and restructure things a little, which caused things to break. That's probably just something time will take care of as this stuff gets polished up, both on OpenAI's end and on everyone else's, but it's something to be aware of.

This technology requires a considered approach to testing too: GPT is a big old black box floating off in the internet somewhere, it's semi-magical, and it doesn't always give the right answer. Bit rot seems like a serious risk here, both due to the newness of the tech and the fact that most of us don't really understand it very well. It seems sensible to mock/stub out expected responses from OpenAI's APIs for unit testing, but when it comes to integration testing, you probably need your tests to do something like what our demo.ts does, and then verify that the database was updated correctly at the end.

You can make no changes to your code or environment and still get different outcomes, thanks to the non-determinism of GPT. Temperature control and fine-tuning can ameliorate this, but you should probably be less than 100% trustful that your Assistant is doing what you think it is.

Finally, there's obviously a huge security consideration here. Fundamentally, we're taking user input (text, speech, images, whatever) and calling code on our own systems as a result. This always involves peril, and one can imagine all kinds of SQL injection-style attacks against Agent systems that inadvertently run malicious actions the developer didn't intend. For example, my API.ts contains a deleteAllTasks function that does what you think it does. Because it's part of API.ts, the Assistant knows about it and could inadvertently call it, whether the user was trying to do that or not.

It would be extremely easy to mix up public and private code in this way and accidentally expose it to the Assistant, so in reality you probably want a sanity check to run each time the tools JSON is rebuilt, telling you what changed. That seems like a good thing to have in your CI/CD pipeline.
