Everything tagged typescript (2 posts)

Using Server Actions with Next JS

React and Next.js introduced Server Actions a while back, as a new/old way to call server-side code from the client. In this post, I'll explain what Server Actions are, how they work, and how you can use them in your Next.js applications. We'll look at why they both are and aren't APIs, why they can make your front-end code cleaner, and why they can make your back-end code messier.

Everything old is new again

In the beginning, there were <form>s. They had an action, and a method, and when you clicked the submit button, the browser would send a request to the server. The server would then process the request and send back a response, which could be a redirect. The action was the URL of the server endpoint, and the method was usually either GET or POST.

<form action="/submit" method="POST">
  <input type="text" name="name" />
  <button type="submit">Submit</button>
</form>

Then came AJAX, and suddenly we could send requests to the server without reloading the page. This was a game-changer, and it opened up a whole new world of possibilities for building web applications. But it also introduced a lot of complexity, as developers had to manage things like network requests, error handling, and loading states. We ended up building React components like this:

TheOldWay.jsx
//this is just so 2019
import { useState } from 'react';

export default function CreateDevice() {
  const [name, setName] = useState('');
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState(null);

  const handleSubmit = async (e) => {
    e.preventDefault();
    setLoading(true);
    try {
      await fetch('/api/devices', {
        method: 'POST',
        body: JSON.stringify({ name }),
        headers: {
          'Content-Type': 'application/json',
        },
      });
    } catch (err) {
      setError(err);
    } finally {
      setLoading(false);
    }
  };

  return (
    <form onSubmit={handleSubmit}>
      <input type="text" value={name} onChange={(e) => setName(e.target.value)} />
      <button type="submit" disabled={loading}>Submit</button>
      {error && <p>{error.message}</p>}
    </form>
  );
}

This code is fine, but it's a lot of boilerplate for something as simple as submitting a form. It's also not very readable, as the logic for handling the form submission is mixed in with the UI code. Wouldn't it be nice if we could go back to the good old days of <form>s, but without the page reload?

Enter Server Actions

Now, with Server Actions, React is bringing back the simplicity of the old days, while still taking advantage of the power of modern web technologies. Server Actions allow you to call server-side code from the client, just like you would with a traditional form submission, but without the page reload. It wants you to think that this is all happening without an API on the backend, but this isn't true. It's not magic after all.

Here's how we can write the same form using Server Actions:

app/components/AddDeviceForm.tsx
'use client';
import { useFormState } from 'react-dom';
import { createDeviceAction } from '@/app/actions/devices';

export function AddDeviceForm() {
  const [state, formAction] = useFormState(createDeviceAction, {});

  return (
    <form action={formAction} className="create-device">
      <fieldset>
        <label htmlFor="name">Name:</label>
        <input type="text" name="name" id="name" placeholder="type something" />
        <button type="submit">Submit</button>
      </fieldset>
      {state.status === 'error' && <p className="text-red-500">{state.message}</p>}
      {state.status === 'success' && <p className="text-green-500">{state.message}</p>}
    </form>
  );
}

Here's that same AddDeviceForm Component running live in this page. It's a real React component, so try submitting it with and without text in the input field. In both cases it's hitting our createDeviceAction function, which is just a simple function that returns a success or error message based on the input:

One nice thing about this is that the Enter key works on your keyboard without any extra code. This is because the form is a real form, and the submit button is a real submit button. The formAction function returned by useFormState does the work of intercepting the form submission and calling the server action instead of the default submission. It feels more like the old school web.

And here's the actual server action that is being called, in a file called app/actions/devices.ts:

app/actions/devices.ts
'use server';

export async function createDeviceAction(prevState: any, formData: FormData) {
  const name = formData.get('name');

  if (name) {
    const device = {
      name,
      id: Math.round(Math.random() * 10000),
    };

    return {
      status: 'success',
      message: `Device '${name}' created with ID: ${device.id}`,
      device,
    };
  } else {
    return {
      status: 'error',
      message: 'Name is required',
    };
  }
}

The code here is simulating a database mutation and doing some basic validation. This all ought to look pretty familiar. Again, this is the exact code running behind the scenes, copied and pasted here.

How does this work?

We didn't set up any API routes, we didn't write any network request code, and we didn't have to handle any loading or error states. There is no hidden code stitching things together. We just wrote a simple form, and React and Next.js took care of the rest. It's like magic!

But it's not magic. It's HTTP. If you open up your browser's developer tools and submit the form, you'll see a network request being made to the server, just like with a traditional form submission. The only difference is that the request is intercepted by Next.js and routed to the createDeviceAction function instead of a conventional form handler. This results in a POST request being sent to the current URL, with the form data and a bunch of other stuff being sent along with it.

Form submission network request
The network request that our form made. The actual data we sent is in the 1_name key

Here's what the response looked like:

Form submission network response
We got our data back, plus some other stuff Next.js sends

Next.js has basically created an API endpoint for us, and then provided its own wrapper calls and data structures on both the request and response cycles, leaving us to focus solely on our UI and business logic.

Visual feedback for slower requests

In many cases, the backend may take a few seconds to process the user's request. It's always a good idea to provide some visual feedback to the user while they are waiting. There's another lovely new React hook called useFormStatus that we can use to show a pending state while the request is in flight. Here's a slightly modified version of the form that gives the user some feedback while the request is being processed:

app/components/AddDeviceFormSlow.tsx
'use client';
import { useFormState, useFormStatus } from 'react-dom';
import { createDeviceActionSlow } from '@/app/actions/devices';

export function AddDeviceFormSlow() {
  const [state, formAction] = useFormState(createDeviceActionSlow, {});

  return (
    <form action={formAction} className="create-device">
      <fieldset>
        <label htmlFor="name">Name:</label>
        <input type="text" name="name" id="name" placeholder="type something" />
        <SubmitButton />
      </fieldset>
      {state.status === 'error' && <p className="text-red-500">{state.message}</p>}
      {state.status === 'success' && <p className="text-green-500">{state.message}</p>}
    </form>
  );
}

//this has to be a separate component because we can't use the useFormStatus hook in the
//same component that has the <form>. Sadface.
function SubmitButton() {
  const { pending } = useFormStatus();

  return (
    <button type="submit" disabled={pending}>
      {pending ? 'Submitting...' : 'Submit'}
    </button>
  );
}

This is almost identical to the first example, but I've split the submit button into a separate component and used the useFormStatus hook to disable the button and change its label while the request is pending. It's also now pointing at the createDeviceActionSlow function, which is identical to the createDeviceAction function except it has a 3 second delay before returning the response.

Here's the live component - give it a whirl:

That's pretty cool. The useFormStatus hook is doing all the work of tracking the request status and updating the UI accordingly. It's a small thing, but it makes both the user experience and the developer experience a lot better.

What about the API?

It has been the case for quite some time that the greatest value in a web application is often not found in its UI but in its API. The UI is just a way to interact with the API, and the API is where the real work gets done. If your application is genuinely useful to other people, there's a good chance they will want to integrate with it via an API.

There is a school of thought that says your UI should be treated just the same as any other API client for your system. This is a good school, and its teachers are worth listening to. UIs are for humans and APIs are for machines, but there's a lot of overlap in what they want in life:

  • A speedy response
  • To know if their action succeeded, or why it failed
  • To get the data they asked for, in a format they can easily consume

Can't we service them both with the same code? Yes, we can. But it's not always as simple as it seems.

The real world spoils the fun

Way up in that second example snippet, we were making a POST request to /api/devices; our UI code was talking to the exact same API endpoint that any other API user would be talking to. There are many obvious benefits to this, mostly centering around the fact that you don't need to maintain parallel code paths for UI and API users. I've worked on systems that did that, and it can end up doubling your codebase.

Server Actions are great, but they take us away from HTTP and REST, which are bedrock technologies for APIs. It's very easy to spam together a bunch of Server Actions for your UI, and then find yourself in a mess when you need to build an API for someone else to use.

The reality is that although API users and UI users do have a lot in common, they also have differences. In our Server Action examples above we were returning a simple object with a status and a message, but in a real API you would likely want to return a more structured response, with an HTTP status code, headers, and a body. We're also much more likely to need things like rate limiting for our API users, which we didn't have to think about for our UI users.

Consider a super simple POST endpoint in a real API. Assume you're using Prisma and Zod for validation - a fairly common pairing. Here's how you might write that API endpoint:

app/api/devices/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { Prisma } from '@prisma/client';
import { ZodError } from 'zod';
//assumes your app's prisma client and DeviceSchema (a Zod schema) are imported here

export async function POST(req: NextRequest) {
  try {
    const body = await req.json();

    const data = {
      type: body.type,
      hostname: body.hostname,
      credentials: body.credentials,
    } as Prisma.DeviceCreateInput;

    DeviceSchema.parse(data);
    const device = await prisma.device.create({ data });

    return NextResponse.json(device, { status: 201 });
  } catch (error) {
    if (error instanceof ZodError) {
      return NextResponse.json({ error: { issues: error.issues } }, { status: 400 });
    }
    return NextResponse.json({ error: "Failed to create device" }, { status: 500 });
  }
}

This API endpoint consumes JSON input (assume that auth is handled via middleware), validates it with Zod, and then creates a new device in the database. If the input is invalid, it returns a 400 status code with an error message. If the input looks good but there's an error creating the device, it returns a 500 status code with an error message. If everything goes well, it returns a 201 status code with the newly created device.
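To see that contract from the consumer's side, here's a hedged sketch of how a client might call the endpoint. Everything in it (the handleCreateResponse helper, the injected fetchFn parameter) is an illustrative assumption, not code from the app:

```typescript
// Hypothetical client-side view of the endpoint above. handleCreateResponse
// maps the route's status codes (201/400/500) to a typed outcome.
type CreateOutcome =
  | { kind: 'created'; device: unknown }
  | { kind: 'invalid'; issues: unknown }
  | { kind: 'failed' };

function handleCreateResponse(status: number, body: any): CreateOutcome {
  if (status === 201) return { kind: 'created', device: body };
  if (status === 400) return { kind: 'invalid', issues: body?.error?.issues };
  return { kind: 'failed' };
}

// The fetch function is passed in as a parameter so the sketch has no
// environment dependencies; in an app you would pass the global fetch.
async function createDeviceViaApi(
  input: { type: string; hostname: string; credentials: string },
  fetchFn: (url: string, init: object) => Promise<{ status: number; json(): Promise<any> }>
): Promise<CreateOutcome> {
  const res = await fetchFn('/api/devices', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(input),
  });
  return handleCreateResponse(res.status, await res.json());
}
```

Note that the client only needs the status code to know what happened; any human-readable messaging is its own concern.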

Now let's see how we might write a Server Action for the same functionality:

app/actions/devices.ts
'use server';

import { revalidatePath } from 'next/cache';
import { Prisma } from '@prisma/client';
import { ZodError } from 'zod';
//assumes your app's prisma client and DeviceSchema (a Zod schema) are imported here

export async function createDeviceAction(prevState: any, formData: FormData) {
  try {
    const data = {
      type: formData.get("type"),
      hostname: formData.get("hostname"),
      credentials: formData.get("credentials"),
    } as Prisma.DeviceCreateInput;

    DeviceSchema.parse(data);
    const device = await prisma.device.create({ data });

    revalidatePath("/devices");

    return {
      success: true,
      message: "Device Created Successfully",
      device,
    };
  } catch (error) {
    if (error instanceof ZodError) {
      return {
        success: false,
        message: "Validation Error",
        error: {
          issues: error.issues,
        },
      };
    }

    return {
      success: false,
      message: "Failed to create device",
      error: JSON.stringify(error),
    };
  }
}

The core of these two functions is exactly the same two lines: one to validate using Zod, the other to persist using Prisma. The flow is identical, but in one case we're parsing JSON and in the other reading form data. In one case we're returning NextResponse objects with HTTP status codes; in the other, plain objects with success and message keys. The Server Action can also take advantage of nice things like revalidatePath to trigger a revalidation of the page that called it, but we don't want that line in our API endpoint.

Somewhere along the line we will want to show a message to the UI user telling them what happened - hence the message key in the Server Action (the API user can just read the HTTP status code). We could have moved that logic to the UI instead, perhaps returning a statusCode key in the JSON response to emulate an HTTP status code. But that's just reimplementing part of HTTP, and moving the problem to the client, which now has to provide the mapping from a status code to a message. It also means a bigger bundle if we want to support internationalization for those messages.

What this all means is that if you want to take advantage of the UI code cleanliness benefits that come from using Server Actions, and your application conceivably might need an API now or in the future, you need to think about how you are going to avoid duplicating logic between your Server Actions and your API endpoints. This may be a hard problem, and there's no one-size-fits-all solution. Yes you can pull those 2 lines of core logic out into a shared function, but you're still left with a lot of other almost-the-same-but-not-quite code.

Ultimately, it probably just requires another layer of indirection. What that layer looks like will depend on your application, but it's something to think about before you go all-in on Server Actions.


Using ChatGPT to generate ChatGPT Assistants

OpenAI dropped a ton of cool stuff in their Dev Day presentations, including some updates to function calling. There are a few function-call-like things that currently exist within the OpenAI ecosystem, so let's take a moment to disambiguate:

  • Plugins: introduced in March 2023, allowed GPT to understand and call your HTTP APIs
  • Actions: an evolution of Plugins, makes it easier but still calls your HTTP APIs
  • Function Calling: ChatGPT understands your functions, tells you how to call them, but does not actually call them

It seems like Plugins are likely to be superseded by Actions, so we end up with two ways to have GPT call your functions: Actions for automatically calling HTTP APIs, and Function Calling for indirectly calling anything else. We could call the latter Guided Invocation - despite its name, Function Calling doesn't actually call the function, it just tells you how to.

That second category of calls is going to include anything that isn't an HTTP endpoint, so it gives you a lot of flexibility to call internal APIs that never learned how to speak HTTP. Think legacy systems, private APIs that you don't want to expose to the internet, and other places where this can act as a highly adaptable glue.
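As a sketch of what that glue can look like, here's a hypothetical dispatcher that takes a tool call (shaped like the ones the Assistants API returns) and routes it to a local function. The addTask handler here is a stand-in for the real API.ts implementation:

```typescript
// Shape of one tool call from a run's requires_action step
type ToolCall = { id: string; function: { name: string; arguments: string } };

// Map function names to local implementations; addTask is a stand-in
// for the real API.ts function.
const handlers: Record<string, (args: any) => Promise<unknown>> = {
  addTask: async (args) => ({ id: 1, ...args }),
};

// Runs one tool call and returns the { tool_call_id, output } pair that
// the Assistants API expects when you submit tool outputs.
async function dispatch(call: ToolCall) {
  const handler = handlers[call.function.name];
  if (!handler) throw new Error(`No handler for ${call.function.name}`);
  const result = await handler(JSON.parse(call.function.arguments));
  return { tool_call_id: call.id, output: JSON.stringify(result) };
}
```

Because the dispatcher only needs a name-to-function map, the target can be a legacy system, a private API, or anything else you can reach from your own code.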

I've put all the source code for this article up at https://github.com/edspencer/gpt-functions-example, so check that out if you want to follow along. It should just be a matter of following the steps in the README, but YMMV. We are, of course, going to use a task management app as a playground.

Creating Function definitions

In order for OpenAI Assistants to be able to call your code, you need to provide them with signatures for all of your functions, in the format it expects, which looks like this:

{
  "type": "function",
  "function": {
    "name": "addTask",
    "description": "Adds a new task to the database.",
    "parameters": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the task."
        },
        "priority": {
          "type": "number",
          "description": "The priority of the task, lower numbers indicating higher priority."
        },
        "completed": {
          "type": "boolean",
          "description": "Whether the task is marked as completed."
        }
      },
      "required": ["name"]
    }
  }
}

That's pretty self-explanatory. It's also a pain in the ass to keep tweaking and updating as you evolve your app, so let's use the OpenAI Chat Completions API with the json_object setting enabled and see if we can have this done for us.

Our Internal API

Let's build a basic Task management app. We'll just use a super-naive implementation of Todos written in TypeScript. My little API.ts has functions like addTask, updateTask, removeTask, getTasks, etc. All the stuff you'd expect. Some of them take a bunch of different inputs.

Here's a snippet of our API.ts file. It's very basic but functional, using a sqlite database driven by Prisma:

interface TaskInput {
  name: string;
  priority?: number;
  completed?: boolean;
  deleted?: boolean;
}

/**
 * Adds a new task to the database.
 * @param taskInput - An object containing the details of the task to be added.
 * @param taskInput.name - The name of the task.
 * @param taskInput.priority - The priority of the task.
 * @returns A Promise that resolves when the task has been added to the database.
 */
async function addTask(taskInput: TaskInput): Promise<Task | void> {
  try {
    const task = await prisma.task.create({
      data: taskInput,
    });
    console.log(`Task ${task.id} created with name ${task.name} and priority ${task.priority}.`);

    return task;
  } catch (e) {
    console.error(e);
  }
}

/**
 * Updates a task in the database.
 * @param id - The ID of the task to update.
 * @param updates - An object containing the updates to apply to the task.
 * @param updates.name - The updated name of the task.
 * @param updates.priority - The updated priority of the task.
 * @param updates.completed - The updated completed status of the task.
 * @returns A Promise that resolves when the task has been updated in the database.
 */
async function updateTask(id: string, updates: Partial<TaskInput>): Promise<void> {
  try {
    const task = await prisma.task.update({
      where: { id },
      data: updates,
    });
    console.log(`Task ${task.id} updated with name ${task.name} and priority ${task.priority}.`);
  } catch (e) {
    console.error(e);
  }
}

It goes on from there. You get the picture. No, it's not production-grade code - don't use this as a launchpad for your Todo list manager app. GitHub Copilot actually wrote most of that code (and most of the documentation) for me.

Side note on documentation: it took me more years than I care to admit to figure out that the primary consumer of source code is humans, not machines. The machine doesn't care about your language, formatting, awfulness of your algorithms, weird variable names, etc; algorithmic complexity aside it'll do exactly the same thing regardless of how you craft your code. Humans are a different matter though, and benefit enormously from a little context written in a human language.

Ironically, that same documentation that benefitted human code consumers all this time is now what enables these new machine consumers to grok and invoke your code, saving you the work of coming up with a translation layer to integrate with AI agents. So writing documentation really does help you after all. Also, write tests and eat your vegetables.

Generating the OpenAI translation layer

The code to translate our internal API into something OpenAI can use is fairly simple and reusable. All we do is read in a file as text, stuff the contents of that file into a GPT prompt, send that off to OpenAI, stream the results back to the terminal and save it to a file when done:

/**
 * This file uses the OpenAI Chat Completions API to automatically generate OpenAI Function Call
 * JSON objects for an arbitrary code file. It takes a source file, reads it and passes it into
 * OpenAI with a simple prompt, then writes the output to another file. Extend as needed.
 */

import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';

import { OptionValues, program } from 'commander';

//takes an input file, and generates a new tools.json file based on the input file
program.option('-s, --sourceFile <sourceFile>', 'The source file to use for the prompt', './API.ts');
program.option('-o, --outputFile <outputFile>', 'The output file to write the tools.json to (defaults to your input + .tools.json)');

const openai = new OpenAI();

/**
 * Takes an input file, and generates a new tools.json file based on the input file.
 * @param sourceFile - The source file to use for the prompt.
 * @param outputFile - The output file to write the tools.json to. Defaults to sourceFile + '.tools.json'.
 * @returns Promise<void>
 */
async function build({ sourceFile, outputFile = `${sourceFile}.tools.json` }: OptionValues) {
  console.log(`Reading ${sourceFile}...`);
  const sourceFileText = fs.readFileSync(path.join(__dirname, sourceFile), 'utf-8');

  const prompt = `
This is the implementation of my ${sourceFile} file:

${sourceFileText}

Please give me a JSON object that contains a single key called "tools", which is an array of the functions in this file.
This is an example of what I expect (one element of the array):

{
  "type": "function",
  "function": {
    "name": "addTask",
    "description": "Adds a new task to the database.",
    "parameters": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the task."
        },
        "priority": {
          "type": "number",
          "description": "The priority of the task, with lower numbers indicating higher priority."
        },
        "completed": {
          "type": "boolean",
          "description": "Whether the task is marked as completed."
        }
      },
      "required": ["name"]
    }
  }
},
`;

  //Call the OpenAI API to generate the function definition, and stream the results back
  const stream = await openai.chat.completions.create({
    model: 'gpt-4-1106-preview',
    response_format: { type: 'json_object' },
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  //Keep the new tools.json in memory until we have it all
  let newToolsJson = '';

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
    newToolsJson += content;
  }

  console.log(`Updating ${outputFile}...`);

  //Write the tools JSON to the output file
  fs.writeFileSync(path.join(__dirname, outputFile), newToolsJson);
}

build(program.parse(process.argv).opts());

I've made a simple little repo with this file, the API.ts file, and a little demo that shows it all integrated. Run it like this:

ts-node rebuildTools.ts -s API.ts

Which will give you some output like this, and then update your API.ts.tools.json file:

ts-node rebuildTools.ts -s API.ts
Reading API.ts...
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "addTask",
        "description": "Adds a new task to the database.",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {

..........truncated...
full output at https://github.com/edspencer/gpt-functions-example/blob/main/API.ts.tools.json
.............................

        "returns": {
          "type": "Promise<void>",
          "description": "A Promise that resolves when all tasks have been deleted from the database."
        }
      }
    }
  ]
}
Updating ./API.ts.tools.json...
Done

Creating an OpenAI Assistant and talking to it

We've had OpenAI generate our Tools JSON file; now let's see if it can use it. A simple demo.ts creates an Assistant and a Thread, sends along the user's message plus any existing Tasks from the database, and executes whatever functions the Assistant asks for.

The code is all up on GitHub, and I won't do a blow-by-blow here, but let's have a look at the output when we run it:

ts-node ./demo.ts -m "I need to go buy bread from the store, then go to \
the gym. I also need to do my taxes, which is a P1."

And the output:

Creating assistant...
Created assistant asst_hkT3BFQsNf3HSmJpE8KytiX9 with name Task Planner.
Created thread thread_AigYi0oFrytu3aO5k0mRacIV
Retrieved 0 tasks from the database.
Created message
msg_uLpR3UpQB3pX62wVIA7TcqIl
Polling thread
Current status: queued
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: requires_action
Actions:
[
{
id: 'call_8JX5ffKFpxIhYmJeZYYilpv3',
type: 'function',
function: {
name: 'addTask',
arguments: '{"name": "Buy bread from the store", "priority": 2}'
}
},
{
id: 'call_GC4axxSB6Oso0tiolDLr900X',
type: 'function',
function: {
name: 'addTask',
arguments: '{"name": "Go to the gym", "priority": 2}'
}
},
{
id: 'call_7c5mWt1I5Ff3h5Lvb0Hfw2L7',
type: 'function',
function: {
name: 'addTask',
arguments: '{"name": "Do taxes", "priority": 1}'
}
}
]
Adding task
Task cloyl2gxs0000c3a7hxe6hupc created with name Buy bread from the store and priority 2.
Adding task
Task cloyl2gxv0001c3a7zi4hqt8z created with name Go to the gym and priority 2.
Adding task
Task cloyl2gxx0002c3a7l0gv7f07 created with name Do taxes and priority 1.

You can see all of the steps in the console output: we create the Assistant and the Thread, check whether our sqlite database has any existing Tasks (if so, we send those along as input too), pass them with the user's message, and get back OpenAI's function invocations (three in this case). Finally, we iterate over them, call our internal addTask function for each, and at the bottom of the output we see that our tasks were created successfully.
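The "Polling thread ... Trying again in 2 seconds" lines come from a simple poll-and-wait loop. Here's a minimal sketch of that pattern - the `getStatus` callback and `pollUntilSettled` name are my own illustration, not the repo's exact code; in the real demo, `getStatus` would wrap a call that retrieves the run from the OpenAI API:

```typescript
// Poll a thread run until it leaves the queued/in_progress states.
// getStatus is any async function that returns the current run status.
async function pollUntilSettled(
  getStatus: () => Promise<string>,
  intervalMs = 2000
): Promise<string> {
  while (true) {
    const status = await getStatus();
    console.log(`Current status: ${status}`);
    if (status !== 'queued' && status !== 'in_progress') {
      return status; // e.g. 'requires_action' or 'completed'
    }
    // Wait before asking again, as in the console output above
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The key detail is that `requires_action` is a terminal state for the loop: it means the Assistant is waiting for us to run functions before it will continue.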

Let's go call it again, updating the tasks that we just made:

ts-node demo.ts -m "I finished the laundry, please mark it complete. Also the gym is a P1"

Output:

Creating assistant...
Created assistant asst_WbTXKoXWL1yTWs4zvcVkDIDT with name Task Planner.
Created thread thread_mLvr7acahXbnmoe217f0gMRF
Retrieved 3 tasks from the database.
Created message
msg_iYYkAeuxRPNmJZ5vAKwiI8S7
Polling thread
Current status: queued
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: requires_action
Actions:
[
{
id: 'call_W4UKGadROhaJJFZym7vQocP7',
type: 'function',
function: {
name: 'completeTask',
arguments: '{"id": "cloyl2gxs0000c3a7hxe6hupc"}'
}
},
{
id: 'call_KzaYk1x4sIRFWeKlvgOk37qf',
type: 'function',
function: {
name: 'updateTask',
arguments: '{"id": "cloyl2gxv0001c3a7zi4hqt8z", "updates": {"priority": 1}}'
}
}
]
Completing task
Task cloyl2gxs0000c3a7hxe6hupc marked as completed.
Updating task
Task cloyl2gxv0001c3a7zi4hqt8z updated with name Go to the gym and priority 1.

That's kinda amazing. All any of this really does is assemble blobs of text and send them to the OpenAI API, which is able to figure it all out, even with the context of our data, and correctly call both create and update APIs that exist only internally within your system, without exposing anything to the internet at large.

Here it correctly figured out the IDs of the Tasks to update (because I passed that data in with the prompt - it's tiny), which functions to call, and that they should be run in parallel. That means your user can speak/type as much as they like, making a lot of demands in a single submission, and the Assistant will batch it all up into a set of functions that, from its perspective at least, it wants you to run in parallel.
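Executing that batch on our side boils down to looking each requested function name up in our internal API and invoking it with the parsed arguments. A hedged sketch of that dispatch step - the `handlers` map stands in for the real API.ts exports, and the names here are illustrative:

```typescript
// The shape of one entry in the Actions array shown above
type ToolCall = {
  id: string;
  function: { name: string; arguments: string };
};

// Look up each requested function in our internal API and run them
// concurrently, as the Assistant asked. Unknown names are rejected
// rather than silently ignored.
async function runToolCalls(
  calls: ToolCall[],
  handlers: Record<string, (args: any) => Promise<unknown>>
): Promise<unknown[]> {
  return Promise.all(
    calls.map((call) => {
      const handler = handlers[call.function.name];
      if (!handler) {
        return Promise.reject(
          new Error(`Unknown function: ${call.function.name}`)
        );
      }
      // arguments arrives as a JSON string, so parse before invoking
      return handler(JSON.parse(call.function.arguments));
    })
  );
}
```

Rejecting unknown names is a deliberate choice: it surfaces the cases where GPT invents a function that doesn't exist, instead of dropping them on the floor.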

After executing the functions, you can send another request to tell the Assistant the outcome - this article is long enough already, but you can see how to close that loop in the OpenAI Function Calling docs.
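Closing that loop means pairing each tool call id with a stringified result and submitting the batch back to the run (in the Node SDK this is a submit-tool-outputs call on the run; the exact method name and namespace may differ by SDK version). Building the payload itself is just a zip, sketched here:

```typescript
// Pair each tool call id with the (stringified) result of running it,
// producing the tool_outputs payload the Assistants API expects.
function buildToolOutputs(
  callIds: string[],
  results: unknown[]
): { tool_call_id: string; output: string }[] {
  return callIds.map((id, i) => ({
    tool_call_id: id,
    output: JSON.stringify(results[i] ?? null),
  }));
}
```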

Closing Thoughts

This stuff is all very new, and there are some pros and cons here. While all looks rosy in the end, it did take a few iterations to get GPT to reliably and consistently output the JSON format expected in the translation stage - occasionally it would innovate and restructure things a little, which broke things. That's probably just something that time will take care of as this stuff gets polished up, both on OpenAI's end and on everyone else's, but it's something to be aware of.

This technology requires a considered approach to testing too: GPT is a big old black box floating off in the internet somewhere, it's semi-magical, and it doesn't always give the right answer. Bit rot seems a serious risk here - both due to the newness of the tech and the fact that most of us don't really understand it very well. It seems sensible to mock/stub out expected responses from OpenAI's APIs to do unit testing, but when it comes to integration testing, you probably need your tests to do something like what our demo.ts does, and then verify the database was updated correctly at the end.
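For the unit-testing side, a stubbed response goes a long way: record a requires_action payload from a real run once, then assert that your code translates it into the right local calls without ever touching the network. A minimal sketch - the shapes mirror the Actions array in the output above, and the names are illustrative:

```typescript
// A canned 'requires_action' payload, as might be recorded from a real run
const stubbedActions = [
  {
    id: 'call_1',
    type: 'function',
    function: { name: 'addTask', arguments: '{"name":"Do taxes","priority":1}' },
  },
];

// Translate stubbed actions into (name, args) pairs, the way a real
// dispatcher would before invoking API.ts functions. Pure and testable.
function planCalls(actions: typeof stubbedActions) {
  return actions.map((a) => ({
    name: a.function.name,
    args: JSON.parse(a.function.arguments),
  }));
}
```

Integration tests still need the real thing, as described above, but this keeps the deterministic translation logic covered cheaply.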

It can be the case that you make no changes to your code or environment but still get different outcomes, due to the non-determinism of GPT. Mitigations could come in the form of temperature control and fine-tuning, but you're probably going to need to be less than 100% trustful that your Assistant is doing what you think it is.

Finally, there's obviously a huge security consideration here. Fundamentally, we're taking user input (text, speech, images, whatever) and calling code on our own systems as a result. This always involves peril, and one can imagine all kinds of SQL injection-style attacks against Agent systems that inadvertently run malicious actions the developer didn't intend. For example, my API.ts contains a deleteAllTasks function that does what you think it does. Because it's part of API.ts, the Assistant knows about it, and could call it whether the user was trying to do that or not.

It would be extremely easy to mix up public and private code in this way and accidentally expose it to the Assistant, so in reality you probably want a sanity check to run each time the tools JSON is rebuilt, telling you what changed. That seems like a good thing to have in your CI/CD pipeline.
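That sanity check can be as simple as diffing the function names between the old and new tools JSON and flagging anything that appeared or disappeared. A minimal sketch, assuming the file shape shown earlier in this post (the function name `diffToolNames` is my own):

```typescript
// Matches the shape of the generated tools.json
type ToolsFile = { tools: { function: { name: string } }[] };

// Report which function names were added or removed between two
// generated tools.json files, so CI can fail on unexpected exposure.
function diffToolNames(oldFile: ToolsFile, newFile: ToolsFile) {
  const oldNames = new Set(oldFile.tools.map((t) => t.function.name));
  const newNames = new Set(newFile.tools.map((t) => t.function.name));
  return {
    added: Array.from(newNames).filter((n) => !oldNames.has(n)),
    removed: Array.from(oldNames).filter((n) => !newNames.has(n)),
  };
}
```

A CI step could run this after rebuildTools.ts and fail the build whenever `added` is non-empty, forcing a human to confirm that each newly exposed function really belongs in the Assistant's toolset.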
