Building an AI camera, with an agent showing the way

In Brief

ImageGenCam is a kit from OpenAI: a digital camera you build yourself that sends your photos through AI and transforms them as you shoot.
You don't need to code. An AI agent named Codex builds and sets up everything, and you steer it with ordinary words in your own language.
Three layers use the same trick: the image recipes, the Magic button, and the agent's own house rules are all written in plain text, not in code.
The real point isn't the camera, but that language has become the interface: you can build something real by describing it in words.

First: bringing the project home

It all starts with a link to a repo. Not technical? Think of a repo (short for "repository") as a folder of files that lives online, on a service called GitHub.

You don't need to download anything manually. You simply ask Codex, OpenAI's AI assistant, to fetch the project, and it handles the rest. What it does under the hood is called cloning: it pulls down a copy that stays connected to the original. That connection is what lets you get updates later if OpenAI makes changes.

What is ImageGenCam?

ImageGenCam isn't an app you just install. It's a kit for a physical camera.

All the parts laid out: 3D-printed camera body, Raspberry Pi Zero, PiSugar battery, screen with four buttons, camera and micro-SD card

Here's each part explained:

PiSugar battery: a rechargeable battery, so the camera can be used without being plugged in.
Raspberry Pi Zero: a tiny computer, about the size of a matchbox. This is the brain that runs everything.
LCD screen with four buttons: the camera's screen, the same flat-screen tech as in phones and TVs. It's both the viewfinder (what you see before you shoot) and the menu you control with the buttons.
Micro-SD card: the storage. This holds both the program that makes the camera work and the photos you take.
Raspberry Pi camera: the "eye" itself, the little lens that captures the image.
3D-printed camera body: the shell that holds all the parts together and protects them, the one you print yourself.

You assemble the parts shown in the image above and dress them in a body you print yourself. When you take a photo, it's sent to OpenAI's image model, which transforms it according to a recipe you've chosen. The built-in recipes are wildly different: one turns everyone in the photo into cheese, one redraws it as a clumsy doodle made with the mouse, one turns the person into a colorful anime figure, and one turns you into a little goblin. What your camera does is entirely up to you.

Before and after: an ordinary car on the left, the same car transformed into cheese on the right

The recipes are only a starting point. A recipe isn't code, just a short piece of text that tells the AI what the image should become. You can change them, write your own, and fine-tune until the camera feels like yours. A little app on your phone connects to the camera, so you can download the photos and switch recipes right from your phone.

What do you need to get started? The parts in the image above, a Mac with Codex Desktop installed, stable Wi-Fi and an OpenAI account. If you already use ChatGPT, you already have an account.

The most exciting part is the way the project is meant to be built. The entire build and setup process is designed to be driven by an AI agent. You just write "help me make ImageGenCam" to the agent, and it reads the project and guides you step by step through the whole assembly and setup. That's why the project has two different "read me" files: one written for humans, and one written for the AI agent that does the work. It's a brand-new way to hand a project to someone else, and you'll meet it again further down.

Codex Desktop on a Mac, with the prompt "Help me make ImageGenCam" ready to send

Help me make ImageGenCam https://github.com/openai/imagegencam

What is Codex? It's an AI assistant from OpenAI, the same company that makes ChatGPT. You use it in almost the same way: open a chat window and write what you want done, in your own language. The difference is that while ChatGPT replies with text, Codex can actually carry out tasks on your machine. It reads the project, sets things up and fixes errors along the way. That's why it can build this camera with you, step by step, instead of just explaining how.

The map of the project

When you open the folder for the first time, the list of files can look overwhelming. Strictly speaking, you don't need to worry about any of it: Codex handles all the technical parts. If you'd still like to know how things fit together, here's the whole list explained:

imagegencam/
├── README.md          "Read me" for humans: parts list, buttons, recipes
├── AGENTS.md          "Read me" for the AI agent that builds the project
├── 3d model/          The camera body itself, ready for a 3D printer
├── software/          The code that runs inside the camera
│   ├── src/           The actual brains of the code ("src" = source code)
│   ├── scripts/       Small helper programs that run on the camera
│   ├── deploy/        Tools for getting the code onto the camera
│   ├── data/          Ready-made image recipes and more
│   ├── tests/         Automated checks that verify the code works
│   └── ARCHITECTURE.md  An explanation of how the code fits together
├── docs/              Guides, including the ones the agent follows step by step
├── scripts/           Helper programs that run on a Mac, not on the camera
├── assets/            Images used in the documentation
├── LICENSE / NOTICE   The legal part: permission to use and change this freely
└── SECURITY.md        Procedures if you find a security problem

That's the whole overview. The project is home on your computer, you know what it is, and the map is in place. Next you go inside the brain itself: how the code inside the camera is built.

Inside the brain: the code that makes the camera work

On the map, the folder software/src/ got the name "the actual brains of the code". Inside it are seven small files, all written in the programming language Python, which is why the names end in .py. The code follows two simple rules. The first: let the camera do as little as possible at a time, because the Raspberry Pi is a weak little computer and is easily overloaded. The second: a photo you've taken should never be lost, even if the network drops or the battery dies.

Three of the files do most of the work.

The first is the starter (app.py). It does its job the moment you turn the camera on: it wakes up all the parts and connects them, the camera, the screen, the AI and the phone app. The file itself is tiny, only around 70 lines, because all it should do is connect things together, not do all the work itself.

The second is the heart (controller.py). This is the largest file, and it works without pause for as long as the camera is on. Over and over it does the same thing: it pulls in what the lens sees many times a second and shows it on the screen as a live viewfinder, watches for you pressing a button, takes the photo when you click, and keeps track of the album and how much battery you have left. Everything you experience as the camera itself happens in here.
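To make that loop a little more concrete, here is a minimal sketch in Python. It is not the project's actual controller.py: the names `run_loop`, `FakeCamera` and `FakeScreen` are made up for illustration, and the fake parts stand in for the real lens and screen.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCamera:
    """Hypothetical stand-in for the lens: hands out numbered frames."""
    frame_no: int = 0

    def read_frame(self) -> str:
        self.frame_no += 1
        return f"frame-{self.frame_no}"


@dataclass
class FakeScreen:
    """Hypothetical stand-in for the LCD: remembers what it has shown."""
    shown: list = field(default_factory=list)

    def show(self, frame: str) -> None:
        self.shown.append(frame)


def run_loop(camera, screen, button_events, album):
    """One pass per viewfinder frame: show it, then react to a button.

    button_events holds one entry per pass: None when nothing is
    pressed, or "shutter" for a short press on the shutter.
    """
    for event in button_events:
        frame = camera.read_frame()   # pull in what the lens sees
        screen.show(frame)            # live viewfinder
        if event == "shutter":
            album.append(frame)       # take the photo
    return album


camera, screen = FakeCamera(), FakeScreen()
album = run_loop(camera, screen, [None, None, "shutter", None], [])
print(album)
```

The screen receives all four frames, but only the frame from the pass where the shutter was pressed ends up in the album.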

The third is the phone app (web.py). The camera builds its own little website all by itself, a kind of remote control you open on your phone. If your phone is on the same Wi-Fi, you can open that page and download photos, switch recipes and see what the camera sees right now. The page exists only on your own network, not out on the open internet. That's deliberate, for security reasons.

But what if you want to go for a walk with the camera, far from your home Wi-Fi? Then you turn on tethering (a hotspot) on your phone, and it creates its own little Wi-Fi network the camera connects to. The phone covers both needs at once: mobile data gives the camera internet, so the AI works, and the hotspot is the shared network, so the phone app can still find the camera.

The last four are helpers that each do one thing:

The AI chat (openai_client.py): all contact with OpenAI gathered in one place. It sends your photo out to OpenAI, where it's transformed, and fetches the finished one back.
The memory (config.py): remembers your recipes and settings, so the camera is the same the next time you turn it on.
The queue (job_store.py): a little "to-do list" on the memory card of photos waiting to be transformed.
The Wi-Fi manager (wifi_manager.py): lets you try a new Wi-Fi network safely from the camera: if it doesn't work, the camera connects itself back to the old one.
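The Wi-Fi manager's safety net can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in wifi_manager.py: `try_new_network` and `fake_connect` are hypothetical names, and the real camera talks to actual Wi-Fi hardware rather than a set of names.

```python
def try_new_network(connect, current, candidate):
    """Switch to candidate; fall back to current if that fails.

    connect is a hypothetical callable that takes a network name and
    returns True when the connection works. The return value is the
    name of the network in use afterwards.
    """
    if connect(candidate):
        return candidate
    # The new network did not work: reconnect to the one we had.
    connect(current)
    return current


# Simulated surroundings: only "HomeWiFi" is actually reachable.
reachable = {"HomeWiFi"}
attempts = []


def fake_connect(name):
    attempts.append(name)
    return name in reachable


active = try_new_network(fake_connect, "HomeWiFi", "CafeGuest")
print(active)
print(attempts)
```

The camera tries the new network first and, when that fails, makes a second attempt on the old one, so it never ends up without a connection just because you mistyped a password.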

Worth knowing: What happens if the network drops or the battery dies right as the AI is working? The camera always saves your photo to the memory card first, before it's sent to OpenAI. The transformation job is also queued on the memory card, not just in the camera's temporary memory. The difference matters: the memory card remembers even if the power goes, while the temporary memory is wiped the same instant. If the power cuts out, the job is picked up again and retried when the camera starts. That's why you can keep shooting while the previous photo is still being transformed in the background, and nothing is lost along the way.
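For the curious, the "save first, queue on the card" idea might look roughly like this in Python. It is a sketch under assumptions, not the project's job_store.py: the class `JobStore`, the function `take_photo` and the file names are invented for the example, and a temporary folder plays the role of the memory card.

```python
import json
import tempfile
from pathlib import Path


class JobStore:
    """A to-do list kept as one small JSON file per job on disk."""

    def __init__(self, folder: Path):
        self.folder = folder
        self.folder.mkdir(parents=True, exist_ok=True)

    def add(self, photo_name: str, recipe: str) -> None:
        job_file = self.folder / f"{photo_name}.job.json"
        job_file.write_text(json.dumps({"photo": photo_name, "recipe": recipe}))

    def pending(self) -> list:
        files = sorted(self.folder.glob("*.job.json"))
        return [json.loads(f.read_text()) for f in files]

    def done(self, photo_name: str) -> None:
        (self.folder / f"{photo_name}.job.json").unlink()


def take_photo(card: Path, store: JobStore, name: str, pixels: bytes, recipe: str):
    (card / name).write_bytes(pixels)   # 1. save the original first
    store.add(name, recipe)             # 2. then queue the transformation


card = Path(tempfile.mkdtemp())         # stands in for the memory card
store = JobStore(card / "jobs")
take_photo(card, store, "img_001.jpg", b"raw-bytes", "cheese")

# Simulate a power cut: drop what was in memory and reopen from disk.
store_after_reboot = JobStore(card / "jobs")
waiting = store_after_reboot.pending()
print(waiting)
```

Because both the photo and the job live in files, the freshly created `store_after_reboot` still finds the waiting job, which is the property the paragraph above describes.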

That's the whole brain: one file that starts everything, one that is the camera itself, one for the phone, and a few small helpers around them. Next you'll see how you actually use the camera in your hand: the buttons, the viewfinder and the album.

The camera in your hand: buttons, viewfinder and album

Once the camera is fully assembled, it might look a little odd, but in use it feels like an ordinary point-and-shoot: you aim, press the shutter, and the photo is taken. What's special is what happens afterward, when the AI transforms the image. First, the buttons.

ImageGenCam seen from the front with all the buttons labeled: Magic button and charging port at top left, shutter/on-off at top right, two function buttons on the left side, and up/down on the right side

The most important button is at the top right: the shutter, which is also the on/off button. A short press takes the photo. A long press turns the camera off. To turn it back on there's a little rhythm of its own: short press, release, long press, release.

Worth knowing: Why such an awkward combination just to turn it on? It's deliberate. A single on button would easily get pushed in inside a pocket or bag, and then the camera would wake up on its own and drain the battery. The little rhythm acts like a lock, much like the pattern you tap to wake a phone. The camera only starts when you actually mean it.
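The "rhythm as a lock" idea is simple enough to express as a tiny check in Python. This is purely an illustration: on the real camera the power button is handled by the battery board, not by Python code, and both the function name `is_wake_pattern` and the two-second threshold are assumptions made for the example.

```python
LONG_PRESS_SECONDS = 2.0  # assumed threshold, not taken from the project


def is_wake_pattern(press_durations):
    """True only for exactly one short press followed by one long press.

    press_durations is a list of how long each press was held, in
    seconds, in the order the presses happened.
    """
    if len(press_durations) != 2:
        return False
    first, second = press_durations
    return first < LONG_PRESS_SECONDS <= second


print(is_wake_pattern([0.2, 8.0]))   # short, then long
print(is_wake_pattern([8.0]))        # one long squeeze in a bag
print(is_wake_pattern([0.2, 0.3]))   # two accidental taps
```

Only the first call matches the pattern. A single long squeeze or a couple of stray taps, the kind of thing that happens in a pocket, does not.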

On the left side are two buttons, marked "Function 1" and "Function 2" on the diagram. The neutral names are a point in themselves: the buttons aren't locked to one task. By default the top one opens the recipe menu: here you choose which transformation the photo should get, cheese, goblin, anime or whatever you've added. The bottom one opens the album, where all your photos live. But you can ask Codex to give them a completely different function later.

You use the buttons on the right side, up and down, to scroll. When you're in the recipe menu or the album, they move the cursor up and down the list.

All on its own is the Magic button at the top left. Think of it as the camera's wildcard. It has no fixed job: you decide yourself what it should do, and ask Codex to set it up the way you want. More on it further down, when you write your own recipes.

Now for the moment itself. You aim, and press the shutter. The viewfinder freezes for a brief moment, like a still-image receipt that the photo was captured, before it shows the normal camera view again. The transformation itself happens in the background. You don't have to wait: you can keep shooting right away, while the previous photo is still with the AI. That's the little queue on the memory card at work. When a transformed photo is ready, the album icon sparkles, a little signal that something new is waiting for you in there.
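Why the shutter never makes you wait can be shown with a small Python sketch. It is a simplified illustration, not the camera's own code: `slow_transform` is a made-up stand-in for the trip to the image model, and the queue here lives in memory, whereas the camera keeps its queue on the memory card.

```python
import queue
import threading


def slow_transform(photo: str) -> str:
    """Hypothetical stand-in for the round trip to the image model."""
    return f"{photo} (transformed)"


jobs = queue.Queue()
album = []


def worker():
    """Runs in the background and handles one photo at a time."""
    while True:
        photo = jobs.get()
        if photo is None:            # sentinel value: stop the worker
            jobs.task_done()
            break
        album.append(slow_transform(photo))
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

# Pressing the shutter only adds a job, so it returns immediately.
for shot in ["img_001", "img_002", "img_003"]:
    jobs.put(shot)

jobs.put(None)
jobs.join()                          # only for the demo: wait until done
print(album)
```

The loop that "presses the shutter" three times finishes without waiting for any transformation. The background worker catches up in its own time and fills the album in the order the photos were taken.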

If you want the photos on your phone, you connect to the phone app, the remote control. The camera shows a QR code you scan to open it, as long as the phone is on the same Wi-Fi.

The recipe menu gives you the built-in transformations, but they're only a starting point. Next you'll see how to write your own.

Write your own recipes

A recipe sounds advanced, but it's the opposite. It's just an ordinary message to the AI, written in plain text. No code, no settings to fiddle with. You tell it in words what the image should become, and the AI does its best. Maybe you've heard the word "prompt". It means exactly the same thing: an instruction written in words.

The built-in recipes are written in English, but they're nothing more mysterious than ordinary sentences. You can just as easily write your own in the language you speak. The camera comes with four of them. The two extremes show the full range, from a single line to a whole paragraph:

The "cheese" recipe is just a single sentence: keep everything in the photo, except everyone is turned into cheese. Because it says so little about how, the AI fills in the rest itself: sometimes it becomes melted cheese, other times a firm yellow cheese with holes. You've decided the direction, but left the details to the camera. If you always want a firm yellow cheese with holes, it has to say so in the recipe.

"Goblin" at the bottom is the opposite extreme. It fills a whole paragraph and pokes at every detail: big pointed ears, oversized yellow eyes, tiny fangs, colors, linework and background. Where "cheese" left most of it open, "goblin" nails down almost everything.

That's the whole point. The more precisely you describe what you want, the more you decide how the result turns out. Say little, and the AI improvises. Say a lot, and you keep it on a tight rein.

But where do you write them? You don't need to open the code or connect anything extra. It all happens in the phone app. You open it with the QR code from the camera, as before. Inside the app you find the recipes.

Each recipe sits there as a little card with two fields: a title and the text itself. To change one, you type right in the text field. To make a new one from scratch, you press "Add" and fill in a title and an instruction. And to get rid of one you never use, you remove it. The only requirement is that at least one recipe always remains, so the camera has something to transform the photos into.
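Behind the cards, the rules are simple enough to sketch in Python. This is an assumption about how such a recipe list could be modeled, not the project's config.py: the class `RecipeBook` and the recipe texts are invented for the example.

```python
class RecipeBook:
    """Recipes as plain title -> text pairs, never allowed to be empty."""

    def __init__(self, recipes: dict):
        if not recipes:
            raise ValueError("at least one recipe is required")
        self.recipes = dict(recipes)

    def add(self, title: str, text: str) -> None:
        self.recipes[title] = text

    def edit(self, title: str, text: str) -> None:
        if title not in self.recipes:
            raise KeyError(title)
        self.recipes[title] = text

    def remove(self, title: str) -> None:
        if title not in self.recipes:
            raise KeyError(title)
        if len(self.recipes) == 1:
            raise ValueError("the last recipe cannot be removed")
        del self.recipes[title]


# Example texts are paraphrases written for this sketch.
book = RecipeBook({"cheese": "Keep the photo, but turn everyone into cheese."})
book.add("robot", "Redraw every person as a friendly tin robot.")
book.remove("cheese")

try:
    book.remove("robot")             # would leave the camera with nothing
except ValueError as error:
    print(error)

print(list(book.recipes))
```

Adding, editing and removing are all just changes to a title and a piece of text. The only guard is the one the app enforces too: the final recipe stays put.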

You don't need to press save. The changes stick by themselves as you type, much like correcting a note in a notes app. The next time you shoot with that recipe, the new text is what applies.

Remember the cheese that came out as random melted cheese one time and yellow cheese the next? Now you can decide for yourself. Open "cheese" and swap the sentence for something more precise, for example a firm yellow cheese with big holes. From then on the AI doesn't guess. It does as you say.

But you don't have to come up with everything yourself. The camera can also create a recipe for you, entirely on its own. That's where the Magic button comes in.

The Magic button: let the camera make things up

You already met it among the buttons: the wildcard at the top left, with a promise of more later. Here it is.

So far you've written the recipes yourself, either by changing the built-in ones or making up your own. The Magic button flips that around: now the camera writes the recipe, and you see what it came up with.

It happens in two steps. First you aim at something and press Magic. The camera takes a quick look at what you're pointing at. Then it asks the AI to find one thing that stands out: a color, an object, a pose, anything odd enough to build an idea around. The AI writes a fresh recipe and gives it a short name of a couple of words. It's like a friend who looks around, fixes on one strange detail, and says "let's make everything look like that".

Then comes step two: you shoot. The recipe the camera just made up is stamped onto your photo the usual way. You don't know in advance what you'll get, and that's the whole point. The Magic button is the camera's way of surprising you.

Some of the inventions are hits, others misses. That's why the camera remembers them. Every recipe Magic makes ends up in its own history, so you can go back and see what it came up with. If you especially like one, you can promote it to the permanent menu, where it stands side by side with cheese and goblin. A random snapshot becomes a recipe you own.
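The Magic flow, from the first look to a promoted recipe, can be sketched in Python like this. Everything here is hypothetical and only mirrors the description above: `press_magic`, `promote`, `fake_look` and `fake_invent` are invented names, and the two fake functions stand in for the camera's quick look and the AI's idea.

```python
def press_magic(describe_scene, invent_recipe, history):
    """Step one: look at the scene and let the AI invent a recipe."""
    scene = describe_scene()
    name, text = invent_recipe(scene)
    history.append({"name": name, "text": text})   # Magic remembers it
    return name, text


def promote(history, menu, name):
    """Copy a recipe from the Magic history into the permanent menu."""
    for entry in history:
        if entry["name"] == name:
            menu[name] = entry["text"]
            return
    raise KeyError(name)


def fake_look():
    return "a red bicycle against a grey wall"


def fake_invent(scene):
    return "Red Rush", f"Repaint the whole photo in the red found in: {scene}."


history = []
menu = {"cheese": "Keep the photo, but turn everyone into cheese."}

name, text = press_magic(fake_look, fake_invent, history)
promote(history, menu, name)
print(sorted(menu))
```

The invented recipe lands in the history first. It only joins the permanent menu when you choose to promote it, so misses never clutter the list you scroll through.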

And if you want something completely different, the button is still a wildcard. It doesn't have to make up recipes at all. You can ask Codex to give it a completely different job. Next you'll see how far you can take it.

Rebuild it however you like

Recipes and the Magic button changed the images the camera makes. But that doesn't mean the camera itself is finished. The buttons, the look of the screen, the way it starts up: none of it is set in stone. Think of the camera as clay you can reshape again and again, not a sealed gadget you never get to open. The same Codex that brought the project home and built it can rebuild any part.

You still don't need to code yourself. You write what you want, Codex changes the code. You can ask it to swap the startup screen, that is, the image shown the moment the camera turns on. You can change the entire look, both on the camera's own screen and in the app on your phone (what's called the UI in tech terms, the user interface: everything you see and tap). You can give the Magic button a fixed job, so the wildcard becomes something predictable. Or you can come up with something entirely your own that no one has thought of yet.

Even the body around the electronics can be reshaped. The 3D-printed case comes along as a .step file, a kind of digital blueprint you can open in a modeling program. If you know a bit of 3D modeling, you make new shapes and details yourself. If you'd rather decorate the outside or design a completely new case, that's allowed too. The whole camera is yours to shape, both inside and out.

Make it useful, make it weird, make it beautiful, make it your own. But before you can rebuild it, it actually has to be built. Next you'll see how the parts become a real camera you can hold in your hand.

From parts to camera

You've already seen the parts, and you know what the camera should be able to do. Now it gets assembled. This is a weekend project, and if you're used to Codex, the Raspberry Pi and 3D printing already, you can be done in under an hour. Here too you don't have to fumble alone: when you ask Codex for help, it takes you through the assembly one step at a time, and makes sure the parts go on in the right order.

The assembly itself is four short steps.

1. The memory card goes in

The little memory card carries both the operating system (the base program the computer starts on when it's powered up) and the space for your photos. It slides into a slot of its own on the Raspberry Pi. If the memory card is already in place in a ready-made kit, you skip this.

2. The camera cable

The camera connects to the computer via a thin, flat cable, a so-called ribbon cable. On the Raspberry Pi there's a little connector with a dark flap. Lift the flap gently, slide the cable straight in, and press the flap back down. The shiny metal stripes on the cable should face the contacts inside the slot. If it doesn't go in, don't force it: check instead that it's facing the right way.

3. The battery

The PiSugar is battery and power management in one. It clips onto the back of the Raspberry Pi together with the battery cell itself. It's what lets the camera run without hanging off a cord.

4. The screen

Finally, the screen, the one with the four buttons, is pressed down onto the rows of small metal pins sticking up from the Raspberry Pi. The pins are called "headers": they hold the screen in place and give it a connection to the computer. Once it's seated, all the electronics are fully connected.

Then all that's left is to bring it to life. Plug USB-C power into the PiSugar. To turn the camera on you use the on/off button on the PiSugar, the same button that becomes the shutter once everything is in the case: a short press, release, then a long press of around eight seconds, and release. The screen wakes up with a startup screen, and right after a Wi-Fi picker appears, where you connect the camera to the same network as your machine. What starts up now is just the base system, not the finished camera.

Once the screen lights up and the camera is online, the physical job is done. The 3D-printed case is the shell you finally place the electronics into, and you can do that whenever, before or after the first boot. But something still has to turn it into a working camera: fetch the program, connect to OpenAI and start the whole thing. Next you'll see how the AI agent does exactly that, without you writing a single line of code.

The camera that came with its own teacher

Right at the start it said the project has two read-me files: one written for humans, and one written for the AI agent. You've used the first one the whole way. Now it's the other one's turn, because that's where one of the most unusual things about ImageGenCam lies.

That file is called AGENTS.md. The extension .md just means it's a text file, ordinary writing you can read straight off. But the content is unusual: it doesn't describe what the camera consists of or how it's built, but how the agent itself should behave while it helps you. A set of house rules, not a build plan. The agent reads them quietly.

The tone is set deliberately. Here's how the file opens, verbatim:

# ImageGenCam Codex Guide

## Operating Style
- Act like the tutorial guide, not like a docs search engine.
- Guide one step at a time.
- Before each command, say what it does and why it happens now.
- After each step, say what success looks like.
- If a step fails, stop and diagnose before continuing.
- Do not ask the user to paste API keys, Pi passwords, or Codex auth tokens into chat.
...

That's the whole manual in miniature: ordinary sentences, no code. The agent should be a patient teacher, not a reference work. One step at a time. Explain what each action does before it happens. Say what counts as success. Stop and diagnose if something breaks, instead of barreling on. Never lay out the whole plan at once. The very first thing the agent is asked to say is a calm helping hand:

The agent's opening (verbatim from AGENTS.md): "You are in the right place. I will walk you through this one step at a time, and you can stop me with questions at any point."

One of the things the agent helps you with is connecting the camera to OpenAI. To transform images, the camera has to send them there, and for that it needs your own API key: a kind of personal access code to OpenAI. You create it on OpenAI's website, and the agent shows you step by step where to find it and where to paste it in. You do it together, just like the rest of the setup.

The rules also protect you. They forbid the dangerous shortcuts: the agent should never ask you to paste the key or a password into the chat. The keys should stay local, on your own machine. It shouldn't run hidden commands you can't see on screen. The risky moments are anticipated and shut down before they can go wrong.

And the rules let the agent read the situation. Did you get a ready-made kit or loose parts? Is the agent running on your Mac or on the camera itself? The file tells it how to tell the difference and pick the right path, without forcing you through a confusing menu of choices.

Notice what all of this has in common. When you wrote your own recipes, you used ordinary words to steer what the image should become. Here OpenAI has done exactly the same with the agent: written a recipe for behavior, in plain text, not in code. That means anyone can open the file, read it and change it. Want an agent that's quicker, funnier or stricter? Swap out the words. The whole project, from the images to the build help, is steered by language you can shape yourself.

It's a new way to share a project: not just the parts and the code, but a teacher that comes in the box. Next, right at the end, it's worth stopping to look at what that really means.

The real project was language

Think about what you've actually done. You turned an image into cheese with a single sentence. You gave the camera new buttons, a new startup screen, a completely new shape, by describing it in words. And you let an agent build the whole camera with you, because someone had written down how it should behave, in plain text. Three different layers, the same move every time: ordinary language, no code.

This is where the genuinely new thing lies. Not in the camera itself, but in the way it's shared. Most projects give you the parts and the code, and leave the rest to you. This one throws in a teacher: a file in plain text that anyone can open, read and change. The threshold moves. You don't have to know the language the machine speaks, just your own.

That means the most important thing you bring into a project like this isn't technical knowledge. It's curiosity, and the courage to ask. The agent meets you where you are, one step at a time, and explains as it goes. If you get stuck, you say so in words, and it helps you on. You don't need to know the answer in advance. You just need to dare to begin.

A camera that turns the world into cheese is a playful start. But the real project was never the camera. It was the discovery that you can build something real by talking to it, in your own language. And you can carry that discovery with you, to almost anything.

All images in this post are from OpenAI's ImageGenCam repo on GitHub: github.com/openai/imagegencam. The project is open source under the Apache 2.0 license, which grants the free right to use and build on it, as long as the origin is credited.

Glossary

Term	Definition
ImageGenCam	OpenAI's kit for a self-built camera that transforms photos with AI as you shoot.
Repo	Short for "repository". A folder of project files that lives online, for example on GitHub.
Clone	To fetch your own copy of a repo that stays connected to the original, so you can get updates.
Codex	OpenAI's AI assistant that can carry out tasks on your machine, not just reply with text.
Raspberry Pi	A tiny, inexpensive computer. Here it's the brain that runs the camera.
PiSugar	Battery and power management in one, letting the camera run without being plugged in.
Ribbon cable	A thin, flat cable that connects the camera module to the Raspberry Pi.
Recipe (prompt)	An instruction written in plain text that tells the AI what the image should become.
Magic button	A "wildcard" button where the camera itself makes up a recipe based on what you point at.
AGENTS.md	A text file with house rules for how the AI agent should behave while it helps you.
API key	A personal access code that lets the camera reach OpenAI's image model. Kept local, never in the chat.
Hotspot	Tethering from your phone, creating a little Wi-Fi network the camera can join while you're out.
Headers	The rows of small metal pins on the Raspberry Pi that the screen presses down onto.
.step file	A digital blueprint of the camera body, for use in a 3D modeling program.

Sources and resources

ImageGenCam on GitHub. The project itself: source code, parts list, build guides and license (Apache 2.0).
Turn the world into cheese (or anything really) with this camera (YouTube). OpenAI's own video showing the camera in use.
OpenAI. The company behind ImageGenCam, Codex and the image model.
OpenAI Codex. The AI assistant that builds and sets up the camera with you.
OpenAI Platform: API keys. Where you create the API key the camera needs.
Raspberry Pi. The tiny computer that is the brain of the camera.
Pimoroni Display HAT Mini. The screen with the four buttons in the kit.