Chaosland

2026.06.07

Learning to Direct: A Viewfinder Full of Voices

A game about unspoken inner thoughts made me want to build a camera that murmurs at whatever you point it at. Building it taught me something I didn't expect. I didn't need to become an expert; I needed to learn who to summon.

A voice that was never spoken aloud

I fell in love with this idea while playing Life is Strange. It wasn’t the plot that got me, it was a small thing. When Max looks at a person, or picks up some random object in a room, a little inner comment surfaces: a quiet aside about the dent in a trophy, the smell of a classroom, the way someone is holding their coffee. She never says any of it out loud. It just drifts through her head, and you, the player, get to overhear it.

I found that unbearably charming. It made me want to touch things in the world, to walk up to objects, because I had started to expect a response. I would point her at something and quietly wonder what she was going to say about this one. That feeling of being gently answered by the world is the whole reason this project exists.

Borrowing a committee

If Life is Strange gave me the longing, Disco Elysium gave me the shape. In Disco Elysium you do not have one inner voice; you have a whole committee of them: Logic, Inland Empire, Electrochemistry, Empathy, and more, and they take turns. You walk up to a corpse and your Logic chimes in with a deduction while your Inland Empire whispers something none of the others can hear. There is no single narrator in your skull; there is a cast.

So I wanted to take that committee out of a fantasy detective’s head and point it at the real world, at whatever you happen to be looking at right now. I built View-Wander, a camera viewfinder where you press the shutter and a few of those inner voices look at your photo and murmur about it, one after another. I gave it one rule I refused to break: the model always sees the full 3:4 frame, but you only ever see the centered square. The meaning outside what you framed, the hand creeping in from the edge you cropped out, is the soul of the thing. That was the dream, and then I tried to build it.
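The one rule above comes down to a little arithmetic. Here is a minimal sketch of it, assuming a portrait frame measured in pixels; the `Box` type, the `centered_square` name, and the 3000 x 4000 size are illustrative, not taken from the app.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def centered_square(width: int, height: int) -> Box:
    """Return the largest centered square inside a width x height frame."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return Box(left, top, left + side, top + side)


# A 3:4 portrait frame: the model would get all 3000 x 4000 pixels,
# while the viewfinder would show only the square returned here.
visible = centered_square(3000, 4000)
print(visible)  # Box(left=0, top=500, right=3000, bottom=3500)
```

In this example the 500-pixel strips above and below the square are what the voices can see and the person holding the camera cannot.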

The handoff I couldn’t make

I hit two walls immediately, and they were both walls of not knowing. I did not know how to write the prompts, and I did not know how to test a model in any systematic way, how to tell whether a change made the voices better or just different.

So I did what I always do and asked Claude Code. Here is the part that actually stuck with me. It would often just start changing things on its own, rewriting a prompt, restructuring a call, off to the races, and I would sit there not really knowing what it had changed, or why, or whether it was better. I had handed the work off, but I had also handed off the steering, and I could not read it or take the wheel back. It took me a while to name what was wrong. It was not that the help was bad, it was that I had made a handoff with no way to stay the director of my own project.

Learning to summon

The fix was not to go learn prompt engineering and psychology from scratch. The fix was to learn who to call. When I did not know how to write or evaluate prompts, I wrote an ai-engineer skill, and the thing it gave me was not a better prompt, it was a method and some visibility. It refused to let me tweak by vibes. It made me pin down what “good” even meant first, freeze that into an eval set, and only then iterate. Suddenly I could see whether a change helped, and I was back in the director’s chair.
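To make "pin it down, freeze it, then iterate" concrete, here is a rough sketch of what a frozen eval set could look like. Everything in it is hypothetical: the two criteria, the scenes, and the `voice_v1` / `voice_v2` functions, which stand in for two prompt versions and would call the model in a real run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One frozen example: a scene plus a written-down check for 'good'."""
    name: str
    scene: str
    check: Callable[[str], bool]


# Hypothetical criteria, fixed before any prompt tweaking begins.
EVAL_SET = (
    EvalCase("stays short", "a dented trophy on a shelf",
             lambda line: len(line.split()) <= 20),
    EvalCase("never narrates the photo", "two friends hugging",
             lambda line: "this photo" not in line.lower()),
)


def score(voice: Callable[[str], str]) -> float:
    """Fraction of frozen cases a candidate voice passes."""
    passed = sum(case.check(voice(case.scene)) for case in EVAL_SET)
    return passed / len(EVAL_SET)


# Stand-ins for two prompt versions; a real run would call the model here.
def voice_v1(scene: str) -> str:
    return f"This photo shows {scene}."


def voice_v2(scene: str) -> str:
    return f"Someone cared about {scene} once."


print(score(voice_v1), score(voice_v2))  # 0.5 1.0
```

The point of the tuple is that it does not change between runs, so the two numbers are comparable and "better" stops being a matter of vibes.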

Then, when I needed to understand the 24 skills of Disco Elysium well enough to turn them into personas, I summoned a psychologist skill. I am not a psychologist. I would never, on my own, have known things like this:

Do not let the Empathy persona explain mirror neurons. It is a badly overhyped neuro-myth: the mechanism is real, but “we empathize because of mirror neurons” is a pop misreading. If the voice lectures about it, it gives itself away.

Or this, about the Electrochemistry voice:

It is built on “dopamine equals pleasure,” but Berridge’s work separates wanting from liking, and dopamine is mostly about wanting. The good news is that the game defines it as “the whisper of desire the instant it happens,” which accidentally lands right on wanting. Lean into that and the voice gets sharper.

That is not information you can google your way to. It is judgment, a pair of eyes I do not have. And the most beautiful part is that the two skills handed off to each other. The psychologist produced a set of “core cards,” one per persona, and wrote right at the top of the document that the downstream reader was the AI engineer. The psychologist diagnoses, the engineer writes the voice, and I was the person standing in the middle connecting them. I never became an expert in either thing. I learned to be the director.

Everywhere, the same problem of drawing lines

Here is what surprised me most. When I actually read that psychologist’s document, half of it was not describing personas at all, it was warning about collisions. Inland Empire and Logic kept hogging every photo. The four social voices, Empathy, Authority, Composure, and Drama, all wanted to read the same faces and kept collapsing into the same flavor of “reads emotion.” So the document spends its energy sharpening borders. Empathy asks how they feel, Authority asks who is dominating whom, Composure asks who is keeping up a front and who is cracking, Drama asks whether any of this is even real. Same photo, four different questions.

That is a casting problem, and it is exactly the same problem I had everywhere else. In the app, it was making 17 voices coexist without one drowning out the others. In the model calls, it was deciding which voice speaks, in what order, and with what context, so the relay feels like a conversation instead of one model rambling. A director call picks who speaks, a fixed order decides when, and the frontend stitches the prior lines into what each voice sees. In my own workflow, it was knowing when to summon the psychologist versus the engineer, and not letting one do the other’s job.
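The relay described above can be sketched in a few lines. This is a simplified illustration, not the app's code: `FIXED_ORDER` lists only a handful of voices in an order chosen for the example, `fake_director` and `fake_speak` stand in for the two kinds of model call, and the stitching that the frontend does is folded into a single function.

```python
from typing import Callable

# Hypothetical fixed speaking order; the real app has 17 voices.
FIXED_ORDER = ["Logic", "Empathy", "Authority", "Composure", "Drama", "Inland Empire"]


def run_relay(
    photo: str,
    pick_voices: Callable[[str], set[str]],
    speak: Callable[[str, str, list[str]], str],
) -> list[str]:
    """Director picks who; FIXED_ORDER decides when; prior lines become context."""
    chosen = pick_voices(photo)
    transcript: list[str] = []
    for voice in FIXED_ORDER:
        if voice not in chosen:
            continue
        # Each voice sees a copy of everything said before it.
        line = speak(voice, photo, list(transcript))
        transcript.append(f"{voice}: {line}")
    return transcript


# Stand-ins for the director call and the per-voice call.
def fake_director(photo: str) -> set[str]:
    return {"Drama", "Logic"}


def fake_speak(voice: str, photo: str, prior: list[str]) -> str:
    return f"heard {len(prior)} line(s) before me"


for line in run_relay("full 3:4 frame", fake_director, fake_speak):
    print(line)
# Logic: heard 0 line(s) before me
# Drama: heard 1 line(s) before me
```

Keeping who, when, and context as three separate, readable decisions is what makes each one something that can be tested on its own.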

Directing game personas, orchestrating model calls, orchestrating Claude skills: the hard part is never the talent, it is always drawing the lines. Who speaks, when, and about what. And then keeping each of those decisions somewhere I can read, test, and reason about, instead of somewhere a black box runs off with them.

The viewfinder, again

I set out to build a viewfinder, and what I did not notice until the end is that building it was itself an act of framing. You face a single frame and you decide who to call into it. I never became a model expert and I never became a psychologist. What I actually learned is the smaller, stranger skill underneath both: knowing when to summon whom, whether it is the 17 voices inside a photo or the two skills I called into my own head to make them.

Which, funnily enough, lands me right back where I started. The thing I loved in Life is Strange was the anticipation, pointing Max at something and wondering what she would say next. Somewhere along the way I stopped being the one who waits for the voices. I became the one who presses the shutter and lets them in.