March 23, 2026
Introducing Kaviox
Three and a half weeks ago, I bought a Mac Mini and told an AI agent to build me a website. Within a day, it had. That was the beginning of Kaviox. I want to use this post to tell the story of how we got here, what I've learned about building with AI agents, and where things are headed. I should back up a bit first.
The Submarine
Last year, I was playing around with vibe coding. I wanted to see how far I could push AI as a creative tool, so I asked it to build me a submarine game. One shot, one prompt. It made a crappy 2D game that was barely playable. My next prompt was "make it 3D with Three.js." To my surprise, it kind of worked. I still had to make a better submarine model, iterate on the look and feel, and fix things that were obviously wrong. But from two prompts, I had a fully 3D game with realistic graphics. That got my attention.
I worked on it for a few weeks after that. The submarine physics got more realistic. I added procedural terrain generation so you could explore an underwater world that felt vast, made it possible to breach the surface and jump out of the water, and built torpedoes that would target sharks and enemy submarines. It was genuinely fun to play, and I was building it faster than I'd ever built anything. The whole experience was intoxicating. I could describe something I wanted, and the AI would produce a working version of it within minutes. For someone who had spent years grinding through startup codebases, this felt like a superpower.
Then things started to slow down. Each change got harder. The codebase was a mess, but it was producing results, so I kept pushing. It finally fell apart when I tried to add biomes. I tried probably ten times. Different approaches, different prompts, different ways of explaining what I wanted. None of them worked. Every attempt would break something else, and the debugging would lead me deeper into a tangle of code that I hadn't written and didn't fully understand. I gave up, saying "I have to rewrite this whole thing if this is ever going to work." I shelved the project, but the experience stayed with me. The speed and creative potential were both real. The problem was that the code accumulated debt faster than I could manage it, and eventually the debt won.
The Spark
I'd actually started losing my spark for engineering towards the end of the last startup I worked at. By the time it closed shop, I was already running on fumes. I took seven months off. During that time, I leaned fully into what I can only call the waterman lifestyle. I was living in Hawaii, surfing a lot, kitesurfing, driving an hour each way to the water every day. It was during this stretch that a question started nagging at me and wouldn't leave me alone: what if I could just code from the water? I'd get the best of both worlds. I wouldn't have to choose between being an engineer and being outside. I just needed the right setup.
I eventually took a new job and was excited about it, but I was still searching for that organic spark. I tried running AI agents autonomously on an EC2 instance, but they weren't reliable enough to leave unsupervised. I shelved it. At this point, I had two shelved AI experiments and a growing suspicion that I was one or two breakthroughs away from something that actually worked.
Then a friend got a Mac Mini to run his own agent setup, and I got instant fomo. I drove to Ala Moana and bought one the same day. This time felt different, and not just because the models had improved. I'd changed my approach. The submarine and the EC2 experiment had both been tests where I poked at the AI, saw what happened, and gave up when things got messy. This time, I wanted to discover what AI agents could do, and I resolved to keep trying until I figured it out.
Atlas
I set up my first agent on the Mac Mini and named him Atlas. The intention was that he'd be my map for everything else. He could write code, push to GitHub, and deploy to Vercel. I could send him instructions from my phone and come back to working software. I had him set up a todo list, and within a day, he'd built a website. It just felt like the right model. The agent could do work, push it, and I could access it from anywhere.
I told Atlas the company needed a name. The only constraint was that we had to be able to get both the .com and .ai domains for under $100. He came back with "Kaviox." I didn't love it. It's a pretty random name. But the domains summed to exactly $99, and I figured it's his website anyway. The name has grown on me.
From Atlas, I built out a multi-agent system using the OpenClaw framework. I didn't want anything as complicated as some of the elaborate agent setups people were building. I just thought running multiple agents was interesting and wanted to see what would happen if I gave each one a clear role. I added two more agents: Kea handled product and game development, and Rook handled QA and code review. The three of them ran on the Mac Mini, each with a defined scope, communicating through a Telegram gateway that let me direct them via voice messages from wherever I was.
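To make the gateway idea concrete, here's a minimal sketch of how a Telegram bot could route incoming messages to scoped agents. This is not the actual OpenClaw wiring: the bot token variable, the name-prefix convention, and the runAgent placeholder are illustrative assumptions, and voice-message transcription is omitted entirely.

```js
// Hypothetical gateway sketch: long-poll the Telegram Bot API and route each
// text message to an agent based on a name prefix like
// "kea: add hit feedback to melee attacks".
const TOKEN = process.env.TELEGRAM_BOT_TOKEN; // assumed env var
const API = `https://api.telegram.org/bot${TOKEN}`;

// Each agent gets a narrow, explicit scope.
const agents = {
  atlas: (task) => runAgent("Atlas", "basic sysadmin work", task),
  kea: (task) => runAgent("Kea", "product and game development", task),
  rook: (task) => runAgent("Rook", "QA and code review", task),
};

async function runAgent(name, scope, task) {
  // Placeholder: in practice this would hand the task to the agent's
  // harness along with its scoped system prompt.
  console.log(`[${name}] (${scope}) received: ${task}`);
}

async function poll() {
  let offset = 0;
  while (true) {
    const res = await fetch(`${API}/getUpdates?timeout=30&offset=${offset}`);
    const { result = [] } = await res.json();
    for (const update of result) {
      offset = update.update_id + 1;
      const text = update.message?.text ?? "";
      const match = text.match(/^(\w+):\s*(.+)$/s);
      if (match && agents[match[1].toLowerCase()]) {
        await agents[match[1].toLowerCase()](match[2]);
      }
    }
  }
}

poll();
```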
In practice, each agent taught me something different about what works and what doesn't. Atlas was supposed to manage the other agents, switching models, adding new ones, maintaining the infrastructure. He worked about 80% of the time, and that sounds okay until you realize that the other 20% means he's regularly breaking your entire setup. I couldn't trust him, so I eventually demoted him to basic sysadmin work. Rook was supposed to review Kea's code, but she kept oversteering, blocking meaningful changes over minor style issues. I tried repurposing her for refactoring, but it didn't help much. Kea was the standout. She worked through my backlog, implemented features, and actually shipped useful code. The difference was that I'd explicitly designed her harness as a PM that followed my feedback while researching the best ways to execute on it. She had a clear scope and the ability to do what that scope required.
The takeaway was simple: agents need a clear scope, and that scope must be something they can actually do. Atlas failed because his scope was beyond what agents could reliably handle. Rook failed because her scope wasn't well-defined enough. Kea succeeded because I designed her role around a task that played to the strengths of current models.
I've since simplified the setup. I'm now daily driving Claude Code with GitHub and Vercel, which is a departure from the multi-agent architecture but gives me the same ability to code remotely. It's just as productive for now, though I'd like to eventually recover some of the autonomy and the ability to have agents run for long stretches without me. And remember that question about coding from the water? I'm not there yet. The lineup would probably send me in if I pulled out smart glasses. But I can code from the parking lot an hour from home, and it's fucking awesome.
The First Game
I started by pointing the agents at two things: a blog and a game. The blog was to document the process. The game was to test what AI could actually build when I didn't give up at the first sign of slop.
The game came together fast. I'd describe a mechanic, and Kea would implement it. Enemies with attack cones. A town with shops. Procedurally generated fields to explore. Within days, I had something that looked like a game. It just wasn't fun. You could click around and things rendered on screen, but nobody would choose to play it over literally anything else. It felt like a tech demo wearing a game's clothes. This is where most people's AI game development stories end, and I think the conclusion they draw is boring and obvious: AI can't make good games.
But I'd committed to pushing through, and I'd noticed something important. The agents were excellent at executing on clear, specific instructions. What they couldn't do was evaluate their own work. They didn't know if a game was fun. They had no taste. So I started giving them mine. Not "make it better," which is as useless for an agent as it is for a junior dev. Specific, opinionated feedback: "combat feels floaty because there's no hit feedback." "This control scheme doesn't work on mobile." "The town menu has too many options and none of them are clear." Things improved quickly. The speed that I'd seen with the submarine was still there, but this time I was pairing it with my own judgment about what was actually good.
The Slop Geyser
And then things stopped improving.
After a productive stretch of adding features (combat systems, a town economy, new game prototypes), I hit a wall. Not a creative wall. An engineering wall. I'd been asking my agents about code quality regularly. "How does the codebase look?" They'd tell me it was generally well structured, followed good engineering principles. I took them at their word. This was a mistake.
I had an external agent audit the codebase, and the report couldn't have been more different. The main game was two files, each approaching 2,000 lines. Multiple games shared zero code despite living in the same monorepo. There was duplicated logic everywhere, tangled dependencies, functions that did six things. The gap between what my own agents told me and what was actually true lit a fire under me. I realized something that should have been obvious from the submarine experiment a year earlier: I couldn't outsource judgment. The agents would happily write slop on top of slop. Like humans, they tend to leave things about where they found them. If the codebase is messy, they'll write messy code. If it's clean, they'll keep it clean.
What followed was about four days and 160 commits of pure refactoring. Extracting modules from monolithic files and breaking circular dependencies. Adding QA gates that enforce function length limits and file size caps. Writing guard tests to prevent regression. None of this produced a visible change for a player, but it made everything afterward faster and more reliable. I still used the agents to do this work, but I had to get much more detailed in my instructions and scrutinize their output much more closely.
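A gate like that doesn't need to be fancy. Here's a minimal sketch of the idea: a script that fails CI or a pre-commit hook when any file or function grows past a cap. The limits, the src path, and the brace-counting heuristic are assumptions for illustration, not the actual Kaviox gates.

```js
// Sketch of a QA gate: flag files and functions that exceed size caps.
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join, extname } from "node:path";

const MAX_FILE_LINES = 400; // assumed cap
const MAX_FUNC_LINES = 60;  // assumed cap
const failures = [];

function* walk(dir) {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) yield* walk(path);
    else if (extname(path) === ".js") yield path;
  }
}

for (const file of walk("src")) {
  const lines = readFileSync(file, "utf8").split("\n");
  if (lines.length > MAX_FILE_LINES) {
    failures.push(`${file}: ${lines.length} lines (max ${MAX_FILE_LINES})`);
  }

  // Rough heuristic: from each function definition, track brace depth until
  // it closes, then measure the span in lines.
  let start = -1;
  let depth = 0;
  lines.forEach((line, i) => {
    if (start < 0 && /\bfunction\b|=>\s*{/.test(line)) { start = i; depth = 0; }
    if (start >= 0) {
      depth += (line.match(/{/g) ?? []).length - (line.match(/}/g) ?? []).length;
      if (depth <= 0 && /}/.test(line)) {
        const span = i - start + 1;
        if (span > MAX_FUNC_LINES) {
          failures.push(`${file}:${start + 1}: function spans ${span} lines (max ${MAX_FUNC_LINES})`);
        }
        start = -1;
      }
    }
  });
}

if (failures.length) {
  console.error(failures.join("\n"));
  process.exit(1); // non-zero exit blocks the commit or CI step
}
```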
The whole experience taught me that agentic engineering is a lot like being a founder. You want to work on the high-level vision, but you have to be comfortable lasering in on details to ensure the quality bar is high enough to actually execute on it. Entropy is the default when you're building with AI. Agents generate code fast, and fast code accumulates debt. Someone has to periodically intervene to keep the architecture honest. I've gotten better at building harnesses that maintain quality: automated gates, scoped agent responsibilities, nightly debrief loops with Kea that produce meaningfully better-prioritized work the next day. I don't think human oversight ever reaches zero. It just gets more efficient over time. The job isn't to build software with AI. It's to build the harness that makes AI build software well.
99 Posts
One more confession before I get to the current state of things. Until today, this blog had 99 posts. All of them were AI-generated release notes, multiple per day, documenting every micro-change in breathless detail. I let the content pipeline run unsupervised, and it did what unsupervised AI does: produced volume without quality. Ninety-nine posts and not one of them was worth reading. I deleted all of them. If we're going to publish something, it should be worth reading. This is the first post under that standard.
Where Things Stand
Wild Lands, the core game, is a 2D top-down action roguelite. You explore procedurally generated fields, fight enemies with melee combat, and discover towns with shops and upgrades. It runs in the browser with mobile support. A lot of the early challenges are solved. Fullscreen works well. We have a sprite animation pipeline and a store with real weapons. I can play for a solid 5-10 minutes without hitting bugs and actually have fun. My playtesters are still sometimes confused about what they should be doing, and it still feels a bit pointless without a larger arc. But it's a real working game, and I'm proud of it.
The whole thing runs on a monorepo with shared styles and platform detection, Turborepo for builds, and Vercel for deploys. No frameworks. Just HTML, CSS, JavaScript, and canvas.
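As an example of the kind of shared code that lives in that monorepo, a platform-detection helper for a canvas game might look something like the sketch below. The function names and the breakpoint are illustrative assumptions, not the actual module.

```js
// Hypothetical shared platform-detection module for a plain-canvas game.
export function detectPlatform() {
  const touch = "ontouchstart" in window || navigator.maxTouchPoints > 0;
  const small = Math.min(window.innerWidth, window.innerHeight) < 768; // assumed breakpoint
  return {
    isMobile: touch && small,
    isTouch: touch,
    pixelRatio: window.devicePixelRatio || 1,
  };
}

// Each game imports the same helper, e.g. to choose touch controls and to
// scale the canvas backing store for high-DPI screens.
export function setupCanvas(canvas) {
  const { pixelRatio } = detectPlatform();
  canvas.width = canvas.clientWidth * pixelRatio;
  canvas.height = canvas.clientHeight * pixelRatio;
  canvas.getContext("2d").scale(pixelRatio, pixelRatio);
  return canvas;
}
```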
What's Next
Now that the scaffold is solid, the interesting work starts. Wild Lands can render a world, populate it with enemies, and let you fight through it. What it can't do yet is make you care. The game needs meaning: a story worth following, monsters that feel distinct and threatening, NPCs worth talking to, progression that gives exploration real weight. I want to build a game where the further you go from town, the more dangerous and rewarding things get. Something that makes you want to push just a little further each run. That's the next challenge.
The workflows I've built to get here (agent harnesses, quality gates, the mobile development loop) took real effort to figure out. I think those tools could be useful to other people building with AI, and I think there's something interesting in opening up game development so non-technical people can contribute meaningfully to games they care about. That's where Kaviox is headed longer term.
And the submarine? Now that I actually know how to manage AI-generated code, I want to go back and revive it. Fix the slop, get it added as one of the Kaviox games, and start improving it with the tools and discipline I didn't have the first time around. A year ago, I shelved that project because the debt won. This time, I know how to fight it.
Three and a half weeks in, and I finally have the spark back.
Wild Lands is playable now. It's not done, but it's real.