Case Study

Two Years Later: Rebuilding Sudoku with Codex 5.3

By Jerry Shi · A follow-up experiment on AI-assisted iOS development in 2026

Two years ago...

It’s funny how quickly life can change when you’re not paying attention. Over the last two years, a lot of exciting things have happened. Some of them are career and technology related. Some of them are just life. And yes, one of the biggest life updates is that I have a cat now.

Her name is Tangerine. She’s a British Shorthair with a very serious face and absolutely zero respect for my keyboard time. I’ll add her picture to this post because she has earned it.

On the tech side, the last two years have been a nonstop parade of “wait… it can do that now?” moments. AI companies have been moving at an incredible pace. OpenAI, Google, and Anthropic, in particular, have been pushing each other hard, and that competition is a big part of why the tools we have today feel so different from just a couple of years ago.

Today I took a day off. No meetings. No tickets. Just one of those rare days where you wake up and realize you’re bored, in a good way. The kind of bored that makes you want to build something for no particular reason. That’s when I remembered a small experiment I did two years ago.

Back then, I had just started using ChatGPT. I had no iOS development experience. I didn’t know Swift. I didn’t know SwiftUI. I didn’t know what I was doing. But with ChatGPT helping me along the way, I managed to build and ship a Sudoku iOS app in about twelve hours. It became one of my favorite personal case studies because it was both ridiculous and real.

So naturally, a question came back to me. If I could do that two years ago with early tools and a lot of back-and-forth guidance, what happens if I try the same thing today with Codex 5.3?

I’m not trying to prove anything big here. I’m just curious. How long would it take, in 2026, for a “developer” to create an iOS app if the AI is truly acting like an independent agent?

There is an obvious problem, though. I do have two years of iOS experience now. If I start debugging, refactoring, and helping too much, the experiment stops being meaningful. So I decided to blindfold myself, not literally, but in spirit.

I decided to vibe code. I set a rule for myself: I would create a clean folder, give the AI a spec, run whatever it produced, and only interact by describing outcomes like something crashing or a layout breaking. I would not look into the code or fix anything myself. The goal was to simulate the experience of someone with no experience and see how far today’s AI could carry them.

Because I was pretending I had no iOS or prompt-writing experience, I started by asking ChatGPT 5 to help me write the prompt. First, I needed to explain the app idea. In my case, I already had an app, so I didn’t need to invent anything new. I simply described what features my Sudoku app had, how the game modes worked, and how difficulty was defined.

By this point, I was already treating ChatGPT like a colleague. I explained what I wanted to do, described the constraints of the experiment, and answered a few follow-up questions. About five minutes later, I had a prompt I could actually use.

I put the exact prompt I used on a separate page so it can be shared and referenced directly.

View the Prompt

Once I had the prompt, I pasted it into Codex and set the model to GPT 5.3 Codex. I did not choose the code max option because this should be a small app, but I did turn the reasoning level up to extra high so Codex could examine its work more carefully. Then I pressed Enter, and Codex started working.

For a brief moment, I wondered if I should go grocery shopping and come back later to check the progress, or maybe do laundry in case it finished quickly. Tangerine solved that problem for me. She jumped onto my lap, made it very clear that I was not going anywhere, and I decided to stay put, watch the progress, and play with the cat.

Surprisingly, before Tangerine even finished her snack, Codex appeared to be writing tests. That caught me off guard, and my first thought was that it might be following a test-driven approach and writing tests before the actual code.

A moment later, Codex stopped and explained what it had done and why. That was the moment when this stopped feeling like a fancy code generator and started feeling more like watching a developer think out loud.

Then came the moment of truth. I opened Xcode. I still don’t know why these companies love the letter X so much. Xcode. Codex.

I ran the build on the simulator and, to my surprise, the app launched immediately. I played a game and solved a puzzle. Everything worked exactly as I had described. Other than a few minor style details I would personally tweak, there were no bugs and no missing features.

My original plan was to fix bugs and slowly make the app playable through multiple iterations. That never happened. Codex finished most of the work in about fourteen minutes. At that point, I decided not to intervene at all and to simply stop the experiment there.

I am honestly amazed by the result. I don’t know exactly what the software development industry will look like in the future, but I do know this: people are going to enjoy these tools, and they are going to build things that would have felt impossible not very long ago.

If you want to see exactly what Codex produced, you can find the source code here.

Source Code