Turns out, AI can actually build competent Minesweeper clones — Four AI coding agents put to the test reveal OpenAI's Codex as the best, while Google's Gemini CLI lands in dead last

The experience fell apart, though, over the lack of chording support, which the Ars Technica reviewer deemed "unacceptable." The clone's gameplay twist was a "Power Mode" that lends you simple power-ups, a touch that took genuine creativity on the agent's end. On mobile, there's also a "Flag Mode" button that serves as a decent alternative to long-pressing to mark tiles.

In our opinion, this was also the best-feeling clone when we tried it. Claude Code's Opus 4.5 model built its Minesweeper clone in less than five minutes and offered the cleanest coding interface of the bunch. Overall, the presentation is very solid, earning a 7/10 score that would have been higher had chording been included.
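For anyone unfamiliar with the mechanic: chording lets you click an already-revealed number tile to instantly reveal all of its unflagged neighbors, provided the number of adjacent flags matches the tile's number. None of the tested clones publish their source, so the sketch below is purely illustrative; the `Tile` shape and the `neighbors`, `chord`, and `reveal` names are our own assumptions, not anything from the actual builds.

```typescript
// Hypothetical tile shape for illustration only; the tested clones
// don't expose their internals, so these names are assumptions.
interface Tile {
  revealed: boolean;
  flagged: boolean;
  isMine: boolean;
  adjacentMines: number; // the number shown on a revealed tile
}

type Grid = Tile[][];

// Collect the in-bounds neighbors of (row, col).
function neighbors(grid: Grid, row: number, col: number): [number, number][] {
  const result: [number, number][] = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue;
      const r = row + dr, c = col + dc;
      if (r >= 0 && r < grid.length && c >= 0 && c < grid[r].length) {
        result.push([r, c]);
      }
    }
  }
  return result;
}

// Chord on a revealed number tile: if exactly as many flags surround it
// as its number, reveal every other hidden neighbor in one action.
function chord(
  grid: Grid,
  row: number,
  col: number,
  reveal: (r: number, c: number) => void
): void {
  const tile = grid[row][col];
  if (!tile.revealed || tile.adjacentMines === 0) return;

  const adj = neighbors(grid, row, col);
  const flags = adj.filter(([r, c]) => grid[r][c].flagged).length;
  if (flags !== tile.adjacentMines) return; // flag count must match the number

  for (const [r, c] of adj) {
    const n = grid[r][c];
    if (!n.flagged && !n.revealed) reveal(r, c); // a mis-flagged mine still detonates
  }
}
```

Note that if a flag is misplaced, chording detonates the hidden mine, which is exactly why the feature rewards fast, confident play and why its absence stings in an otherwise polished clone.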

(Image credit: Future) In third place, we have Mistral's Vibe, which produced a true-to-its-name product; in other words, the result looked exactly like something that had been vibe-coded. The game worked and looked fine, but it lacked the all-important chording feature and had no sound effects. There was also a "Custom" button at the bottom that did nothing. Vibe didn't add any fun gameplay twists either, all of which knocks off a few points.


The coding interface was solid and easy to use, but not exactly the fastest, although the last-place finisher is so far off that the bar isn't very high. Ars Technica's editors were impressed by how well it performed despite Mistral lacking the large-scale resources of the big names. In the end, Mistral Vibe got a 4/10, which seems lower than the tool deserved based on their description.

(Image credit: Future) Dead last was Google's Gemini CLI, which might surprise some, considering how often Google tops benchmarks these days and the comeback story surrounding co-founder Sergey Brin's return to help steer frontier AI at the Californian giant. Gemini's Minesweeper clone simply didn't work: it had buttons, but no tiles to speak of, so there was no game to play, let alone score.

In terms of visuals, it looks eerily similar to Claude Code's final result, as if someone had stopped the agent mid-task. Gemini also took the longest, with each code run lasting an hour and the agent constantly asking for external dependencies. Even after the rules were bent to give it a second chance, with a hard-and-fast instruction to use HTML5, it couldn't produce a usable result.

Ars Technica does note that Gemini CLI didn't have access to the latest Gemini 3 coding models and relied on a cluster of Gemini 2.5 systems instead. Perhaps paying for the higher tier of Google AI would have ended more favorably, which arguably renders this test "incomplete," but the result is still pretty disappointing nonetheless.

So, there you have it: this is what we allowed to quadruple our memory prices and ruin computers for the time being. Codex won, with Claude Code close behind, Mistral Vibe a distant third, and Google not even really trying. But at what cost? If you weren't already all-in on AI, it's safe to say this experiment won't convince you of anything.


Hassam Nasir, Contributing Writer

Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he's not working, you'll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.

salgado18: I hate the memory prices like everyone else, but these results did convince me that, given some time and other prompts, AI can create fully featured games. This was a one-shot test; if you could tell Claude "you forgot this feature," it would add it and come out better than Codex. With the same interface style, AI could recreate Space Empires 3, and that's not a small feat.

Captain Awesome: Mine worked perfectly in Claude on the first prompt, after making it double-check its results. 😁 The nice thing about Claude is that you can also run whatever you created inside of it. My prompt: "Hello my friend. Can you recreate Minesweeper as a React single page application? Double check your results, be sure it has all of the features from the original." https://claude.ai/public/artifacts/11c24699-5cf1-4077-b660-f5fba8b86c52
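For anyone curious what the skeleton of such a React single-page Minesweeper looks like, here is a minimal, hypothetical sketch; the board size, `Cell` type, and helper names are our own assumptions and are not taken from the commenter's linked artifact. It seeds a small grid of mines, precomputes neighbor counts, and flood-reveals empty regions on click.

```typescript
import React, { useState } from "react";

interface Cell { mine: boolean; revealed: boolean; count: number }

const SIZE = 9;   // assumed beginner-style 9x9 board
const MINES = 10;

// Build a SIZE x SIZE board with MINES mines and precomputed neighbor counts.
function makeBoard(): Cell[][] {
  const board: Cell[][] = Array.from({ length: SIZE }, () =>
    Array.from({ length: SIZE }, () => ({ mine: false, revealed: false, count: 0 }))
  );
  let placed = 0;
  while (placed < MINES) {
    const r = Math.floor(Math.random() * SIZE);
    const c = Math.floor(Math.random() * SIZE);
    if (!board[r][c].mine) { board[r][c].mine = true; placed++; }
  }
  for (let r = 0; r < SIZE; r++)
    for (let c = 0; c < SIZE; c++)
      for (let dr = -1; dr <= 1; dr++)
        for (let dc = -1; dc <= 1; dc++) {
          if (dr === 0 && dc === 0) continue;
          if (board[r + dr]?.[c + dc]?.mine) board[r][c].count++;
        }
  return board;
}

export default function Minesweeper() {
  const [board, setBoard] = useState(makeBoard);

  // Reveal a cell; flood-fill outward from zero-count cells.
  function reveal(r: number, c: number) {
    setBoard(prev => {
      const next = prev.map(row => row.map(cell => ({ ...cell })));
      const stack: [number, number][] = [[r, c]];
      while (stack.length) {
        const [cr, cc] = stack.pop()!;
        const cell = next[cr]?.[cc];
        if (!cell || cell.revealed) continue;
        cell.revealed = true;
        if (cell.count === 0 && !cell.mine)
          for (let dr = -1; dr <= 1; dr++)
            for (let dc = -1; dc <= 1; dc++) stack.push([cr + dr, cc + dc]);
      }
      return next;
    });
  }

  return (
    <table>
      <tbody>
        {board.map((row, r) => (
          <tr key={r}>
            {row.map((cell, c) => (
              <td key={c} onClick={() => reveal(r, c)}>
                {cell.revealed ? (cell.mine ? "*" : cell.count || "") : "■"}
              </td>
            ))}
          </tr>
        ))}
      </tbody>
    </table>
  );
}
```

Flagging, chording, and win/loss detection would sit on top of this same state shape, which is roughly what the one-shot agents in the test either did or failed to do.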

