ChatGPT vs Grok: which AI recreates Five Nights at Freddy’s better in 90 minutes?

30 May 2026 14:37

Two leading AI models, ChatGPT 5.5 and Grok 4.3 Pro, are challenged to rebuild a Five Nights at Freddy’s–style game from scratch in just 90 minutes. Here’s how each AI handled coding, game design, 3D assets, and the final scare factor—and which one came out on top.

In this head-to-head challenge, ChatGPT 5.5 and Grok 4.3 Pro are given the same prompt, the same time limit, and the same goal: create a working horror game with cameras, doors, animatronics, and jumpscares.

How the Challenge Worked

Both AIs were asked to design and code a FNAF-inspired game loop: a security office, working doors and lights, a power-usage system, security cameras, and roaming animatronic characters that can end the night with a jumpscare. The human in the loop followed each AI’s step-by-step instructions, pasted in the generated code, and wired everything up in a real game engine (Unity) using a pre-made FNAF-style map model.

The rules were simple:

• Same core prompt and reference images for both AIs
• 1 hour 30 minutes total per AI to get from zero to a playable prototype
• The human follows the AI’s plan as closely as possible, only fixing obvious breakages

This kind of time-boxed build is similar to other AI model showdowns, like comparing ChatGPT, Claude, and Gemini on building an NBA 2K-style game in one hour. It is less about pixel-perfect polish and more about which AI gets you to a fun, working game fastest.

ChatGPT 5.5: Fast, Structured, and Surprisingly Polished

ChatGPT 5.5 was run in an extended thinking mode and began by generating a large, structured codebase. Its first pass produced around 1,300 lines of code, already enough for a basic playable prototype.

Core Systems and Office Setup

ChatGPT laid out clear steps: import a FNAF-style office and map model, then add scripts for doors, lights, cameras, and power management. The human simply pasted the scripts into Unity and attached them to the right objects (doors, lights, fan, UI elements).

Within the time limit, ChatGPT delivered:

• Functional doors on both sides of the office that can open and close
• Working hallway lights tied to buttons
• A visible power-usage system that increases as you use doors, lights, and cameras
• A fan in the office for extra atmosphere
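
The article does not show the generated power script, so the following is only a minimal, engine-agnostic sketch in Python (the real build used Unity) of how a usage-based power drain like the one described could work. The class name, field names, and both drain rates are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical drain rates, in percent of total power per second.
IDLE_DRAIN = 0.10
DRAIN_PER_DEVICE = 0.15


@dataclass
class PowerSystem:
    remaining: float = 100.0  # percent of power left
    left_door_closed: bool = False
    right_door_closed: bool = False
    left_light_on: bool = False
    right_light_on: bool = False
    cameras_open: bool = False

    def usage_level(self) -> int:
        """Number of active devices, plus one for the always-on idle load."""
        active = [
            self.left_door_closed,
            self.right_door_closed,
            self.left_light_on,
            self.right_light_on,
            self.cameras_open,
        ]
        return 1 + sum(active)

    def tick(self, dt: float) -> None:
        """Drain power for dt seconds at the current load; clamp at zero."""
        rate = IDLE_DRAIN + DRAIN_PER_DEVICE * (self.usage_level() - 1)
        self.remaining = max(0.0, self.remaining - rate * dt)

    @property
    def is_out(self) -> bool:
        return self.remaining <= 0.0
```

Calling `tick` once per frame with the frame time keeps the drain independent of frame rate, and `usage_level()` is the kind of value an on-screen usage indicator could display.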

Camera System and UI

Next, ChatGPT generated camera scripts and UI elements. It even produced images for door and light buttons and the camera interface, which were dropped directly into the game.

The result:

• Pressing a key (like Tab) opened a security camera view
• Clicking on different rooms switched between multiple camera angles
• The UI looked close to the original FNAF layout, with labeled rooms and a clean overlay
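
As a rough illustration of the camera behavior listed above, here is a small Python sketch of the state such a system has to track: whether the monitor is open and which room is selected. It is an assumption about structure, not ChatGPT's actual script; `CameraSystem`, its method names, and the room names in the usage note are all made up for the example.

```python
class CameraSystem:
    """Tracks whether the monitor is up and which room feed is selected."""

    def __init__(self, rooms: list[str]) -> None:
        if not rooms:
            raise ValueError("at least one camera room is required")
        self.rooms = rooms
        self.is_open = False
        self.current = rooms[0]

    def toggle(self) -> bool:
        """Called when the monitor key (Tab in the article) is pressed."""
        self.is_open = not self.is_open
        return self.is_open

    def select(self, room: str) -> None:
        """Called when a room label on the map overlay is clicked."""
        if not self.is_open:
            return  # ignore clicks while the monitor is down
        if room not in self.rooms:
            raise KeyError(f"unknown camera room: {room}")
        self.current = room

    def visible(self, positions: dict[str, str]) -> list[str]:
        """Names of animatronics standing in the room currently on screen."""
        if not self.is_open:
            return []
        return sorted(name for name, room in positions.items() if room == self.current)
```

For example, after `cams = CameraSystem(["Show Stage", "Dining Area", "West Hall"])`, calling `cams.toggle()` and then `cams.select("West Hall")` switches the feed, and `cams.visible({"Bonnie": "West Hall"})` reports who is on that camera.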

Animatronics and AI Behavior

For the animatronics, ChatGPT described classic FNAF-style characters in detail, and those descriptions were then fed into an AI 3D model generator. The returned models (Freddy, Bonnie, Chica, Foxy, and more) were imported into Unity and hooked up to the movement logic ChatGPT had written.

ChatGPT provided a large script (over 1,300 lines) to control:

• Animatronic positions and movement between rooms
• Their behavior when near the doors
• Conditions for triggering a jumpscare and ending the night

Even though some characters started as simple placeholders (like cylinders), the logic worked: animatronics moved, appeared on cameras, approached the office, and could be blocked by closing doors in time.
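
The movement logic itself is not reproduced in the article. One plausible way to get the behavior described (moving between rooms, arriving at a door, being blocked or triggering a jumpscare) is a room graph plus periodic movement rolls. The Python sketch below assumes exactly that; the room layout, the `aggression` parameter, and the retreat-to-start rule are illustrative assumptions, not the generated code.

```python
import random

# Hypothetical room graph: each room lists the rooms an animatronic may step to next.
ROUTES = {
    "Show Stage": ["Dining Area"],
    "Dining Area": ["West Hall", "East Hall"],
    "West Hall": ["Left Door"],
    "East Hall": ["Right Door"],
}
# Rooms directly outside the office, mapped to the door that guards them.
DOOR_ROOMS = {"Left Door": "left", "Right Door": "right"}


class Animatronic:
    def __init__(self, name: str, aggression: float, start: str = "Show Stage") -> None:
        self.name = name
        self.aggression = aggression  # chance to move on each opportunity, 0..1
        self.start = start
        self.room = start

    def step(self, closed_doors: set[str], rng: random.Random) -> bool:
        """Run one movement opportunity. Returns True if a jumpscare triggers."""
        if self.room == "Office":
            return True  # already inside: the night is over
        if rng.random() >= self.aggression:
            return False  # stays put this time
        if self.room in DOOR_ROOMS:
            if DOOR_ROOMS[self.room] in closed_doors:
                self.room = self.start  # blocked: retreat to the starting room
                return False
            self.room = "Office"
            return True  # door was open: jumpscare
        self.room = rng.choice(ROUTES[self.room])
        return False
```

Passing in a seeded `random.Random` makes a night reproducible, which helps when tuning how aggressive each character should be.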

Gameplay and Extra Mode

By the end of the 90 minutes, ChatGPT’s version felt very close to a classic FNAF night:

• You start at 12 a.m. and must survive until 6 a.m.
• Power drains as you use doors, lights, and cameras
• Animatronics move across the map and can suddenly appear at your doors
• Failing to manage power or doors correctly leads to a jumpscare and game over
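
The 12 a.m. to 6 a.m. clock can be derived from elapsed real time. The sketch below is a minimal Python version under one assumption the article does not state: the pacing of 90 real seconds per in-game hour is invented for the example, as are the function names.

```python
NIGHT_HOURS = 6            # the night runs from 12 a.m. to 6 a.m.
SECONDS_PER_HOUR = 90.0    # hypothetical pacing: one in-game hour per 90 real seconds


def hours_elapsed(elapsed_seconds: float) -> int:
    """Whole in-game hours since the night began, capped at the end of the night."""
    return min(NIGHT_HOURS, int(elapsed_seconds // SECONDS_PER_HOUR))


def clock_label(elapsed_seconds: float) -> str:
    """Format the hour on a 12-hour clock: hour 0 is shown as 12 a.m."""
    hour = hours_elapsed(elapsed_seconds)
    return f"{12 if hour == 0 else hour} a.m."


def night_survived(elapsed_seconds: float) -> bool:
    """True once the clock has reached 6 a.m."""
    return hours_elapsed(elapsed_seconds) >= NIGHT_HOURS
```

With these numbers a full night lasts 540 seconds. The special case for hour 0 in `clock_label` is what keeps the display on a conventional 12-hour clock.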

ChatGPT even added a hidden feature: pressing a key switched to a rough first-person shooter mode where you could run around and try to “defeat” the animatronics. It was buggy and chaotic, but it showed how easily the AI could extend the core game loop into a completely different mode simply by generating more code.

Overall, ChatGPT’s build was stable, feature-complete for a prototype, and felt like a recognizable FNAF clone. It earned a 9/10 from the human tester for how far it got within the time limit.

Grok 4.3 Pro: Big Ideas, More Friction

Grok 4.3 Pro (often branded as “Super Grok”) was given the exact same prompt and images, plus the same 90-minute limit. It took a similar approach—step-by-step instructions and large chunks of code—but ran into more friction along the way.

Initial Prototype and Code Generation Limits

Grok generated the base code in multiple parts, with a limit of about 600 lines at a time. In total, it produced around 2,900 lines of code for the prototype. The intro screen looked good, with a clean “Start Night One” flow and background music.

The early office prototype included:

• Working buttons to toggle lights on both sides
• A power system that drained quickly when lights and doors were used
• Basic camera switching and animatronics visible on certain feeds

However, the camera views were more confusing, with some areas looking abstract or incomplete—at one point, a scene even resembled a random Lego piece.

Unity Integration Challenges

When moving to the full Unity build with a detailed office model, Grok provided step-by-step instructions similar to ChatGPT’s: import the model, set up the office, and then add scripts for doors, lights, and the fan.

But the implementation was bumpier:

• At first, the fan blades moved with the mouse cursor instead of spinning in place
• Several camera scripts didn’t work correctly and required repeated fixes
• A large portion of the 90 minutes was spent debugging and asking Grok to “think harder” about what was going wrong
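
The article does not say what caused the fan bug, so no diagnosis is attempted here. As a point of comparison, a spinning prop usually only needs its angle advanced by elapsed time, with no input read at all. A minimal Python sketch of that idea, with a hypothetical `Fan` class and speed value:

```python
from dataclasses import dataclass


@dataclass
class Fan:
    speed_deg_per_sec: float = 360.0  # hypothetical: one full turn per second
    angle_deg: float = 0.0            # rotation around the fan's own local axis

    def update(self, dt: float) -> None:
        """Advance the blade angle from elapsed time only; no input is read."""
        self.angle_deg = (self.angle_deg + self.speed_deg_per_sec * dt) % 360.0
```

Because `update` takes nothing but the frame time, mouse movement cannot affect the blades unless something else in the scene moves the fan object itself.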

Eventually, the major bugs were fixed and the office looked and felt good, with smooth door animations and custom button textures generated by Grok. The camera system finally worked, letting the player switch between different areas on the map.

3D Models, Animations, and Visual Flair

Like ChatGPT, Grok was asked to describe Freddy Fazbear and other animatronics in detail for an AI 3D model generator. The resulting models were surprisingly solid and visually appealing, and Grok was also used to help generate the rest of the cast.

The final build included:

• Multiple animatronics with distinct looks
• Fun, exaggerated animations like a breakdancing chicken
• A custom in-game poster created from a humorous self-description, replacing the original FNAF poster

These touches gave Grok’s version a more chaotic, meme-like energy. The animatronics were scary in concept but hard to take fully seriously while they danced outside the office.

Gameplay and “Dance Party” Chaos

In the final playthrough, Grok’s game started at an odd “0 a.m.” instead of 12 a.m., but the core loop was there: check cameras, monitor doors, and try to survive.

However, the gameplay felt more punishing and less predictable:

• Animatronics frequently stacked at the doors, forcing both doors closed and rapidly draining power
• There was no clear audio feedback, so constant door-checking was required
• It felt almost impossible to survive until 6 a.m. without running out of power
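
A quick power-budget calculation shows why a night like this can be unwinnable. The Python sketch below uses invented numbers (100% starting power, a 540-second night, and the two drain rates), since the article gives no actual values from Grok's build.

```python
def seconds_until_blackout(power: float, idle_rate: float,
                           device_rate: float, devices_on: int) -> float:
    """How long `power` lasts at a constant load (rates in percent per second)."""
    return power / (idle_rate + device_rate * devices_on)


def max_average_devices(power: float, idle_rate: float,
                        device_rate: float, night_seconds: float) -> float:
    """Average number of devices that can be on while still lasting the whole night."""
    budget_rate = power / night_seconds  # drain rate that empties power exactly at dawn
    return max(0.0, (budget_rate - idle_rate) / device_rate)


# Hypothetical tuning values, for illustration only.
NIGHT_SECONDS = 540.0
IDLE_RATE = 0.10
DEVICE_RATE = 0.15

both_doors_shut = seconds_until_blackout(100.0, IDLE_RATE, DEVICE_RATE, devices_on=2)
sustainable = max_average_devices(100.0, IDLE_RATE, DEVICE_RATE, NIGHT_SECONDS)
```

Under these assumed rates, holding both doors shut empties the battery in 250 seconds, less than half the night, and the player can afford an average of only about 0.57 active devices. If animatronics sit at both doors most of the time, no strategy survives, which is a tuning problem rather than a player-skill problem.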

On top of that, Grok added a wild “dance party” mode where the scene exploded into flashing lights and erratic animations—visually intense and more comedic than scary. It showcased Grok’s creativity but also how quickly things could spiral into unplayable chaos.

ChatGPT vs Grok: Which AI Won?

Both AIs proved they can build a recognizable FNAF-style game loop from scratch within the 90-minute limit, complete with cameras, doors, power management, and animatronics. But they differed in reliability, usability, and polish.

Where ChatGPT 5.5 Stood Out

ChatGPT’s strengths in this challenge were:

• Stability and structure: It produced large, coherent scripts that mostly worked on the first try.
• Clear step-by-step guidance: The instructions for what to do next in Unity were easy to follow.
• Functional completeness: It delivered a full FNAF-style loop—cameras, doors, power, jumpscares—within the time limit.
• Extra features: The bonus first-person mode showed how easily it could extend the game beyond the original spec.

For someone who wants to use an AI as a practical game-building assistant, ChatGPT felt more like a dependable coding partner. That matches the impression from similar comparisons, such as ChatGPT vs Claude vs Gemini, where ChatGPT often does well as a general-purpose coding and prototyping tool.

Where Grok 4.3 Pro Shone (and Struggled)

Grok showed flashes of strength:

• Visual flair: Its UI, intro screen, and generated textures looked good, and the 3D models were strong.
• Creative twists: The dance party mode and animated characters gave the game personality.
• Detailed asset descriptions: It did a solid job describing characters for 3D generation.

But it struggled with:

• Debugging complexity: Many scripts needed multiple rounds of fixes before working.
• Time efficiency: A lot of the 90 minutes was burned on troubleshooting cameras and object behavior.
• Game balance: The final game felt unfair and overly chaotic, with constant pressure and little chance to survive a full night.

Takeaways for AI-Powered Game Development

This FNAF-style challenge highlights a few useful lessons for anyone using AI to build games:

• Use AI for structure first: Let the model design your systems (states, events, game loop) before you worry about polish.
• Expect to debug: Even strong models will produce bugs, especially in complex engines like Unity.
• Leverage AI for assets: Detailed text descriptions can feed into AI 3D model tools to quickly populate your world.
• Time-box your builds: Short sprints (like 90 minutes) are a great way to see how far an AI can take you before you need manual refinement.
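
To make "structure first" concrete, the top-level flow of a game like this can be written down as a small state machine before any engine work starts. The Python sketch below is one hypothetical layout; the state and event names are invented for illustration and do not come from either AI's output.

```python
from enum import Enum, auto


class State(Enum):
    MENU = auto()
    NIGHT = auto()
    JUMPSCARE = auto()
    NIGHT_WON = auto()


# Allowed transitions, keyed by (current state, event name).
TRANSITIONS = {
    (State.MENU, "start_night"): State.NIGHT,
    (State.NIGHT, "caught"): State.JUMPSCARE,
    (State.NIGHT, "clock_hit_6am"): State.NIGHT_WON,
    (State.JUMPSCARE, "continue"): State.MENU,
    (State.NIGHT_WON, "continue"): State.MENU,
}


def next_state(state: State, event: str) -> State:
    """Return the new state, or the unchanged state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes it easy to ask a model to extend the game, for example with an extra mode, without scattering state checks across door, camera, and animatronic scripts.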

In this specific showdown, ChatGPT 5.5 was the clear winner. It was easier to use, more reliable under time pressure, and delivered a more coherent, playable horror experience. Grok 4.3 Pro brought creativity and style, but it needed more hand-holding and debugging, and its final build was still less balanced and less polished.