Did GPT Image 2 Just Beat ‘Nanobanana’? Hands-On Review and Tips
OpenAI’s new GPT Image 2 model has arrived, and it’s stepping directly into the ring with Google’s Gemini image generation (often nicknamed “Nanobanana”). The big question: is this a true banana killer, or just a decent challenger with some rough edges?
Below we’ll walk through what GPT Image 2 does well, where it falls short, and a few practical tricks to get better images out of it today.
What GPT Image 2 Is and How You Use It
GPT Image 2 is OpenAI’s latest image generation model, built into ChatGPT. Right now, it’s only available through the chat interface, not the API, which is pretty typical for launch day. You type a prompt in ChatGPT, and the model returns images directly in the conversation.
Unlike diffusion-based image models, GPT Image 2 is auto-regressive: it builds an image piece by piece, with each piece conditioned on what came before. In simple terms, it “plans” and “thinks” through the image step by step, which helps with tasks that require reasoning, spatial awareness, or following very specific instructions.
Major Upgrades: Aspect Ratios, Reasoning, and Text
Flexible Aspect Ratios (Finally)
One of the most noticeable upgrades is full control over aspect ratios. Previous OpenAI image models were stubbornly locked to a few fixed sizes. GPT Image 2 can now generate in almost any ratio you ask for:
- Standard formats like 1:1, 3:2, 16:9
- Vertical formats like 1:3 (great for bookmarks, posters, or mobile layouts)
- Ultra-wide formats like 3:1 for cinematic landscapes or “spaghetti western” shots
- Even weird ratios like 17:2, which technically make little sense but still work
This makes GPT Image 2 much more useful for design work, thumbnails, banners, and any project where layout actually matters.
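To get a feel for what these ratios mean in pixels, here is a minimal Python sketch that turns a ratio into a width and height. The 1536-pixel long side and the rounding to multiples of 16 are assumptions picked for illustration, not documented GPT Image 2 limits, and `size_for_ratio` is a hypothetical helper.

```python
def size_for_ratio(ratio_w, ratio_h, long_side=1536, multiple=16):
    """Return (width, height) in pixels for an aspect ratio.

    long_side and multiple are illustrative assumptions, not model limits.
    """
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("ratio parts must be positive")

    def snap(value):
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    if ratio_w >= ratio_h:
        width = snap(long_side)
        height = snap(long_side * ratio_h / ratio_w)
    else:
        height = snap(long_side)
        width = snap(long_side * ratio_w / ratio_h)
    return width, height


for w, h in [(1, 1), (16, 9), (1, 3), (3, 1), (17, 2)]:
    print(f"{w}:{h} ->", size_for_ratio(w, h))
```

Under these assumptions, 16:9 comes out as 1536 x 864 and 1:3 as 512 x 1536. An odd ratio like 17:2 snaps to 1536 x 176, which is only approximately 17:2, so expect some cropping or padding if you need the exact ratio.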
Reasoning and Spatial Awareness
Because GPT Image 2 is auto-regressive, it can handle prompts that require planning rather than just “vibes” from training data. A few tests show this off:
- Wine glass test: The prompt asks for a wine glass filled all the way to the top, with an analog clock in the background reading 3:50. The model gets both details right: the glass is full to the brim and the clock time is correct (or very close).
- Pelican on a bicycle: The prompt “pelican riding a bicycle, ensure absolute realism” produces a surprisingly grounded, realistic image, even though the concept is absurd.
- Combined challenge: A pelican riding a bike while holding a glass of wine at 3:50 still comes out coherent, with all requested elements present.
These kinds of prompts are good stress tests for whether a model is actually following instructions or just hallucinating something similar.
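If you want to check the clock in a test like this yourself, it helps to know where the hands should point. The sketch below is plain clock arithmetic in Python, not anything the model exposes; `hand_angles` is a hypothetical helper, and angles are measured clockwise from the 12.

```python
def hand_angles(hour, minute):
    """Return (hour_hand, minute_hand) angles in degrees, clockwise from 12."""
    minute_angle = minute * 6.0                     # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5  # 30 degrees per hour, plus drift
    return hour_angle, minute_angle


hour_angle, minute_angle = hand_angles(3, 50)
print(hour_angle, minute_angle)  # 115.0 300.0
```

So at 3:50 the minute hand should point at the 10, and the hour hand should sit most of the way from the 3 toward the 4 rather than directly on the 3.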
Text Rendering and Basic Counting
Text inside images has been a weak spot for many image models. GPT Image 2 is better than previous OpenAI models, but still not perfect.
On the positive side:
- It can generate readable, correctly spelled text for things like logos or shop signs, such as a “hybrid ramen taco shop” brand concept.
- It can handle longer text blocks like the opening of A Tale of Two Cities written on a chalkboard, with spelling that looks correct.
On the reasoning side, GPT Image 2 can combine text and logic. For example, it can count the number of “r” letters in “strawberry” by stepping through each character. It doesn’t always get the intermediate explanation perfect, but it can land on the right final answer.
That said, text styling can look a bit off (for example, chalkboard writing that looks more like Comic Sans), and in some tests the model still miscounts letters. So: much improved, but not flawless.
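For comparison, the character-by-character counting described above looks like this in ordinary code. This is a small Python sketch for checking the model's answer, not a description of how GPT Image 2 works internally; `count_letter` is a hypothetical helper.

```python
def count_letter(word, letter):
    """Count occurrences of letter in word, recording each matching position."""
    positions = []
    for index, char in enumerate(word, start=1):
        if char.lower() == letter.lower():
            positions.append(index)
    return len(positions), positions


count, positions = count_letter("strawberry", "r")
print(count, positions)  # 3 [3, 8, 9]
```

The positions list makes it easy to compare against the intermediate steps the model draws, which is where the mistakes tend to show up.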
Image Referencing, Style Transfer, and Character Consistency
Referencing Existing Characters and People
GPT Image 2 does well when you feed it an existing character or person and ask it to build something new around them.
Examples include:
- “Flamethrower girl” movie poster: Starting from a base character image, GPT Image 2 can create dystopian, Mad Max–style posters. The results feel like real movie art, complete with title treatments, taglines like “Burn, survive, and rule,” and even a credit block.
- Real-person thumbnails: When given a reference image of a person, it tends to preserve their facial features and identity much more reliably than some competing models, which often scramble faces between generations.
This is a big deal if you’re making consistent thumbnails, brand characters, or recurring mascots.
Style Transfer to Photorealism
GPT Image 2 can also turn stylized or comic-book characters into realistic photos while keeping composition and identity intact.
In one test, a comic-style character originally created in other tools was converted into a cinematic, photorealistic scene. The camera angle, pose, and even background extras stayed consistent. The main character looked like a believable real-world version of the original illustration.
This kind of style transfer is especially useful for creators who design in one tool (like Midjourney or Recraft) and then want a realistic version without rebuilding everything from scratch. If you’re interested in how other tools are evolving for filmmakers and visual creators, it’s worth checking out how Google and Nvidia are pushing AI for film and worldbuilding as well.
Guardrails, IP Pushback, and Odd Refusals
GPT Image 2 has fairly strict guardrails around copyrighted characters and certain scenarios. You’ll likely see refusals if you try prompts like:
- “Mickey Mouse holding a briefcase overstuffed with money storming out of Sam Altman’s office”
- “Darth Vader” or “Bugs Bunny” in similar contexts
However, it’s much more relaxed about real people in fictional situations, such as “Sam Altman livestreaming Grand Theft Auto 6” or “a MrBeast-style thumbnail about surviving the backrooms for 20 days.” Those go through without much issue.
There are also occasional inconsistent refusals: the same character outfit might be rejected in one chat as violating safety rules, but accepted in another chat or with a slightly different description. Expect a bit of trial and error when you’re near the edges of the safety system.
Artifacting, Crunchy Images, and the Big Fix
When Images Start Falling Apart
One of the biggest complaints so far is artifacting: images that start to look jagged, noisy, or “crunchy,” especially in detailed scenes or after a series of generations in the same chat.
In some tests, a prompt that should produce a clean, stylized character instead devolves into:
- Jagged edges and noisy textures
- Strange hair rendering
- Overall low-quality, borderline unusable results
Interestingly, GPT Image 2 offers its own explanation when asked. As an auto-regressive model, it says, it doesn’t “smear” the way diffusion models do. Instead, as more tokens are generated and reused in a long chat, quantization noise builds up and degrades image quality.
The Simple Workaround: New Chat, Same Prompt
The most effective fix is surprisingly simple: if your images start looking crunchy or over-artifacted, stop and open a fresh chat. Paste the same prompt into a new conversation.
Because each chat has its own context, starting fresh removes the accumulated noise and often restores clean, sharp output. Think of it like refreshing a browser tab when a web app starts acting weird.
If you still need extra polish, you can always run the result through an upscaler or enhancer like Magnific or similar tools to clean up fine details.
How Smart Is GPT Image 2 About Scenes and Camera Angles?
GPT Image 2 can handle some fairly advanced spatial reasoning, but it’s not perfect.
In one test, four fantasy warriors stand on a cliff, looking out at a distant mountain. The follow-up prompt asks the model to rotate the camera 180 degrees so that we face the characters from the front, effectively breaking the classic “180-degree rule” used in filmmaking.
GPT Image 2 manages to:
- Keep the characters in the correct left-to-right order
- Maintain a plausible background that lines up with the original scene
However, some details get crunchy, and one character’s distinctive “Mobius head” design gets lost. This is still a challenging problem for most “thinking” image models, and GPT Image 2 is no exception.
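What counts as the “correct” left-to-right order after a 180-degree camera move is easy to get wrong: seen from the front, the lineup is mirrored. The Python sketch below illustrates this with a deliberately simplified model in which the characters stand along one axis and the camera sits either behind or in front of them. The character names and positions are made up for illustration, and `screen_order` is a hypothetical helper.

```python
def screen_order(world_x_by_name, camera_faces_forward):
    """Return names left-to-right as seen on screen.

    Assumes characters stand along the world x axis and the camera sits
    on the z axis, either behind them (looking forward) or in front of
    them (looking back). Flipping the camera mirrors the screen x axis.
    """
    sign = 1 if camera_faces_forward else -1
    return sorted(world_x_by_name, key=lambda name: sign * world_x_by_name[name])


# Hypothetical positions for four warriors along the cliff edge.
warriors = {"archer": -3.0, "knight": -1.0, "mage": 1.0, "rogue": 3.0}

print(screen_order(warriors, camera_faces_forward=True))
print(screen_order(warriors, camera_faces_forward=False))
```

The first call lists the warriors as seen from behind, and the second lists them in reverse, as they should appear from the front. A model that keeps the original order after the flip has actually swapped everyone's position.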
Is GPT Image 2 a ‘Nanobanana’ Killer?
So, does GPT Image 2 completely replace Google’s Gemini image model (“Nanobanana 2”)?
Right now, it’s more of a strong rival than an outright killer:
- Where GPT Image 2 wins: Flexible aspect ratios, strong character consistency, solid reasoning for complex prompts, and good text accuracy in many cases.
- Where it struggles: Artifacting in long chats, occasional crunchy outputs, inconsistent safety pushback, and some stylistic limitations depending on your taste.
- Where Gemini still shines: Certain aesthetics and tasks like synthesizing information into visuals may still feel more natural in Gemini for some users.
Ultimately, it’s less about one model “torching” the other and more about having more options. GPT Image 2 adds a powerful, reasoning-heavy image generator to OpenAI’s lineup, and its ability to keep faces and characters consistent is already a big win for creators.
With rumors of a “Nanobanana 3” possibly coming around Google I/O, the image-model race is far from over. For now, GPT Image 2 is a strong, versatile tool worth experimenting with, especially if you care about aspect ratios, character identity, and instruction-following.