First impressions of DeepSeek V4: how good is this open-source release really?

24 May 2026

DeepSeek V4 is one of the most anticipated open-source AI model releases of the year. Here’s a practical, side‑by‑side look at how it stacks up against GPT, Gemini, GLM, Opus, and others across image generation, SVGs, and UI design prompts.

DeepSeek V4 has finally arrived, and it’s one of the most important open-source AI releases of the year. After a long wait since DeepSeek V3 and 3.2, this new model promises a big leap in capability—especially in multimodal generation. But how well does it actually perform compared to today’s top models?

Below is a practical, side‑by‑side look at DeepSeek V4 Pro (Thinking) against models like GPT 5.4, Gemini 3.1, GLM 5.1, Opus 4.7, and others—focusing on image quality, SVGs, UI layouts, and overall stability.

Where DeepSeek V4 Comes From

DeepSeek has been building momentum for a while. DeepSeek V3 in late 2024 was a breakthrough for open models, and later 3.x updates helped push Chinese and open-source labs into the global conversation. But since then, the landscape has changed dramatically.

By early 2025, we’d already seen strong competition from GLM, Kimi, and others, plus rapid progress from closed models like GPT, Gemini, and Opus. That’s the context DeepSeek V4 is launching into: it’s no longer enough to be “good for open source”—it has to stand up against the very best.

The model tested here is DeepSeek V4 Pro (Thinking), the strongest variant, to give it the fairest comparison against top-tier systems.

Image Generation: Big Leap, But Not Yet Best-in-Class

One of the clearest ways to see a model’s strengths and weaknesses is through image prompts. The tests here compare DeepSeek V4 against GPT 5.4, Gemini 3.1, GLM 5.1, Opus 4.7, Muse Spark, and others across a variety of scenes.

Natural Landscapes and Scenic Shots

On prompts like hot air balloons over Cappadocia, DeepSeek V4 produces solid, pleasant images. The lighting, shapes, and overall composition are good, and it’s clearly a major improvement over DeepSeek 3.2.

However, when placed side by side:

Opus 4.7 often delivers the most polished, cinematic results with better mood and overall “vibe.”
GPT 5.4 also tends to edge out DeepSeek in terms of visual appeal and coherence.
GLM 5.1 is frequently very close to DeepSeek, and sometimes slightly ahead.

DeepSeek V4 is competitive, but not clearly the best in this category.

3D and Voxel-Style Scenes

On voxel-style scenes like a Roman city, the jump from DeepSeek 3.2 to V4 is dramatic. DeepSeek V4’s generations are far more detailed, structured, and visually coherent than those of its predecessor, which suggests that the underlying pretraining and post-training have improved significantly.

Against other models, though:

Gemini 3.1 stands out with especially rich, well-structured 3D scenes.
GPT 5.4 and Opus 4.7 also tend to produce more refined, higher-quality voxel environments.

DeepSeek V4 is no longer “laughably behind” like 3.2 sometimes was, but it’s still a step below the very top models on 3D-style prompts.

Challenging Prompts: Golden Gate Bridge & Structural Consistency

On harder prompts like the Golden Gate Bridge, DeepSeek V4 struggles more. Some generations show odd sizing, strange geometry, or broken composition. To be fair, this is a tough prompt even for leading models:

Opus 4.7 and Gemini 3.1 also show minor structural issues, but usually keep the bridge recognizable and coherent.
DeepSeek V4’s outputs are more unstable, sometimes veering into “weird” rather than simply imperfect.

This highlights a recurring theme: DeepSeek V4 can be creative and diverse, but its structural consistency is not yet on par with the very best proprietary models.

Style, Diversity, and "Weirdness"

One interesting upside: DeepSeek V4 often feels stylistically different from models like Opus. For example, in underwater or fish-themed prompts, its images aren’t necessarily better, but they do have a distinct look.

That “weirdness” can sometimes be a bug (broken geometry, odd layouts), but it can also be a feature—especially for creative users who want more variety from open models rather than the same polished style every time.

SVGs and Icon-Style Graphics

SVG generation is a great way to test a model’s ability to handle structure, movement, and clean shapes. Here, DeepSeek V4 shows clear progress over 3.2 and outperforms many open models, but it’s not the absolute leader.

Across cycling icons, moving figures, and other SVG-based prompts:

DeepSeek V4 is now much better than its 3.2 predecessor and, in many cases, stronger than some open models like older MiniMax versions or Muse Spark.
GLM 5.1 frequently looks as good or better, with cleaner lines and more coherent structure.
Gemini is especially strong here, though it may be partly optimized for SVG-style tasks.

Overall, DeepSeek V4 is now in the conversation for practical SVG work, but GLM and Gemini still feel more reliable and polished.
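
If you want to screen SVG outputs before judging them by eye, a quick structural check can weed out generations that are outright broken. The sketch below is only an illustration and not part of the tests described here: the function name `check_svg` and the list of shape tags are assumptions, and passing the check says nothing about whether the drawing actually looks good.

```python
import xml.etree.ElementTree as ET

# Basic SVG drawing elements counted as "shapes" (an illustrative choice).
SHAPE_TAGS = {"path", "rect", "circle", "ellipse", "line", "polyline", "polygon"}


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix, e.g. '{http://...}rect' -> 'rect'."""
    return tag.rsplit("}", 1)[-1]


def check_svg(text: str) -> dict:
    """Return basic structural facts about a candidate SVG string."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not even well-formed XML, so no renderer can be expected to accept it.
        return {"well_formed": False, "is_svg": False, "shapes": 0}
    shapes = sum(1 for el in root.iter() if local_name(el.tag) in SHAPE_TAGS)
    return {
        "well_formed": True,
        "is_svg": local_name(root.tag) == "svg",
        "shapes": shapes,
    }
```

A check like this only catches the crudest failures (unclosed tags, a non-SVG root, an empty drawing), so the side-by-side visual comparison is still what decides quality.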

UI and Dashboard Design Prompts

To push models a bit out of their comfort zone, the tests also include UI and dashboard prompts—things like travel booking consoles, world’s fair sites, revenue dashboards, and retrofuturistic home automation interfaces.

Orbital Travel Booking Console

On a sci-fi booking console, DeepSeek V4 produces a decent, functional layout. It’s not bad, but it doesn’t strongly stand out.

In comparison:

GLM 5.1 offers an interesting hero element and a more visually engaging structure.
Opus 4.7 feels the most organized and polished overall, with a clear hierarchy and good spacing.
Muse Spark is serviceable but a bit cramped and misaligned.

Here, DeepSeek V4 lands in the middle of the pack: usable, but not the most creative or refined.

1907 World’s Fair Website

This prompt is great for testing whether a model can capture a historical aesthetic rather than just outputting a generic modern site.

Results:

DeepSeek V4 generates an okay, somewhat modern-feeling site that doesn’t fully lean into the vintage theme.
Opus 4.7 clearly understands the brief: more period-appropriate fonts, layout, and overall feel.
GLM 5.1 feels heavier and less refined here, while Muse Spark adds some nice touches but still feels too modern.

On this kind of aesthetic-sensitive prompt, DeepSeek V4 is competent but, again, not top-tier. It does, however, beat some competitors like Muse Spark in overall feel.

Revenue Dashboards and Research Consoles

On more practical dashboards like “revenue recovery command center” or “deep sea research dashboard,” DeepSeek V4 tends to:

Produce busy, slightly misaligned layouts with bright colors and weaker hierarchy.
Feel less like a real, production-ready dashboard and more like a concept sketch.

By contrast:

Muse Spark surprisingly does quite well on revenue dashboards, with more realistic layouts and chart placement.
Gemini often outputs functional, if somewhat barebones, dashboards.
Opus usually captures the right vibe for research or control-center UIs, even if spacing isn’t perfect.

Here, DeepSeek V4 is usable but not a standout. It’s another area where stability and layout discipline could improve.

Retrofuturistic and Vertical Farm Interfaces

On more creative UI prompts, like a retrofuturist home automation OS or a vertical farm control panel, DeepSeek V4 shows more personality:

Its retrofuturist UI includes fun details like radio-style elements and tactile controls, but still feels less refined than the best outputs.
Opus and especially Gemini shine here, with tactile, imaginative interfaces that feel both creative and coherent.
On vertical farm dashboards, DeepSeek V4 and GLM 5.1 feel quite similar in quality and style, both reasonably good but not clearly superior to each other.

This is where DeepSeek’s creative “direction” feels promising, even if execution isn’t always perfect. For users who care about variety and experimentation, that could be a real plus.

Stability, Consistency, and Overall Ranking

Across all these tests, a few patterns emerge about DeepSeek V4 Pro (Thinking):

Massive Improvement Over DeepSeek 3.2

The jump from DeepSeek 3.2 to V4 is huge. Many of the older model’s outputs—especially on complex structures, SVGs, and 3D scenes—were simply unusable or “laughably bad.” V4, by contrast, is now firmly in the modern generation of capable models.

This strongly suggests a new pretraining and post-training stack, not just incremental tuning.

Still Behind the Very Top Models

When compared directly to today’s leaders, DeepSeek V4 generally:

Lags behind Opus 4.7, GPT 5.4, and Gemini 3.1 in overall polish, structural stability, and creative coherence.
Feels roughly on par with GLM 5.1 in many visual and UI tasks—sometimes slightly behind, sometimes comparable.
Beats or matches models like Muse Spark and some MiniMax variants in several categories.

So rather than being “way ahead of everyone,” DeepSeek V4’s real achievement is that it has caught up with the current open and regional leaders and is no longer clearly behind them.

Stability and "Second-Tier" Feel

One recurring weakness is stability. DeepSeek V4 occasionally produces generations that simply don’t “compile” visually: broken structures, odd geometry, or layouts that feel off. This is typical of models just below the top tier: they can be impressive when they hit, but they miss more often than the absolute top systems.

For production use, that means you may need more retries or manual selection compared to using something like Opus or Gemini.
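
In practice, "more retries" usually means wrapping the model call in a loop with an acceptance check. Here is a minimal sketch of that idea, assuming you supply both pieces yourself: `generate` stands in for whatever function calls your model, and `is_acceptable` is whatever automatic check you trust. Both names are hypothetical and not tied to any real API.

```python
from typing import Callable, Optional


def generate_with_retries(
    generate: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 4,
) -> tuple[Optional[str], int]:
    """Call `generate` until `is_acceptable` passes or attempts run out.

    Returns (output, attempts_used); output is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt)
        if is_acceptable(candidate):
            return candidate, attempt
    return None, max_attempts
```

Tracking `attempts_used` per model is a simple way to put a number on the stability gap: a model that needs three tries on average costs roughly three times as much per usable result as one that succeeds on the first.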

What This Means for Open-Source AI

DeepSeek V4 is a big deal for the open ecosystem. It shows that open and Chinese labs can move fast enough to stay in the same conversation as frontier proprietary models, even if they’re not yet surpassing them.

Its current position feels similar to where other strong open models sit today—solid, competitive, and especially attractive for developers who value openness, customization, and diversity of outputs. If you’re interested in how open models are evolving more broadly, it’s worth comparing DeepSeek V4 to other major releases like Gemma 4 or powerful coding-focused models such as Kimi K2.6.

The big open question is cadence: will DeepSeek now move to faster, more frequent releases like some competitors (for example, Qwen’s rapid iteration), or will it stick to slower, larger jumps? If the team can iterate quickly from this V4 baseline, there’s a real chance they could not only keep up but start to outpace other open labs.

Should You Try DeepSeek V4?

If you care about benchmarks alone, DeepSeek V4 will likely look strong—but the real value comes from testing it yourself on your own prompts and workflows.

Based on these first impressions, DeepSeek V4 is worth trying if you:

Want a powerful open-source model that’s finally in the same league as GLM and other leading open systems.
Care about creative diversity and don’t mind the occasional odd or unstable output.
Are exploring multimodal tasks like image generation, SVGs, or UI concepts and want an open alternative to closed models.

Just don’t expect it to consistently beat Opus, GPT, or Gemini yet. Think of it as a major step forward that brings DeepSeek back into contention—now the real test will be how quickly they can build on top of this foundation.

Either way, the best way to understand any model is to use it. Try DeepSeek V4 on your own prompts, compare it against your current favorites, and see where it shines (or fails) for your specific use cases.
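
If you run your own side-by-side comparisons, it helps to record each judgment and tally the results rather than rely on memory. The sketch below is one simple way to do that, under the assumption that you log every comparison as a pair of models plus a winner. The function name, the placeholder model names, and the half-win scoring for ties are all illustrative choices, not the method used for the impressions above.

```python
from collections import Counter


def win_rates(judgments: list[tuple[str, str, str]]) -> dict[str, float]:
    """Compute each model's share of wins across pairwise comparisons.

    Each judgment is (model_a, model_b, winner), where winner is one of the
    two model names or "tie". A tie counts as half a win for each side.
    """
    wins: Counter = Counter()
    games: Counter = Counter()
    for a, b, winner in judgments:
        if winner not in (a, b, "tie"):
            raise ValueError(f"winner {winner!r} is not {a!r}, {b!r}, or 'tie'")
        games[a] += 1
        games[b] += 1
        if winner == "tie":
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            wins[winner] += 1
    return {model: wins[model] / games[model] for model in games}


# Placeholder judgments with made-up model names, for illustration only.
example = [
    ("model_a", "model_b", "model_a"),
    ("model_a", "model_b", "tie"),
    ("model_a", "model_c", "model_c"),
]
rates = win_rates(example)
```

With only a handful of prompts per pair, these rates are noisy, so treat them as a rough guide rather than a ranking.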