DeepSeek V4 vs GPT-5.5: ultra-cheap open weights and a new AI stack war

28 May 2026 18:37

DeepSeek V4 has arrived just hours after GPT‑5.5, bringing million‑token context windows, MIT‑licensed open weights, and shockingly low prices. Here’s what the new V4 Pro and V4 Flash models can actually do, how they stack up on benchmarks, and why they matter for both Nvidia and China’s domestic AI chips.

DeepSeek has fired back hard in the model wars. Just hours after GPT‑5.5 arrived, DeepSeek launched V4, a new open‑weight family that combines million‑token context windows, strong coding performance, and aggressive pricing. It is not the single best model in every category, but it is good enough, open enough, and cheap enough to seriously change how developers think about building AI systems.

Two Models, One Million Tokens of Context

The DeepSeek V4 family comes in two flavors: V4 Pro and V4 Flash. Both are text‑only models for now, both support a huge 1 million token context window, and both can generate up to 384,000 output tokens via the DeepSeek API. That scale makes them especially attractive for agents, codebase analysis, and large document workflows.
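To make the two limits concrete, here is a minimal sketch of a token budget check. It assumes the prompt and the generated output share the same 1 million token window, which is a common convention but is not confirmed here; the function names are illustrative only.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, as quoted above
MAX_OUTPUT = 384_000        # tokens, as quoted above


def max_prompt_tokens(reserved_output: int) -> int:
    """Largest prompt that still leaves room for `reserved_output` tokens.

    Assumption: prompt and output count against one shared window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int) -> bool:
    """True if the prompt fits next to the reserved output budget."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


# Reserving the full output budget leaves 616,000 tokens for the prompt.
print(max_prompt_tokens(384_000))
```

Under that assumption, a 700,000-token codebase dump would fit only if the output budget is capped at 300,000 tokens or less.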

V4 Pro is the flagship model. Under the hood, it has 1.6 trillion total parameters, but only 49 billion are active for each token it processes. This is a mixture‑of‑experts (MoE) design: instead of running the entire model for every token, a router activates only a few relevant experts, which cuts the compute needed per token and improves speed.

V4 Flash is the smaller, faster sibling with 284 billion total parameters and 13 billion active per token. It is designed for high‑throughput, everyday workloads like chat, summarization, routing, and lightweight agents, while still benefiting from the same long context window.
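The parameter counts above translate into a small active share per model, and the routing idea can be shown with a toy example. The sketch below uses top‑k softmax gating, a common MoE routing scheme; which router V4 actually uses is not stated here, so treat `top_k_route` and its inputs as hypothetical.

```python
import math


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's weights that take part in one forward pass."""
    return active_params / total_params


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their weights.

    Illustrative gating only; not DeepSeek's confirmed router.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


pro = active_fraction(49e9, 1.6e12)   # roughly 3.1% of weights
flash = active_fraction(13e9, 284e9)  # roughly 4.6% of weights
print(f"V4 Pro: {pro:.1%}, V4 Flash: {flash:.1%}")
print(top_k_route([0.1, 2.0, -1.0, 2.0], k=2))
```

The point of the design is visible in the fractions: most weights sit idle for any given token, so compute per token scales with the active count rather than the total.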

Pricing: A Direct Attack on Premium Models

The most disruptive part of DeepSeek V4 is not just the architecture—it is the pricing. DeepSeek is clearly targeting the high‑end closed models from OpenAI, Anthropic, and Google.

Here is how the pricing breaks down:

V4 Flash

• $0.14 per million input tokens
• $0.28 per million output tokens

V4 Pro

• $1.74 per million input tokens
• $3.48 per million output tokens

By comparison, GPT‑5.5 reportedly launched around $5 per million input tokens and $30 per million output tokens, with GPT‑5.5 Pro going as high as $30 input and $180 output. Claude Opus 4.7 is also in the premium tier, at roughly $5 input and $25 output per million tokens.

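That is why you see claims like “V4 Pro is 98% cheaper than GPT‑5.5 Pro” and “V4 Flash is over 99% cheaper than Claude Opus 4.7 on output.” The exact percentages depend on which tiers, and which token direction, you compare. At the prices listed above, V4 Pro’s output rate is about 98% below GPT‑5.5 Pro’s, while its input rate is about 94% below; V4 Flash’s output rate is about 98.9% below Claude Opus 4.7’s and just over 99% below GPT‑5.5’s. Either way, the direction is clear: DeepSeek is trying to make premium‑grade capabilities feel overpriced.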
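The percentages are easy to check from the quoted prices. The following is a small sketch that recomputes them; the price table simply restates the figures above (including the "reportedly" and "roughly" ones), and the helper name is illustrative.

```python
# USD per million tokens as (input, output), restating the figures quoted above.
PRICES = {
    "V4 Flash": (0.14, 0.28),
    "V4 Pro": (1.74, 3.48),
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.5 Pro": (30.00, 180.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def percent_cheaper(cheap: str, pricey: str, kind: str = "output") -> float:
    """How much lower `cheap` is than `pricey`, in percent, for one token direction."""
    if kind not in ("input", "output"):
        raise ValueError("kind must be 'input' or 'output'")
    idx = 0 if kind == "input" else 1
    return 100.0 * (1.0 - PRICES[cheap][idx] / PRICES[pricey][idx])


for cheap, pricey, kind in [
    ("V4 Pro", "GPT-5.5 Pro", "output"),
    ("V4 Pro", "GPT-5.5 Pro", "input"),
    ("V4 Flash", "Claude Opus 4.7", "output"),
    ("V4 Flash", "GPT-5.5", "output"),
]:
    print(f"{cheap} vs {pricey} ({kind}): {percent_cheaper(cheap, pricey, kind):.1f}% cheaper")
```

Comparing input against input and output against output matters: the gap is widest on output tokens, which is where the headline figures come from.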

How Strong Is DeepSeek V4 Really?

On paper, V4 is a big leap over DeepSeek V3.2, especially for coding and agentic workloads. Several independent benchmarks place it at or near the top of the open‑weight field and within reach of the leading closed models, even if it does not win everything.

Independent Benchmarks

• Arena.ai Code Arena: DeepSeek V4 Pro in “thinking” mode ranked third among open‑source models and 14th overall, described as a significant jump over V3.2.

• Vals AI Vibe Code benchmark: V4 reportedly became the number one open‑weight model, beating Kimi K2.6 and even some closed models like Gemini 3.1 Pro. Vals AI said V4 showed roughly a 10x improvement over V3.2 on this benchmark.

• Vals AI’s broader index: V4 came second overall, just 0.07% behind Kimi K2.6, which puts it firmly in top‑tier territory for open weights.

DeepSeek’s Own Positioning

Interestingly, DeepSeek’s own messaging is more cautious than the hype. The company says V4 Pro has surpassed mainstream open‑source models and is close to closed‑source systems like Gemini in knowledge and reasoning, but still lags the very best frontier models by about three to six months.

In other words: V4 is extremely competitive in code, agents, math, and STEM, and often ahead in those areas, but the absolute top closed models still hold an edge in general reasoning and some expert knowledge tasks.

Key Benchmark Numbers

Some specific scores help show where V4 shines and where it trails:

• Codeforces: V4 Pro reached a rating of 3,206, roughly equivalent to 23rd place among human competitive programmers on the platform.

• Apex Shortlist (math & STEM): 90.2%, beating Claude Opus 4.6 at 85.9% and GPT‑5.4 at 78.1%.

• SWE‑bench Verified (GitHub issue resolution): 80.6%, matching Claude Opus 4.6.

• MMLU‑Pro (broad knowledge): Gemini 3.1 Pro scores 91.0%, while V4 Pro hits 87.5%.

• GPQA Diamond (expert‑level QA): Gemini 3.1 Pro at 94.3% vs. V4 Pro at 90.1%.

• Humanity’s Last Exam: Gemini 3.1 Pro reaches 44.4%, V4 Pro scores 37.7%.

The pattern: V4 is outstanding for coding, math, STEM, and agentic workflows, and competitive but not dominant in broad reasoning and knowledge benchmarks.

Engineering Upgrades Under the Hood

DeepSeek V4 is not just a bigger MoE model; it also introduces several engineering tweaks aimed at stability and efficiency.

• mHC (Manifold‑Constrained Hyper‑Connections): An upgrade to traditional residual connections designed to keep signal propagation more stable through very deep networks. This helps training and inference stay robust as models scale.

• Muon optimizer: An optimizer that replaces AdamW for large‑scale MoE and low‑precision training. It is tuned specifically for the kind of sparse, expert‑based architecture V4 uses.

DeepSeek claims that with full engineering optimization, these changes can deliver almost 2x faster inference. Internally, the company says V4 has already become its main agentic coding model for employees, which is a strong signal of where they think its strengths lie.
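For readers unfamiliar with Muon, its core idea as publicly described is to orthogonalize the momentum matrix before applying it as a weight update. The sketch below illustrates that idea under loud assumptions: it uses the classic cubic Newton‑Schulz iteration rather than the tuned variant found in public Muon implementations, it uses plain Python lists instead of a tensor library, and nothing in it reflects DeepSeek's specific version. All function names and hyperparameters are hypothetical. mHC is not sketched here.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 15) -> Matrix:
    """Approximate the semi-orthogonal polar factor of g with Newton-Schulz.

    Iterates X <- 1.5*X - 0.5*(X X^T) X after scaling by the Frobenius norm,
    which keeps every singular value inside the convergence region.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        raise ValueError("cannot orthogonalize an all-zero matrix")
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return x


def muon_style_step(weights: Matrix, momentum: Matrix, grad: Matrix,
                    lr: float = 0.02, beta: float = 0.95) -> tuple[Matrix, Matrix]:
    """One illustrative update: accumulate momentum, orthogonalize it, apply it."""
    new_momentum = [[beta * m + g for m, g in zip(mr, gr)]
                    for mr, gr in zip(momentum, grad)]
    update = orthogonalize(new_momentum)
    new_weights = [[w - lr * u for w, u in zip(wr, ur)]
                   for wr, ur in zip(weights, update)]
    return new_weights, new_momentum
```

The practical difference from AdamW is that the update is shaped per weight matrix rather than per element, so every direction in the matrix receives a step of similar size.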

Hardware: Nvidia vs China’s Domestic Chips

One of the most important aspects of V4 is not just the model itself, but where it runs. DeepSeek is clearly positioning V4 as a bridge between Nvidia’s ecosystem and China’s growing domestic AI hardware stack.

Deep Integration with Nvidia

On launch day, Nvidia announced support for DeepSeek V4 on its latest hardware:

• GPU‑accelerated endpoints on build.nvidia.com
• NIM deployment support
• vLLM recipes
• SGLang serving recipes for Blackwell and Hopper systems

Nvidia reported that V4 Pro on GB200 NVL72 delivered over 150 tokens per second per user in early out‑of‑the‑box tests. The company also tested the Blackwell B300 using vLLM’s day‑zero recipe and the model’s native MXFP4 format.

The message from Nvidia is clear: even if DeepSeek is part of China’s AI rise, Nvidia wants developers to run it on Blackwell, Hopper, CUDA, NIM, and the rest of its stack.
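To put the 150 tokens per second per user figure in perspective, here is a rough back‑of‑the‑envelope sketch. It assumes a constant decode rate and ignores prompt processing time, queuing, and network latency, so real wall‑clock times would be longer.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


RATE = 150.0  # tokens per second per user, the lower bound Nvidia reported

short_reply = generation_seconds(1_500, RATE)    # a typical long answer
max_output = generation_seconds(384_000, RATE)   # the full output limit

print(f"1,500 tokens: {short_reply:.0f} s")
print(f"384,000 tokens: {max_output / 60:.1f} min")
```

At that rate an ordinary answer streams in about ten seconds, while a maximum‑length generation would take roughly 43 minutes, which is why long agentic runs care so much about per‑user throughput.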

Growing Support for Chinese NPUs

At the same time, V4 is a major step toward a parallel, China‑centric AI infrastructure. DeepSeek has verified fine‑grained expert parallel optimization on Huawei Ascend NPU platforms, reporting 1.50x to 1.73x acceleration on general inference workloads. Huawei also says its Ascend SuperNode products, based on the Ascend 950 series, will support DeepSeek V4.
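A speedup factor is easy to misread as a percentage saving. As a quick sketch, assuming the reported acceleration means the same workload finishes that many times faster, the time saved works out as follows.

```python
def time_saved_percent(speedup: float) -> float:
    """Percent reduction in run time for a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 100.0 * (1.0 - 1.0 / speedup)


low = time_saved_percent(1.50)   # lower end of the reported range
high = time_saved_percent(1.73)  # upper end of the reported range
print(f"{low:.1f}% to {high:.1f}% less time per workload")
```

So a 1.50x to 1.73x acceleration corresponds to roughly a third to two fifths less time for the same work, not a 50% to 73% cut.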

This matters because US export controls have restricted sales of high‑end Nvidia chips to China since 2022. The original goal was to slow Chinese AI progress. Instead, DeepSeek is showing that these constraints are pushing Chinese labs to optimize harder, lean into domestic chips, and prioritize models that are cheaper to run.

It is not a full break from Nvidia yet. Reporting from MIT Technology Review and comments from Tsinghua Professor Liu Zhiyuan suggest that Chinese chips are used mainly for inference, while parts of training still rely heavily on Nvidia hardware. Some long‑context features may not yet be fully adapted to domestic NPUs, and Nvidia GPUs are still generally stronger for training. But V4 is an early proof that a parallel AI stack is emerging.

For a deeper dive into this hardware and geopolitics angle, see this breakdown of what DeepSeek V4 really proves about China and Nvidia.

What This Means for Developers and Enterprises

V4’s combination of long context, strong coding, and low prices has very different implications depending on who you are.

Enterprise Use Cases

For large organizations, V4 Pro changes the economics of large‑scale AI workflows. With 1 million tokens of context and pricing at $1.74 input and $3.48 output per million tokens, it becomes much more affordable to:

• Run legal and compliance reviews across massive document sets
• Analyze large financial reports and research archives
• Index and reason over entire codebases
• Automate support, knowledge management, and internal Q&A
• Build complex internal agents that reason across many systems
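As a rough illustration of those economics, the sketch below prices a hypothetical document review job at the V4 Pro rates quoted above. The workload numbers (2,000 contracts, 40,000 tokens each, a 1,500‑token summary per contract) are invented for the example, and the comparison rates are the reported GPT‑5.5 prices from earlier in this article.

```python
PRO_INPUT_USD_PER_M = 1.74   # V4 Pro input price per million tokens
PRO_OUTPUT_USD_PER_M = 3.48  # V4 Pro output price per million tokens


def job_cost_usd(input_tokens: int, output_tokens: int,
                 input_rate: float = PRO_INPUT_USD_PER_M,
                 output_rate: float = PRO_OUTPUT_USD_PER_M) -> float:
    """API cost of a batch job, given per-million-token rates."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate


# Hypothetical workload, for illustration only.
contracts = 2_000
tokens_in = contracts * 40_000   # 80 million input tokens
tokens_out = contracts * 1_500   # 3 million output tokens

v4_pro = job_cost_usd(tokens_in, tokens_out)
gpt_55 = job_cost_usd(tokens_in, tokens_out, input_rate=5.00, output_rate=30.00)
print(f"V4 Pro: ${v4_pro:,.2f}  GPT-5.5 (reported rates): ${gpt_55:,.2f}")
```

For this made‑up job, the bill is about $150 on V4 Pro versus about $490 at the reported GPT‑5.5 rates. Input tokens dominate here, so the gap is narrower than the output‑only headline percentages suggest.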

DeepSeek has also tied future pricing to hardware availability. The company says V4 Pro throughput is currently limited by high‑end compute constraints, but prices could fall further once Huawei Ascend 950 SuperNodes ship at scale in the second half of 2026. That suggests today’s already‑low prices may not be the floor.

Solo Developers and Small Teams

For smaller teams, V4 Flash might be the star. At $0.14 per million input tokens and $0.28 per million output tokens, it becomes extremely cheap to build:

• Chatbots and assistants
• Summarization and routing services
• Coding copilots and review tools
• Lightweight agents that orchestrate APIs and tools

Because both models are MIT‑licensed and available as open weights on platforms like Hugging Face, teams can download, modify, and self‑host them. That means you are not locked into a single API provider—you can customize, fine‑tune, and deploy V4 on your own infrastructure if you have the hardware.
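"If you have the hardware" is a real caveat. A minimal sketch of the weight footprint alone, assuming 4‑bit or 8‑bit storage per parameter and ignoring quantization scale overhead, KV cache, and activations, looks like this. The bit widths are assumptions for illustration, not published deployment requirements.

```python
def weight_gigabytes(total_params: float, bits_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    if total_params < 0 or bits_per_param <= 0:
        raise ValueError("parameters must be non-negative and bits positive")
    return total_params * bits_per_param / 8 / 1e9


MODELS = {"V4 Pro": 1.6e12, "V4 Flash": 284e9}  # total parameters, as quoted above

for name, params in MODELS.items():
    for bits in (4, 8):  # assumed storage precisions
        print(f"{name} at {bits}-bit: {weight_gigabytes(params, bits):,.0f} GB")
```

Note that an MoE model needs all of its experts resident in memory even though only a fraction is active per token, so the total parameter count, not the active count, sets the memory bill. Under these assumptions V4 Flash at 4‑bit needs about 142 GB for weights, while V4 Pro needs about 800 GB.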

If you want a more direct comparison with GPT‑5.5 and how these pricing and performance trade‑offs play out, check out our detailed GPT‑5.5 vs DeepSeek V4 benchmark and compute war analysis.

Limitations and Real‑World Experience

Despite the excitement, V4 is not perfect—and it is important to understand its current limits.

• Text‑only for now: V4 currently supports text and code, but not images, audio, or video. That means OpenAI, Google, Xiaomi, and others still have a clear lead in multimodal systems. Xiaomi’s MiMo V2.5 Pro, for example, already supports text, image, audio, and video, and OpenAI and Google are heavily focused on multimodal agents. DeepSeek says multimodal support is coming, but it is not here yet.

• Mixed user impressions: Early reactions from users are split. Some say V4 Flash feels close to GPT‑5.4‑level capability at a tiny fraction of the cost. Others report that in everyday chat and messy real‑world prompts, V4 Flash does not feel dramatically better than the mature V3.2 models.

This gap between benchmarks and daily experience is common. Benchmarks show what a model can do under controlled conditions; real usage reveals how it handles vague instructions, long conversations, and personal workflows. V4 may be excellent for code and agents while still feeling uneven as a general‑purpose chat partner in some scenarios.

DeepSeek is also retiring its older DeepSeek Chat and DeepSeek Reasoner endpoints on July 24, 2026. Until then, those endpoints already route to V4 Flash in both non‑thinking and thinking modes, so many existing API users are effectively on V4 even if they never explicitly switched models.

The Bigger Picture: Pricing, Open Weights, and the AI Stack

DeepSeek V4 is not just another model release. It is a coordinated move across pricing, open‑weight access, long‑context engineering, and hardware strategy.

• OpenAI still leads in several frontier benchmarks and multimodal capabilities.

• Gemini 3.1 Pro still tops some reasoning and expert knowledge tests.

• Claude remains strong in long‑context retrieval and premium coding workflows.

But V4 makes the performance gap feel smaller while making the cost gap feel enormous. When developers can build serious agents with a 1 million token context window, strong coding ability, open weights, and Pro‑tier output pricing under $4 per million tokens, the question shifts from “What is the absolute best model?” to “What is good enough for my use case at a sane price?”

R1’s launch in early 2025 triggered a shockwave that even hit Nvidia’s stock. V4 may not cause the same instant market panic, but for developers, startups, and enterprise AI teams, it could end up being one of the most practically important releases of the year.