DeepSeek V4: state-of-the-art coding at a fraction of the cost

03 Jun 2026 20:37 8,413 views

DeepSeek V4 Pro and Flash are new open models that rival GPT‑5.5 and Claude Opus 4.7 on coding, reasoning, and agent benchmarks while costing far less. This guide walks through what DeepSeek is, how V4 performs, pricing, and how to plug it into the Pi agent for real-world coding.

DeepSeek is back with another big release: DeepSeek V4 Pro and DeepSeek V4 Flash. These new models are pushing open-source AI even closer to (and in some cases past) the best closed models from OpenAI and Anthropic—while staying dramatically cheaper.

What is DeepSeek and why does it matter?

DeepSeek is a leading AI company from China that has become one of the main drivers of the open-source large language model movement. Their breakthrough moment came in early 2025 with the release of DeepSeek R1, often called the “DeepSeek moment.”

R1 showed two important things to the AI world:

First, open-source models can compete directly with the best closed models. Second, you don’t need OpenAI- or Anthropic-level budgets to train extremely capable systems. That release kicked off a wave of open-source AI projects and companies that are now serious alternatives to proprietary models.

DeepSeek V4 Pro and V4 Flash are the latest step in that evolution.

DeepSeek V4 Pro and Flash at a glance

DeepSeek V4 Pro and V4 Flash are large mixture-of-experts (MoE) models. In an MoE architecture, the model contains many parameters, but only a subset of them are activated for each request. This lets the model stay very large and capable while remaining fast and efficient in practice.

Key points about the new models:

• V4 Pro is the flagship, high-capability model aimed at coding, reasoning, and complex tool use.
• V4 Flash is the faster, lighter variant optimized for speed and cost.
• Both are available through DeepSeek’s own API and via providers like Hugging Face.
• They support very large context windows (up to around 1M tokens for Pro in the example setup), making them suitable for big codebases and long research workflows.

DeepSeek also offers a ChatGPT-style web interface, but the real power comes from using these models via API or inside your own agents and tools.

How DeepSeek V4 performs on coding benchmarks

To understand how good DeepSeek V4 really is, it helps to look at benchmarks. The focus here is on three areas: coding, general reasoning, and agentic (tool-using) performance.

TerminalBench: working in the command line

TerminalBench measures how well a model can use a terminal or command line to solve tasks. This is crucial for coding agents, because the terminal is how they interact with your file system, run commands, and manage projects.

On TerminalBench, DeepSeek V4 Pro:

• Performs on par with top models like Claude Opus 4.7 and Gemini 3.1 Pro.
• Beats strong open models like Kimi K 2.6, GLM 5.1, and Minimax M2.

This means V4 Pro is already a serious option for terminal-driven coding agents.

SWE-bench Pro: solving real GitHub issues

SWE-bench Pro evaluates how well a model can fix real-world GitHub issues. The model must understand complex repositories, reason across multiple files, and edit code correctly.

On SWE-bench Pro, DeepSeek V4 Pro:

• Scores better than GPT‑5.4.
• Is just below GPT‑5.5 and Claude Opus 4.7.
• Beats Gemini 3.1 Pro and is roughly on par with Kimi, GLM, and Minimax.

For a model that’s open and much cheaper, that’s an impressive result—and a strong signal that V4 is ready for serious software engineering work.

General reasoning: more than just a coding model

DeepSeek V4 isn’t only about code. It also performs well on tough general reasoning benchmarks that cover many academic and professional domains.

GPQA Diamond and MMLU Pro

GPQA Diamond and MMLU Pro are collections of expert-written questions across subjects like physics, chemistry, biology, and more. They’re designed to test deep understanding, not just surface-level pattern matching.

On these benchmarks, DeepSeek V4:

• Lands just below the very best closed models.
• Performs similarly to strong open models like Kimi K 2.6.

This makes V4 a solid choice for research assistance, technical Q&A, and learning support across many disciplines.

Humanity’s Last Exam

Humanity’s Last Exam is a particularly challenging benchmark that combines questions from top experts across classics, ecology, mathematics, computer science, linguistics, chemistry, and more. Models generally perform much worse here than on typical benchmarks.

DeepSeek V4 scores around 37.7—slightly above Kimi, but below the strongest closed-source models. While it doesn’t dominate this benchmark, it still delivers competitive performance for an open model.

Agentic performance: tools, browsing, and multi-step workflows

Modern AI agents don’t just answer questions—they call tools, browse the web, and chain actions together. DeepSeek V4 was evaluated on several agent-focused benchmarks.

MCP Atlas and Toolathlon: using tools effectively

MCP Atlas and Toolathlon measure how well a model can use and chain tools to solve complex tasks. This includes choosing the right tool, calling it correctly, and integrating the results into a coherent solution.

On these benchmarks, DeepSeek V4:

• Outperforms GPT‑5.4, Gemini 3.1 Pro, Kimi K 2.6, and GLM 5.1.
• Is beaten only by GPT‑5.5 and Claude Opus 4.7.

That places V4 firmly in state-of-the-art territory for agent-style workflows.

BrowseCom: web research and multi-page browsing

BrowseCom tests how well a model can perform research by browsing multiple websites, making several calls, and synthesizing information.

On BrowseCom, DeepSeek V4:

• Scores close to GPT‑5.5, GPT‑5.4, and Gemini 3.1 Pro.
• Even edges out Claude Opus 4.7.
• Remains competitive with open models like Kimi and GLM.

If you’re building research assistants or browsing agents, V4 is clearly capable enough to be your main engine.

How DeepSeek V4 compares overall

Putting all the benchmarks together, DeepSeek V4 Pro is effectively a state-of-the-art model:

• Often better than GPT‑5.4.
• Slightly behind GPT‑5.5 and Claude Opus 4.7 on some tasks.
• Competitive with or ahead of other strong open models like Kimi, GLM, and Minimax.

For most real-world coding and agentic use cases, the performance gap vs. the very top closed models is small—while the cost gap is huge.

Pricing: SOTA performance at a tiny fraction of the cost

Pricing is where DeepSeek V4 really stands out.

DeepSeek V4 Pro starts at roughly:

• $1.74 per million input tokens
• $3.48 per million output tokens

That’s dramatically cheaper than GPT‑5.5, Claude Opus 4.7, and Gemini 3.1 Pro, while still operating in the same performance tier.

On top of that, DeepSeek has been running promotions (for example, around 75% off tokens up to a specific date), making it even more affordable to experiment and migrate workloads.

In practice, this means you can often run DeepSeek V4 for around 10% of the cost of top closed models, especially at scale. For teams running heavy coding agents or research assistants, that’s a massive cost saving.

If you’re interested in more ways to cut costs by mixing DeepSeek with other tools, you may also want to look at using DeepSeek V4 to make Claude-based coding cheaper or running DeepSeek V4 Pro & Flash for free via NVIDIA NIM.

Using DeepSeek V4 with the Pi agent

To see DeepSeek V4 in action as a coding agent, one simple approach is to plug it into the Pi agent harness.

What is the Pi agent harness?

The Pi agent harness is a minimalist agent framework you can use as a coding assistant or as an SDK to add agent capabilities to your own apps. It’s intentionally lightweight: instead of shipping with every feature, it’s designed so the agent can extend itself by writing new tools and behaviors.

Pi is also used under the hood by larger projects like OpenClo, which rely on it as their agent harness.

Installing Pi

Installation is straightforward:

• Install Pi via the command line using the provided install command (usually a single copy–paste).
• After installation, you can run it with commands like pi or openpi.

Once installed, Pi can act as your coding assistant and as a base for more sophisticated agents.

Connecting DeepSeek V4 via API

There are two main ways to use DeepSeek V4 with Pi:

Option 1: DeepSeek’s official API

1. Go to DeepSeek’s website and create an account.
2. Top up your account with some credit.
3. Generate an API key from the dashboard.
4. In Pi, run a login command (e.g., /login) and choose to log in with an API key.
5. Select DeepSeek as the provider and choose the V4 model you want to use.

Option 2: Using Hugging Face as a provider

If you prefer to centralize multiple models via Hugging Face, you can:

1. Go to hf.co and create or log into your account.
2. In your account settings, create an access token with permission to call inference providers.
3. In Pi, run /login and choose Hugging Face, then paste your token.
4. Use /model inside Pi to search for DeepSeek V4 (or configure it manually if it’s not yet listed).

Adding DeepSeek V4 as a custom model in Pi

If DeepSeek V4 doesn’t show up automatically, you can manually add it:

1. Go to your user home directory and open .pi/agent/models.json.
2. Initialize a Hugging Face provider entry with the correct base URL and headers (optionally billing to an organization instead of a personal profile).
3. Add a new model entry for DeepSeek V4 with:
• The model ID from Hugging Face or the inference provider.
• The provider name (matching your provider config).
• A human-readable name (e.g., DeepSeek V4 Pro).
• The context window size (e.g., up to 1M tokens for Pro).

Once saved, restart or reopen Pi, run /model, and you should see your new DeepSeek V4 Pro model available for selection.

Example: building a browser-based code playground

To test DeepSeek V4 Pro as a coding agent, you can ask Pi (backed by V4) to build a small web app. One example is a browser-based code playground similar to CodeSandbox:

• Three editors: HTML, CSS, and JavaScript.
• A live preview pane that updates as you type.
• Auto-save so the code persists across sessions.

With a detailed prompt describing the layout and behavior, DeepSeek V4 Pro can:

• Generate the full front-end code for the playground.
• Wire up reactive updates so changes in the editors immediately reflect in the preview.
• Implement reset buttons and simple state management.

In practice, the model can produce a working prototype in just a few minutes. While this is not a full benchmark, it’s a good illustration of how capable V4 is as a coding assistant when paired with a harness like Pi.

Why DeepSeek V4 is a big deal for developers

DeepSeek V4 Pro and Flash show how quickly open models are catching up to, and in some areas matching, the best closed systems:

• Performance: Comparable to GPT‑5.5 and Claude Opus 4.7 on many coding and agent tasks.
• Cost: Often around 10% of the price of top closed models, especially with discounts.
• Flexibility: Available via multiple providers, with huge context windows and strong tool-use capabilities.

If you’re currently running GPT‑5.5 or Opus 4.7 through an API—for a coding assistant, internal tools, or customer-facing apps—DeepSeek V4 is a serious alternative that can dramatically cut your costs while keeping quality high.

As open-source and open-access models continue to improve at this pace, the balance of power between closed and open AI ecosystems is shifting fast. DeepSeek V4 is one of the clearest signs of that shift so far.