DeepSeek V4, OpenCode, and Superpowers: why this open-source stack feels insane

25 May 2026 18:37 13,027 views

DeepSeek V4 Flash and Pro have landed with a 1M-token context window and serious potential for open-source AI. Here’s how they perform in real coding workflows with OpenCode and Superpowers, why expectations might be off, and what this means for the next wave of Chinese and open-source models.

DeepSeek V4 just dropped, and it’s already reshaping expectations around open-source AI models. With both a Flash and Pro variant, a massive 1 million token context window, and strong early performance in real coding workflows, this release feels less like a one-off model and more like a new foundation for the next wave of open-source and Chinese AI systems.

What’s New in DeepSeek V4?

DeepSeek V4 comes in two main flavors: Flash and Pro. Flash is the cheaper, faster option aimed at everyday workloads, while Pro is the heavier model designed for more demanding tasks. Both are huge under the hood, with the Pro model reportedly around 284 billion parameters, putting it well beyond what most people could ever run locally.

The standout feature is the 1 million token context window. That’s a big jump from many existing Chinese and open-source models, which often cap out around 128K–256K tokens. For comparison, earlier DeepSeek releases and popular Chinese models like Kimi 2.6 typically sat in that lower range.

This context boost matters because it effectively sets a new bar for what future models—especially those built on top of DeepSeek—can do. As explored in our overview of DeepSeek V4 as a 1M-token open-source powerhouse, this isn’t just about one model being slightly better; it’s about raising the baseline for the entire ecosystem.

Flash vs Pro: Real-World Behavior, Not Benchmarks

On paper, benchmarks around DeepSeek V4 have been mixed, and there’s growing skepticism about benchmarks in general. When some proprietary models claim to outperform top-tier systems like Claude Opus based on selective scores, it’s hard to take leaderboard numbers at face value.

Instead of chasing benchmarks, the focus here is on how DeepSeek V4 behaves in real workflows—especially coding and full-stack generation—using tools like OpenCode and the Superpowers plugin. That’s where the model’s strengths and weaknesses show up clearly.

Testing DeepSeek V4 Flash with OpenCode

The first round of testing focused on DeepSeek V4 Flash, wired into an OpenCode environment via OpenRouter and the official DeepSeek API. The idea was to see how well the model could handle practical developer tasks, not just isolated prompts.

Early Friction: Prompting Gaps and Incomplete Runs

Flash showed impressive speed right away, but there were issues with reliability. In some OpenCode runs, the model would start working, inspect directories, and plan steps, only to stall or stop mid-build. This looked less like a raw intelligence problem and more like a “prompting gap” issue—where the system prompts inside OpenCode weren’t fully tuned for DeepSeek’s behavior.

Because of that, Flash felt borderline unusable for longer, tool-heavy workflows in that specific setup. It’s a reminder that model quality and tool integration quality are two different things. A strong model can still misfire if the surrounding prompts and orchestration aren’t aligned.

Simple Tasks: SVGs and One-Pagers

When moved away from complex agent-style runs and asked to handle more direct tasks, Flash looked much better. For example, generating an SVG birthday icon and a simple one-page HTML/CSS layout produced clean, usable output. The model also did something many cheaper models don’t: it asked clarifying questions and used placeholders intelligently, which is a good sign of training quality.

Cost-wise, Flash was extremely cheap to run, barely nudging the API balance even across multiple tests. That positions it as a potential competitor to other budget-friendly models like Gemini Flash for many everyday coding and content tasks.

DeepSeek V4 Pro: Closer to Top-Tier Assistants

DeepSeek V4 Pro had some rate-limiting issues on OpenRouter, so the more stable tests came via the official DeepSeek API. Once it was running properly, the difference in capability compared to Flash was obvious.

Full-Stack Generation with Superpowers

Hooked up to the Superpowers plugin (an agent-like system that can plan, create files, and build projects), V4 Pro was tasked with generating complete front-end projects: HTML, CSS, JavaScript, SVG assets, and SEO elements. The technical builds it produced were surprisingly strong for an open-source-aligned model.

The generated landing pages weren’t just functional—they were well-structured, visually coherent, and comparable in quality to what you’d expect from premium proprietary assistants. In some cases, the overall feel was described as “indistinguishable from Opus” for this specific type of front-end work.

It wasn’t perfect: SVGs could be basic, and not every design was mind-blowing. But the combination of structure, responsiveness, and SEO-aware markup was impressive, especially considering the cost and open nature of the model. As covered in more depth in our dedicated DeepSeek V4 Pro test, this is one of the strongest real-world showings yet from a model in this category.

3D and Game Logic: Clear Weak Spots

When pushed into more complex interactive tasks—like building a 3D FPS shooter using HTML, CSS, JavaScript, and Three.js from a single short prompt—V4 struggled. Movement logic was off, controls behaved strangely, and the overall experience was poor.

This isn’t unique to DeepSeek; many models fail hard on non-trivial game logic and 3D frameworks when given vague instructions. But it’s worth noting: V4 Pro shines in structured web and app generation, not in advanced game dev or intricate physics-heavy code from minimal specs.

Why This Release Still Matters Even If It’s “Not Perfect”

Some early reactions around DeepSeek V4 have been lukewarm, especially from people expecting it to crush every benchmark or instantly replace top proprietary models. That expectation misses the bigger picture.

First, V4 is a huge step up from previous DeepSeek generations in terms of behavior, context window, and real-world coding performance. Second, there’s a strong suspicion in the community that many of the best recent Chinese and open-source models—like Kimi 2.6 and others—are built on or heavily influenced by DeepSeek’s work.

If that’s true, then V4 isn’t just another model; it’s a new base layer. Future open-source releases and Chinese models can now inherit a 1M-token context, better reasoning patterns, and stronger code generation capabilities. That’s a massive win for anyone who cares about open ecosystems and alternatives to US-centric AI dominance.

Data, Privacy, and the China Question

Using DeepSeek means sending your data to a Chinese provider, and that understandably makes some people nervous. But the broader point raised in this context is that many users don’t really want to send their data to any big AI company—US, Chinese, or otherwise. At the same time, most of us are already deeply entangled with multiple platforms.

Whether you’re comfortable using DeepSeek’s official API is a personal decision. For developers who prioritize cost, openness, and model capability over jurisdiction, V4 is going to be very tempting. For those with stricter compliance or data residency requirements, it may stay in the experimental or side-project bucket for now.

What This Means for Open-Source and Local Models

DeepSeek V4 Pro is far too large to run on a typical local machine, but its existence still matters a lot for local-first AI. High-end open models like this often become the teacher or base for distilled, smaller models that can run on consumer hardware.

We’re already seeing this pattern with models like Qwen 27B, which punches far above its weight for its size and cost. If future Qwen-style or similar models are trained or fine-tuned on top of V4-era foundations, we can expect a new wave of small, fast, and surprisingly capable local models—ones that might run on a mid-range desktop or a high-end laptop while still delivering strong reasoning and coding support.

Bottom Line: A Big Win for the Open AI Stack

DeepSeek V4 Flash and Pro are not flawless, and they’re not magic. Flash can be finicky in some agent setups, and Pro won’t replace every premium closed-source assistant overnight. But taken together—with a 1M-token context window, strong front-end and full-stack generation, and clear improvements over earlier generations—they represent a major step forward for open and semi-open AI.

If you care about cheaper, more accessible, and less US-centric AI options, DeepSeek V4 is a release to pay attention to. The real story might not be this model alone, but the next generation of tools, open-source models, and local systems that get built on top of it.