Gemini 3.5 X-high, MiniMax M3, DeepSwe, new Claude features, MiMO 2.5 cuts, and more

10 Jun 2026

June is shaping up to be a huge month for AI: Google is preparing a higher-reasoning Gemini 3.5 Pro and a real-time Gemini Live, MiniMax is teasing a sparse-attention M3 model, MiMO 2.5 just slashed prices, and new benchmarks and Claude features are reshaping coding and agent workflows.

AI development is accelerating again, with June lining up to deliver major model launches, big price cuts, and new tools for coding, agents, and even humanoid robots. From Google’s next Gemini update to MiniMax’s sparse-attention M3 and MiMO’s massive pricing overhaul, the landscape is shifting fast.

Gemini 3.5 Pro X-high and real-time Gemini Live

Google appears to be preparing a new "X-high thinking" mode for Gemini 3.5 Pro, building on the model announced at Google I/O. This mode looks similar to the higher reasoning-effort options we’ve seen from OpenAI and Anthropic, where the model spends more compute on complex problems to improve reasoning depth.

This is especially important for Gemini, which has previously struggled with long-horizon, multi-step tasks. An X-high mode suggests Google is directly targeting those weaknesses, which could make Gemini more reliable for planning, complex coding, and analytical workflows.

On top of that, backend flags hint at a new model called Gemini 3.1 Flash Live VR EAP. The naming suggests:

• Live: real-time interaction, likely multimodal (voice, text, possibly video)
• VR: potential immersive or spatial/voice experiences
• EAP: early access or experimental preview

If this pans out, Gemini Live could move much closer to a real-time AI assistant that can see, hear, and respond instantly, with possible voice cloning capabilities layered on top.

MiniMax M3 and a new sparse attention architecture

MiniMax is teasing its upcoming M3 model, expected to arrive around June, and the most interesting detail so far is its new sparse attention architecture designed for long context.

Instead of fully processing every token in a huge context window, the model first does a lightweight scan over the entire input, then focuses heavy reasoning only on the most relevant chunks. It’s similar to how you’d skim a textbook’s table of contents before deciding which pages to read carefully.

Reported benefits include:

• Up to 10× faster context processing
• Around 15× faster decoding speeds
• Much lower compute requirements for long-context tasks

Technically, MiniMax’s approach is said to:

• Use GQA (Grouped Query Attention) instead of MLA (Multi-head Latent Attention)
• Apply block-level selection similar to DeepSeek V4’s CSA
• Perform attention directly on the full, uncompressed KV cache rather than on a compressed representation

That last point matters: by operating on the full key-value cache, the model can preserve more contextual detail while still getting the efficiency gains of sparse attention. For long-context work such as large codebases, legal documents, or multi-step research, that combination could cut cost and latency substantially.
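The scan-then-focus idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not MiniMax's implementation: the block-scoring rule (dot product with each block's mean key), the function names, and all sizes are assumptions made for the example. It combines the three reported ingredients: several query heads sharing one KV head (GQA), block-level top-k selection, and full attention over the original, uncompressed keys and values inside the selected blocks.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def block_sparse_attention(q, K, V, block_size, top_k):
    """Attend one query vector over only the top_k most relevant KV blocks.

    q: (d,) query; K, V: (n, d) cached keys/values, n divisible by block_size.
    Returns the attention output and the indices of the selected blocks.
    """
    n, d = K.shape
    assert n % block_size == 0, "sketch assumes n is a multiple of block_size"
    n_blocks = n // block_size

    # Step 1 (lightweight scan): score each block with one dot product
    # against the block's mean key. This scoring rule is an assumption.
    block_keys = K.reshape(n_blocks, block_size, d).mean(axis=1)
    block_scores = block_keys @ q

    # Step 2 (selection): keep the top_k highest-scoring blocks.
    chosen = np.sort(np.argsort(block_scores)[-top_k:])

    # Step 3 (focused attention): ordinary softmax attention, but only over
    # the original, uncompressed keys and values inside the chosen blocks.
    idx = (chosen[:, None] * block_size + np.arange(block_size)).ravel()
    weights = softmax(K[idx] @ q / np.sqrt(d))
    return weights @ V[idx], chosen


def gqa_block_sparse(Q, K, V, group_size, block_size, top_k):
    """Grouped-query variant: each run of group_size query heads shares one KV head.

    Q: (n_q_heads, d); K, V: (n_kv_heads, n, d).
    """
    outputs = []
    for head, q in enumerate(Q):
        kv_head = head // group_size  # query heads in a group reuse this KV head
        out, _ = block_sparse_attention(q, K[kv_head], V[kv_head], block_size, top_k)
        outputs.append(out)
    return np.stack(outputs)


# Hypothetical sizes: 4 query heads, 2 KV heads, 64 cached tokens, head dim 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(2, 64, 8))
V = rng.normal(size=(2, 64, 8))
out = gqa_block_sparse(Q, K, V, group_size=2, block_size=8, top_k=2)
print(out.shape)  # (4, 8)
```

With top_k equal to the number of blocks, the sketch reduces to ordinary dense attention, which makes a handy sanity check. Smaller values of top_k trade some recall for less work per query, since each query touches only top_k × block_size cached tokens plus one score per block.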

If you’re following the Chinese model ecosystem, this fits into the broader competition between players like DeepSeek, Qwen, MiniMax, and MiMO. For a deeper comparison of these labs and their models, see this breakdown of six leading Chinese AI models.

New Claude Lab features: tunes, squares, bitboard, and Claude spaces

Anthropic appears to be preparing four new Claude Lab features, based on leaked backend flags with the codenames:

• tunes
• squares
• bitboard
• Claude spaces

While details aren’t confirmed, early hints suggest Anthropic is moving Claude beyond a simple chat or coding interface and toward a broader productivity and agent ecosystem.

These features likely touch on:

• Collaborative workspaces
• Persistent agent environments
• Organization and project systems
• Customizable workflows that can be shared across teams

Claude spaces is especially interesting. The name suggests a persistent operating environment where Claude-powered agents can maintain state, memory, and tools across sessions, rather than starting from scratch in each chat. That aligns with the industry’s push toward long-running agents that can handle ongoing projects instead of one-off prompts.

MiMO 2.5 Pro slashes prices and boosts tokens

Xiaomi’s MiMO stack just received a major upgrade with the MiMO 2.5 series. The headline changes are aggressive:

• API costs reportedly reduced by up to 99%
• 5–8× more usable tokens on existing plans
• Simpler, unified context pricing

One of the biggest shifts is that MiMO 2.5 Pro is now priced roughly on par with DeepSeek V4 Pro. That dramatically changes the value equation in the Chinese model ecosystem and intensifies the ongoing price war in AI APIs.

MiMO’s team attributes these cuts to major inference optimizations and serving efficiency improvements across their stack. They also confirmed that MiMO 2.5 TTS (text-to-speech) will remain free for a limited time, making it attractive for developers building voice-enabled apps.
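To put the "up to 99%" figure in perspective, here is a small back-of-the-envelope calculation. The starting price and budget are hypothetical placeholders, not MiMO's actual rates; only the 99% cut comes from the reported numbers.

```python
def tokens_for_budget(budget_usd, price_per_million_usd):
    # Number of tokens a fixed budget buys at a given per-million-token price.
    return budget_usd / price_per_million_usd * 1_000_000


old_price = 10.0   # hypothetical price in USD per 1M tokens (assumption)
cut = 0.99         # the reported "up to 99%" reduction
new_price = old_price * (1 - cut)

budget = 100.0     # hypothetical monthly spend in USD (assumption)
multiplier = tokens_for_budget(budget, new_price) / tokens_for_budget(budget, old_price)
print(f"Same budget buys about {multiplier:.0f}x as many tokens")
```

The multiplier is 1 / (1 - cut) regardless of the starting price, so a 99% cut works out to roughly 100 times as many tokens per dollar at the API level.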

If you’re weighing DeepSeek, MiMO, MiniMax, and others for production use, it’s worth checking how these price and performance shifts stack up against DeepSeek V4 and GPT-5.5-class models. For more context on that competition, check out this analysis of DeepSeek V4 versus GPT-5.5.

DeepSwe: a new agentic coding benchmark

A new benchmark called DeepSwe (positioned as a successor to SWE-bench-style tests) has launched with a focus on more realistic software engineering tasks.

Key differences from older benchmarks:

• Tasks are built from scratch instead of scraped from GitHub issues or PRs, reducing the risk of models having seen them during training.
• It emphasizes long-horizon engineering work that mirrors real developer workflows, not just short, isolated bug fixes.

Early results reportedly show an upcoming GPT-5.5 scoring around 70% on DeepSwe, which is notable given the benchmark’s higher difficulty and focus on end-to-end coding tasks. It’s another sign that frontier models are getting much better at acting as full-stack coding assistants rather than autocomplete tools.

Qwen 3.7 Max climbs coding leaderboards

Alibaba’s Qwen 3.7 Max has debuted at number four on the CodeArena leaderboard, with particularly strong performance in:

• Frontend development tasks
• Backend logic and reasoning-heavy coding problems

It is now the highest-ranked model from a Chinese lab on that leaderboard, reportedly surpassing models like GLM 5.1 and DeepSeek V4 in several scenarios. In some agentic web development tasks, Qwen 3.7 Max even outperforms Claude Opus 4.6, which is impressive given how strong Opus has been for coding.

New security guidance plugin for Claude Code

Anthropic has shipped a new security guidance plugin for Claude Code. It’s available through the Claude Code plugin marketplace (via /plugins) and is designed to:

• Flag vulnerabilities as you write code
• Suggest fixes and safer patterns
• Support real-time debugging, auditing, and security analysis

For teams using Claude as a coding assistant, this moves it closer to a security-aware pair programmer that can catch issues before they reach production.
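The source doesn't say which checks the plugin runs, so the following is only a hypothetical illustration of the "flag a vulnerability, suggest a safer pattern" category, using SQL injection as the example. The function names and table are made up for the sketch; the only library used is Python's standard sqlite3 module.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable pattern: user input is concatenated straight into the SQL
    # string, so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer pattern: a parameterized query keeps the input as data only.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"  # classic injection string
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
print(len(find_user_safe(conn, payload)))    # 0: no user has that literal name
```

A security-aware assistant would ideally flag the first function while it is being written and propose something like the second.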

React Doctor: an open-source agent skill for React code

Outside of the major labs, a new open-source agent skill called React Doctor is aiming to clean up messy React codebases automatically.

React Doctor focuses on:

• Identifying unnecessary re-renders
• Improving state management patterns
• Cleaning up tangled component architecture

It’s built to plug into agentic workflows so an AI assistant can not only write React code, but also refactor and optimize it as the project grows.

Humanoid robots move into real deployments

On the hardware side, Figure AI has announced a major commercial agreement with Catalyst Brands to deploy its humanoid robots at scale across Catalyst’s operations. Catalyst owns brands such as JCPenney, Aéropostale, and Brooks Brothers; the first rollout is planned for Reno, Nevada.

Figure’s Figure 1 robots will be used in logistics and warehouse-style workflows. This marks a shift from viral demo videos to real-world, large-scale deployments, where the key question becomes: can humanoid robots economically replace repetitive warehouse and logistics labor?

If these pilots prove reliable and cost-effective, humanoid robotics could become one of the biggest new industries created by the AI boom over the next decade—bringing both huge opportunities and serious questions about the future of work.