AI Weekly: Claude Mythos Shockwave, New Open-Source Coding Beast, and the Best Video Model You Can Actually Use
The AI world did not slow down this week. We saw a security-focused model that Anthropic says is too dangerous to release, a new open-source coding model that rivals the very best, a strong new entry from Meta, and big upgrades across video, avatars, and everyday AI tools.
Claude Mythos & Project Glasswing: The Model That Scared Anthropic
The biggest story this week is Anthropic’s unreleased model, Claude Mythos. According to Anthropic’s own system card, Mythos is a frontier model with extreme coding and cybersecurity capabilities. It is so strong at finding vulnerabilities that Anthropic decided not to release it publicly.
On cybersecurity benchmarks, the Mythos preview hits around 83% on vulnerability reproduction, far above Anthropic’s previous Claude Opus 4.6. On software engineering tests like SWE-bench Pro and TerminalBench, it outperforms Opus by large margins, making it arguably the best coding model ever tested.
More worrying are the real-world findings. Anthropic reports that Mythos:
• Discovered a 27-year-old vulnerability in OpenBSD, one of the most security-focused operating systems in the world.
• Found a 16-year-old vulnerability in FFmpeg, a core component used in countless video tools and apps.
• Chained together multiple vulnerabilities in the Linux kernel, which underpins most of the world’s servers.
The implication: a powerful code model naturally becomes a powerful hacking assistant. Anthropic says they did not train Mythos specifically for cyberattacks; they trained it to be excellent at code, and advanced cyber capabilities emerged as a side effect.
Why Anthropic Isn’t Releasing Mythos
Instead of a public launch, Anthropic created Project Glasswing. Under Glasswing, a small set of major companies (think big tech, security, infrastructure) get controlled access to Mythos. Only vetted cybersecurity specialists inside those organizations can use it.
The goal is defensive: let trusted companies use Mythos to find and patch vulnerabilities in their own products before similar capabilities become widely available to attackers. Given that almost everyone uses software from these companies, hardening their systems ahead of time could significantly reduce global risk.
There is always some marketing upside to saying “our model is too powerful to release” – we saw similar narratives around GPT‑2 back in 2019. But in this case, the combination of benchmarks, real vulnerabilities, and the 245-page system card suggests Anthropic is genuinely concerned about the security implications.
If you want a separate deep dive focused only on Mythos and its safety story, you can check out what we actually know about Anthropic’s new ‘too powerful’ Claude Mythos model.
Muse Spark and GLM 5.1: Two Very Different, Very Important New Models
Muse Spark: Meta’s New Closed Frontier Model
Meta is back with a new flagship model called Muse Spark, the first major release from its Super Intelligence Labs team (the group formed after bringing in Alexander Wang and other high-profile hires). Unlike the Llama series, Muse Spark is not open source.
On benchmarks, Muse Spark is competitive with top models like GPT 5.4, Claude Opus 4.6, Gemini 3.1 Pro, and Grok 4.2, but it is not clearly the best overall:
• Vision and figure understanding: very strong, often beating others by a noticeable margin.
• Multimodal reasoning: solid, usually middle of the pack.
• Coding (SWE-bench, SWE-bench Pro, TerminalBench): slightly behind Opus, GPT 5.4, and Gemini 3.1; roughly on par with Grok 4.2.
• Health-related benchmarks: best-in-class on some hard open-ended medical queries, mid-pack on others.
The standout advantage is token efficiency. Muse Spark uses fewer tokens to achieve similar results, which should translate into lower costs when Meta opens up wider API access. For a first release from this new lab, jumping from Llama 4 Maverick-level performance to near state-of-the-art is impressive.
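To see why token efficiency matters for cost, here is a minimal sketch of the arithmetic. All prices and token counts below are hypothetical, chosen only to illustrate the effect; they are not published figures for Muse Spark or any other model.

```python
# Illustrative sketch: how token efficiency translates into cost.
# Prices and token counts are hypothetical, not published figures.

def task_cost(tokens_used: int, price_per_million: float) -> float:
    """Cost in dollars for a single task at a flat per-token price."""
    return tokens_used / 1_000_000 * price_per_million

# Suppose two models solve the same task at the same per-token price,
# but one needs 40% fewer tokens to get there.
price = 10.0  # assumed $ per million output tokens
baseline = task_cost(50_000, price)
efficient = task_cost(30_000, price)

print(f"baseline:  ${baseline:.2f}")                 # $0.50
print(f"efficient: ${efficient:.2f}")                # $0.30
print(f"savings:   {1 - efficient / baseline:.0%}")  # 40%
```

The point is simply that at equal per-token pricing, a model that needs fewer tokens for the same result is proportionally cheaper to run at scale.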
You can try Muse Spark through the Meta AI interface today, with a private API preview and broader rollout planned.
GLM 5.1: Open-Source Coding Powerhouse from Zhipu AI
The quieter but arguably more important release is GLM 5.1 from Zhipu AI (often branded Z AI). This is an open-weight model under the MIT license, and the raw numbers are eye-opening.
On SWE-bench Pro, a key software engineering benchmark, GLM 5.1 scores around 58.4 – slightly ahead of GPT 5.4, Claude Opus 4.6, and Gemini 3.1. In other words, you get near state-of-the-art coding performance in a model you can download, run locally, and fine-tune yourself.
On real-world terminal tasks and agentic coding, GPT 5.4 still leads, but GLM 5.1 comes in second. For math and general reasoning, it trails the very top proprietary models slightly but remains highly competitive across the board.
The big story: open models are catching up fast. With GLM 5.1, teams can build powerful local coding assistants and agents without depending entirely on closed APIs. For developers and companies concerned about data control, cost, or vendor lock-in, this is a major milestone.
New Gemini Features: Interactive Visualizations and Notebooks
Google did not launch a new core model this week, but Gemini got two meaningful upgrades that make it more useful for day-to-day work.
Interactive Simulations and Visualizations
Gemini can now generate interactive visualizations directly in the chat interface, similar to recent features from OpenAI and Anthropic. You can ask it to:
• Visualize physics concepts like the three-body problem (with sliders for masses, positions, and velocities).
• Explore financial concepts like compound interest over time, adjusting principal, rate, and duration in real time.
The three-body example is still a bit hit-or-miss, but for simpler scenarios like compound interest, the interactive charts work well. You can tweak parameters and instantly see how the graph and numbers change.
To try it, open Gemini (using the Pro model) and ask something like “Help me visualize compound interest over multiple time frames” and then click “Show me the visualization.”
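For reference, the math behind that compound interest visualization is a one-liner. Here is a minimal sketch of the standard periodic-compounding formula the sliders would be driving (the specific numbers are just example inputs):

```python
# Compound interest with periodic compounding: P * (1 + r/n)^(n*t)
def compound_interest(principal: float, annual_rate: float,
                      years: int, compounds_per_year: int = 12) -> float:
    """Future value of `principal` after `years` at `annual_rate`."""
    n = compounds_per_year
    return principal * (1 + annual_rate / n) ** (n * years)

# Example: $10,000 at 5% APR, compounded monthly, over several time frames.
for years in (5, 10, 20, 30):
    print(years, round(compound_interest(10_000, 0.05, years), 2))
```

An interactive chart is just this function evaluated over a range of years as you drag the principal, rate, and duration sliders.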
Gemini Notebooks (Linked with Notebook LM)
Google also introduced Notebooks inside Gemini – essentially their version of “projects” from other AI platforms. A Notebook is a dedicated workspace where you can:
• Keep all conversations about a single topic in one place.
• Set custom instructions that apply only inside that notebook.
• Enable notebook-specific memory so Gemini remembers prior chats within that topic.
• Attach files (PDFs, Docs, Drive files, URLs, pasted text) as sources.
Crucially, you can send a Gemini notebook into Notebook LM to unlock features like AI-generated podcasts, mind maps, and deeper research workflows. This is especially useful for long-running projects, journaling, or focused research where you do not want context bleeding in from unrelated chats.
Notebooks are rolling out first to paid Gemini tiers (Ultra, Pro, Plus), with broader availability likely later.
Video Generation Heats Up: SeaDance 2.0 and a Mysterious “Happy Horse”
SeaDance 2.0 Finally Reaches the US
SeaDance, a video model that has been heavily hyped for months, is now accessible in the US through Runway and ByteDance’s CapCut app. Some of the viral features (like generating trademarked characters or realistic celebrities) have been deliberately restricted, but the core model is still extremely strong.
Inside Runway, you can select “SeaDance 2.0” as the generation model and feed it detailed multi-scene prompts. It produces high-quality, coherent clips and does so surprisingly fast – noticeably quicker than Kling 3.0 in many cases.
Right now, SeaDance 2.0 is arguably the best widely available video model, especially after OpenAI’s Sora access was pulled back.
Happy Horse 1.0: A New Leaderboard Champion
A mysterious model called Happy Horse 1.0 suddenly appeared at the top of a major video model leaderboard, surpassing SeaDance 2.0, Kling, and others. Sample clips look like high-end stock footage: smooth motion, realistic lighting, and convincing detail.
Reports suggest Happy Horse 1.0 may come from Alibaba, though that has not been fully confirmed. If true, it signals yet another major player pushing hard into frontier video generation. For now, public access details are limited, but it is one to watch.
If you are interested in using current video models creatively, you might also enjoy this guide on how to bulk-create Ghibli-style 90s nostalgia videos with free AI tools.
HeyGen Avatar 5: 15-Second Identity Capture
HeyGen launched Avatar 5, a new avatar model that can clone your on-camera presence from just 15 seconds of video. After a short recording, the system generates multiple avatar variants (different outfits, backgrounds, and framing) that you can drop into scripts.
The results are not perfect yet – voice and lip sync can still feel slightly off, and background removal may struggle with things like headphones – but the progress is significant. For creators and businesses, this makes it easier to generate talking-head style content without constantly filming yourself.
HeyGen’s AI Studio editor also lets you tweak backgrounds and layouts, making it a practical tool for quick explainer videos, onboarding content, or localized versions of the same message.
Rapid-Fire AI Updates You Should Know About
OpenAI: New $100 Pro Tier and Image Model Rumors
OpenAI introduced a new $100/month Pro tier that sits between the $20 Plus plan and the $200 top-tier subscription. The Pro tier offers:
• Around 5× more Codex (code-focused) usage than Plus.
• Access to the exclusive Pro model and unlimited “instant” and “thinking” models.
For a limited time (through May 31), Pro users get up to 10× the Codex usage of Plus. If you are doing long, intensive coding sessions with ChatGPT, this tier is aimed at you.
Separately, there are signs that GPT Image 2 – OpenAI’s next image model – is being tested in the wild. A set of models on Arena AI (masking tape alpha, gaffer tape alpha, packing tape alpha) are producing very strong infographics, maps, and UI mockups, and many suspect these are early deployments of GPT Image 2.
Anthropic: Managed Agents and a Policy Change
Anthropic rolled out a new “Claude Managed Agents” feature in the Claude console. These are preconfigured, tool-connected agents designed to integrate with apps like Notion, Slack, Intercom, and Asana.
You can:
• Start from templates (e.g., Notion or Asana agents that react to board changes).
• Or describe what you want the agent to do, and Claude will generate the agent configuration.
A common pattern is linking agents to task management tools: moving a card on a Kanban board can trigger a chain of automated actions handled by the agent in the background.
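As a rough illustration of that trigger-to-actions pattern, here is a hedged sketch. The event shape and action names are invented for illustration only; the real Claude Managed Agents feature is configured in Anthropic's console, not with code like this.

```python
# Hypothetical sketch of the "board change triggers agent actions" pattern.
# Event fields and action names are invented; this is not Anthropic's API.

def handle_board_event(event: dict) -> list[str]:
    """Map a Kanban card move to a chain of follow-up agent actions."""
    actions: list[str] = []
    if event.get("type") == "card_moved":
        column = event.get("to_column")
        if column == "In Review":
            actions += ["summarize_card", "notify_reviewer"]
        elif column == "Done":
            actions += ["post_changelog_entry", "close_linked_tickets"]
    return actions

print(handle_board_event(
    {"type": "card_moved", "card_id": "KAN-42", "to_column": "Done"}
))
# → ['post_changelog_entry', 'close_linked_tickets']
```

In the managed version, the agent itself decides which tools to call; the sketch above only shows the general shape of event-driven dispatch.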
Anthropic also made a controversial change: Claude subscriptions (like the $200/month Claude Max plan) will no longer cover usage on third-party tools such as OpenClaw. You can still use Anthropic via API keys, but subscription credits cannot be spent inside external apps. This likely reflects the high token costs of agentic tools, but it does make some existing setups more expensive.
Perplexity + Plaid: AI-Powered Personal Finance View
Perplexity announced an integration with Plaid that lets you connect your financial accounts (banking, credit cards, loans, investments) and analyze them through an AI interface.
With read-only access via Plaid, you can:
• Track spending by category with detailed transactions.
• Monitor mortgages, auto loans, and student loans.
• Get a consolidated view of your net worth across accounts.
Plaid handles the connection and security; Perplexity never directly stores your financial credentials. The result is an AI-powered personal finance hub that can answer questions and surface insights from your real data.
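To make the "spending by category" idea concrete, here is a toy sketch of the kind of aggregation such a finance view performs. The transaction shape is invented for illustration; Plaid's real API returns far richer objects, and Perplexity's actual pipeline is not public.

```python
# Toy sketch: sum spending per category from a list of transactions.
# The transaction dict shape is hypothetical, not Plaid's real schema.
from collections import defaultdict

def spending_by_category(transactions: list[dict]) -> dict[str, float]:
    """Sum outflows per category (convention: positive amount = money out)."""
    totals: dict[str, float] = defaultdict(float)
    for tx in transactions:
        if tx["amount"] > 0:
            totals[tx["category"]] += tx["amount"]
    return dict(totals)

txs = [
    {"amount": 42.50, "category": "Groceries"},
    {"amount": 12.00, "category": "Transport"},
    {"amount": 18.25, "category": "Groceries"},
    {"amount": -1500.00, "category": "Income"},  # inflow, ignored
]
print(spending_by_category(txs))
# → {'Groceries': 60.75, 'Transport': 12.0}
```

The AI layer sits on top of aggregates like these, answering natural-language questions ("how much did I spend on groceries last month?") against the structured data.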
Factory AI Desktop, Cursor Remote Control, and XAI Image Editing
• Factory AI released a desktop app (previously it was CLI-first). You can now launch and manage agents (“droids”) from a graphical interface on your machine.
• Cursor, the AI coding editor, added the ability to run Cursor on any machine and control it remotely – including from your phone. You can kick off coding agents on your dev box while away from your desk.
• xAI (Elon Musk’s AI company) upgraded its photo features: you can now edit generated images with text prompts, add blur, and redact parts of photos. It is live on iOS with Android support coming.
On-Device Speech and Smarter Spotify Playlists
Google quietly released an iOS app called Google AI Edge, a dictation tool that runs entirely offline using the Gemma model. It works similarly to Whisper-based apps (like WhisperFlow or Willow), but all transcription happens on-device, preserving privacy and working without an internet connection.
Spotify expanded its AI playlists feature from music to podcasts. You can now ask for podcast playlists like “episodes about how AI is changing business” or “deep dives on startup finance,” and Spotify will assemble a tailored list. This could finally make discovering new podcast shows and episodes much easier.
Staying Sane in an Exponential AI World
AI news is now a firehose: new models, tools, and features drop almost daily. It is easy to feel like you are falling behind if you try to track everything in real time.
A more sustainable approach is to focus on weekly snapshots like this one, skim what matters to your work or interests, and ignore the rest. The pace is not going to slow down, but your attention does not have to follow every headline.
This week’s key takeaways:
• Claude Mythos shows how advanced coding models can become powerful cybersecurity tools – and risks – overnight.
• Open-source is catching up fast, with GLM 5.1 delivering near state-of-the-art coding performance you can run locally.
• Video, avatars, and everyday productivity tools are all getting smarter and more accessible, from SeaDance 2.0 to Gemini Notebooks and HeyGen Avatar 5.
Pick one or two of these that actually help you build, learn, or automate something – and let the rest be background noise until it matters for you.