Claude Mythos, DeepSeek v4 Rumors, HappyHorse, and Real-Time AI Worlds: This Week in AI
AI development is moving so fast that entire categories of tools are changing week to week. This roundup covers the biggest launches and research drops: Anthropic’s secretive Claude Mythos, a new open-source coding beast, China’s latest video models, real-time world generators, ultra-realistic avatars, and more.
Claude Mythos: Anthropic’s “Too Dangerous” Cybersecurity Model
Anthropic has quietly unveiled Claude Mythos Preview, a version of Claude that is so strong at cybersecurity tasks that the company says it will not be released to the general public.
Mythos is designed to find and exploit software vulnerabilities. According to Anthropic’s report, it has already surfaced thousands of high-severity bugs across major operating systems and browsers, including Windows, macOS, iOS, Android, Chrome, Safari, and Firefox.
Examples highlighted include:
• A 27-year-old remote crash vulnerability in OpenBSD, a system known for its security and used in critical infrastructure.
• A 16-year-old bug in FFmpeg, the video processing library used across countless apps and platforms.
• Multiple Linux kernel vulnerabilities that allow escalation from normal user access to full system control.
• Weaknesses in widely used cryptography components like TLS, AES-GCM, and SSH.
Beyond individual bugs, Mythos can chain multiple vulnerabilities into full attack paths—something that typically takes elite security researchers days or weeks.
On agentic coding benchmarks like SWE-bench Pro, Terminal-Bench, and SWE-bench Verified, Mythos shows double-digit percentage improvements over previous Claude versions such as Opus 4.6. Anthropic calls this a “step change,” not a marginal upgrade.
Why Anthropic Won’t Release Mythos
Anthropic explicitly states it will not make Claude Mythos generally available. The concern: putting this level of offensive capability into public hands could increase the frequency and severity of cyberattacks.
Instead, Anthropic launched Project Glasswing, a defensive initiative that gives early access to Mythos to major tech and infrastructure players like Google, Nvidia, Microsoft, Apple, AWS, and leading cybersecurity firms. The idea is to let defenders scan and patch their own systems before similar capabilities become widely accessible.
Anthropic is backing this with up to $1 million in usage credits plus direct funding for open-source security work, framing the situation as the start of an AI-driven cybersecurity arms race.
Hype vs Reality
Not everyone is convinced Mythos is uniquely powerful. Independent researchers have shown that when you isolate the specific vulnerable code snippets Anthropic highlighted, much smaller and cheaper open models (with only a few billion parameters) can also identify the same bugs.
Other top closed models like GPT-5.4 and Claude Opus have also been shown to autonomously find zero-day vulnerabilities in complex codebases like the Linux kernel.
Mythos still has limitations: it struggles with messy, long research tasks, hallucinates facts, and can overcomplicate solutions. It’s best thought of as a very strong but imperfect assistant that still needs human supervision.
If you want a deeper dive into what’s actually known about this model and its safety implications, see this breakdown of Claude Mythos.
GLM 5.1: The New Open-Source Coding Powerhouse
Zhipu AI (often branded as ZAI) has open-sourced GLM 5.1, which is now one of the strongest open models available—especially for coding and agentic workflows.
On agentic coding benchmarks like SWE-bench Pro, Terminal-Bench, and NL2Repo, GLM 5.1 matches or even surpasses leading closed models, including GPT-5.4 and Claude Opus 4.6. These benchmarks test whether a model can act like an autonomous coding agent, handling long, multi-step tasks end to end rather than just answering short chat questions.
One demo shows GLM 5.1 autonomously building a full Linux desktop environment from scratch, with over 50 working applications (browser, audio player, chat app, and more). It iteratively refined features, styling, and interactions over eight hours using self-review loops.
You can access GLM 5.1 via API or run it locally. The full weights are huge (around 1.5 TB), so you’ll likely want to wait for quantized versions unless you have serious hardware. The project provides detailed setup instructions on GitHub for those who want to experiment locally.
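If you do have the hardware, local use should look like any other open-weights checkpoint. Below is a minimal sketch using Hugging Face transformers; the repo id and generation settings here are assumptions, so follow the project's GitHub instructions for the real names:

```python
# Minimal sketch of running GLM 5.1 locally via Hugging Face transformers.
# The repo id "zai-org/GLM-5.1" is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-5.1"  # hypothetical; check the official GitHub for the real id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",    # shard the (very large) weights across available GPUs
    torch_dtype="auto",   # keep the checkpoint's native precision
    trust_remote_code=True,
)

prompt = "Write a Python function that parses a CSV file into a list of dicts."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```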
Real-Time 3D Worlds and Interactive Video
InSpatial World: Turn Any Video into an Explorable Scene
InSpatial World is a model that converts a regular 2D video into a 3D world you can move around in. Instead of locking you into the original camera angle, it builds a persistent internal representation of the scene so you can change viewpoints, rewind, or slow down while maintaining consistent geometry and motion.
Unlike typical video models that predict frames pixel by pixel (and often glitch on fast motion), this approach simulates a world behind the video. That makes it especially promising for training autonomous driving systems, robots, and interactive media.
Performance-wise, it can run at 24 FPS on high-end server GPUs and about 10 FPS on a single RTX 4090—impressive for real-time interactive 3D reconstruction. The code is already available, with instructions for running it locally.
Waypoint 1.5: Real-Time AI Game Worlds on Consumer GPUs
Waypoint 1.5 by Overworld is a real-time world generator designed to run on consumer hardware. It can generate interactive environments at up to 720p and 60 FPS on higher-end systems, and 360p on weaker GPUs.
The visual fidelity isn’t on par with offline cinematic models, but the key breakthrough is speed and interactivity on an RTX 3070–class GPU. You can walk around and interact with the generated world in real time, making it a glimpse of AI-native games and simulations that don’t require massive cloud compute. The project is open source and downloadable today.
China’s Video Push: HappyHorse and More
HappyHorse 1.0: Alibaba’s New Video Model
HappyHorse 1.0 is a new text-to-video model from an Alibaba innovation unit (separate from Tongyi and other internal labs). It recently debuted on the Artificial Analysis leaderboard and appears to outperform most competitors on many metrics.
While it’s not yet clear if it beats ByteDance’s Seedance 2.0 across the board, it reinforces a clear trend: Chinese labs are rapidly dominating high-end AI video generation. HappyHorse is not publicly available yet—any current “HappyHorse” sites or repos are likely fake or scams—though rumors suggest a release around the end of April.
Numina: Fixing Object Counts in Video Generation
Numina is a model-agnostic framework that helps video generators follow object-count instructions more accurately. For prompts like “two kids playing with a dog and a cat” or “four children making two snowmen,” many top models (including OpenAI’s and xAI’s) still produce the wrong number of people or objects.
Numina can be plugged into open-source models such as Wan 2.1 and Wan 2.2 to correct these issues, producing videos that better match the requested counts. The code is open and comes with instructions for local use.
mmPhys Video: More Realistic Physics in Generated Clips
mmPhys Video is a research framework aimed at making AI-generated video more physically consistent. Instead of learning only from raw pixels, it gives the model multiple aligned representations of a scene—appearance, geometry, motion, and more—and trains them jointly.
Compared to models like Wan 2.1, mmPhys Video produces motion that respects object structure and basic physics more reliably. Code and datasets are listed as “coming soon,” with plans to open source.
Meta’s Muse Spark: A New Model, But Not a New Leader
After months of internal restructuring and heavy hiring, Meta has released Muse Spark, the model that will power Meta.ai across Facebook, Instagram, WhatsApp, and its smart glasses.
Muse Spark is multimodal (it can handle images and figures) and performs well on a few specific benchmarks, including chart reasoning (understanding scientific figures), some math tasks, open-ended health questions, and certain agentic research evaluations.
However, on standard benchmarks and independent leaderboards, Muse Spark generally lags behind models like Gemini 3.1 Pro and GPT-5.4 on average. On the Artificial Analysis leaderboard, it sits around fourth place, roughly tied with Claude Sonnet 4.6 and only slightly ahead of the open-source GLM 5.1.
Unlike previous Llama releases, Muse Spark is closed source. Given that it’s not clearly state-of-the-art, its main appeal will likely be tight integration into Meta’s apps rather than raw performance. For a broader look at how Claude Mythos and other frontier models compare, see this recent weekly AI roundup.
Compression Breakthrough: RotorQuant Beats Google’s TurboQuant
Google recently introduced TurboQuant, a memory compression method for large language models that shrinks the key–value (KV) cache. This cache stores internal representations for every token and often becomes the main memory bottleneck for long contexts.
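Some back-of-envelope arithmetic shows why. The sketch below sizes the KV cache for a single long-context request; the model shape is an illustrative assumption, not any specific model:

```python
# Back-of-envelope KV cache size for one long-context request.
# Model shape is an illustrative assumption: 80 layers, 8 KV heads (GQA),
# head dimension 128, fp16 storage, 128K-token context.
layers, kv_heads, head_dim = 80, 8, 128
seq_len, bytes_per_value = 131_072, 2  # 128K tokens, 2 bytes per fp16 value

# Factor of 2 for keys and values; one entry per layer, head, channel, token.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
print(f"{kv_bytes / 2**30:.1f} GiB")   # -> 40.0 GiB for a single sequence
```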
RotorQuant is a new open-source compression method that improves on TurboQuant across speed, quality, and parameter efficiency. The core idea is similar—rotate KV vectors into a more compressible space, then quantize—but RotorQuant uses many tiny, cheap rotations (via Clifford rotators) instead of one huge, expensive matrix multiply.
That design cuts operations per vector from around 16,000 to roughly 200, enabling:
• Over 10× memory compression for the KV cache
• Up to 5.3× faster prefill and around 28% faster decoding
• Comparable or better accuracy with far fewer parameters than TurboQuant
For anyone running large models locally or optimizing inference costs, RotorQuant is a major development.
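As a rough illustration of the rotate-then-quantize idea, the sketch below mixes each cached vector with a few cheap pairwise (Givens-style) rotation stages, each costing O(d) work instead of the O(d²) of a dense rotation matrix, then quantizes to 4 bits. This is a toy under stated assumptions, not RotorQuant's actual Clifford-rotator construction:

```python
import numpy as np

def givens_stage(x, theta, offset):
    """Rotate disjoint (i, i+1) channel pairs by theta: O(d) work per vector."""
    y = x.copy()
    c, s = np.cos(theta), np.sin(theta)
    i = np.arange(offset, x.shape[-1] - 1, 2)
    y[..., i]     = c * x[..., i] - s * x[..., i + 1]
    y[..., i + 1] = s * x[..., i] + c * x[..., i + 1]
    return y

def rotate(x, thetas):
    """A few O(d) pairwise stages in place of one O(d^2) dense matmul."""
    for k, theta in enumerate(thetas):
        x = givens_stage(x, theta, offset=k % 2)  # alternate pairings so channels mix
    return x

def quantize_int4(x):
    """Per-vector symmetric 4-bit quantization of the rotated KV entries."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

# Toy example: 16 cached tokens with head dimension 128. Real systems would
# calibrate the rotation angles; here they are random for illustration.
kv = np.random.randn(16, 128).astype(np.float32)
thetas = np.random.uniform(0, np.pi, size=3)
q, scale = quantize_int4(rotate(kv, thetas))
recon = q * scale  # dequantize (inverse rotation omitted in this toy)
print(q.shape, float(np.abs(recon - rotate(kv, thetas)).max()))
```

The point of the toy is the cost structure: a handful of pairwise stages touch each channel a constant number of times, whereas a dense rotation multiplies every channel against all 128 others.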
Ultra-Realistic Avatars and Motion: LPM 1.0, Komodo, and More
LPM 1.0: Real-Time Conversational Avatars
LPM 1.0 is a system for generating highly realistic talking avatars in real time. Given an image of a character and an audio track (plus optional context), it produces video with synchronized lip movements, facial expressions, eye motion, gestures, and natural body language.
Key capabilities include:
• Real-time interaction, including high-fives and other gestures
• Multi-language support and even singing
• Long-form consistency—avatars can stay stable and on-model for 30–45 minutes or more
• Natural idle behavior when “listening,” avoiding the frozen look common in other systems
Right now, LPM 1.0 is described in a technical report with no public code or API, but it’s one of the clearest previews of what real-time, interactive AI characters will look like.
Nvidia Komodo: Text-to-Motion for Humans and Robots
Nvidia’s Komodo (Kinematic Motion Diffusion) generates 3D motion sequences for humans and robots from text prompts. It can create animations like “a person runs forward, then leaps over an obstacle” or “a robot sweeps the floor,” complete with realistic balance, weight shifts, and object interactions.
Komodo also supports fine-grained control: you can manipulate body joints at specific frames to shape the motion precisely. These synthetic motion clips can be used to train robots in simulation via Nvidia Isaac Sim before deploying them in the real world.
The models and code are open source, with detailed instructions for running them locally.
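To give a sense of what joint-level keyframing could look like, here is a hypothetical constraint structure; every name in it is an assumption, since the actual interface is defined by Nvidia's released code:

```python
# Hypothetical sketch of per-joint keyframe constraints for a text-to-motion
# model. All names and the generate() call are assumptions, not Komodo's real API.
constraints = [
    # (frame index, joint name, target xyz position in meters)
    (0,  "pelvis",    (0.0, 0.9, 0.0)),  # start standing at the origin
    (45, "left_foot", (1.2, 0.6, 0.0)),  # mid-leap: foot raised over the obstacle
    (90, "pelvis",    (2.5, 0.9, 0.0)),  # land past the obstacle
]
prompt = "a person runs forward, then leaps over an obstacle"
# motion = komodo.generate(prompt, keyframes=constraints)  # hypothetical call
print(f"{len(constraints)} keyframe constraints over 90 frames")
```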
New Tools for Images, Fashion, Floorplans, and Music
Anima v3 Preview: Fast Anime Image Generation
Anima v3 Preview is a small (2B parameter) model specialized for anime and non-photorealistic art. It’s trained on millions of anime images plus hundreds of thousands of images in other art styles, and it understands Danbooru-style tags like “1girl” as well as quality tags such as “masterpiece” or “best quality.”
Because it’s lightweight and optimized for this niche, it runs quickly on consumer GPUs and integrates smoothly with tools like ComfyUI. It’s already available on platforms like CivitAI and ModelScope.
SpatialEdit: Precise Object and Camera Control in Images
SpatialEdit is an image-editing framework that lets you precisely move objects and adjust camera parameters (yaw, pitch, zoom) in a scene. You can reposition a subject (like a dog) and change the viewpoint while keeping the scene coherent.
On benchmarks, SpatialEdit outperforms other approaches such as NanoBanana and GradEdit, and is often the only model that successfully completes complex object-and-camera placement tasks. The project is fully open source, with released weights (around 32 GB) and training data.
Vanast: Virtual Try-On with Animation
Vanast is a virtual try-on system that can animate a person wearing new clothing. You provide:
• An image of the person
• An image of the target clothing item
• A pose skeleton or pose video
The model outputs an animation of that person wearing the new outfit, preserving face, body shape, and clothing details. It works for upper-body garments, lower-body clothing, and full dresses. Code is listed as “coming soon,” with plans to open source.
Unified Vector Floorplan Generation: Text to Layout
This research project introduces a way to generate building floorplans from text descriptions using a custom markup language called FML (Floorplan Markup Language). FML encodes rooms, walls, doors, and positions as structured tags—like HTML for houses.
Once floorplans are represented as sequences of tokens, they become a language modeling problem. The AI can then generate valid, variable-sized layouts directly from descriptions. For now, only the paper is available; code and tools have not been released.
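To make the language-modeling-over-layouts idea concrete, here is a toy example; the tag names are invented for illustration, since the paper's exact FML schema has not been released:

```python
# Toy illustration of floorplans as token sequences. The FML tags below are
# invented placeholders; the paper's actual schema is not public.
import re

fml = '''
<floorplan w="900" h="600">
  <room type="living"  x="0"   y="0" w="500" h="600"/>
  <room type="kitchen" x="500" y="0" w="400" h="300"/>
  <door between="living,kitchen" x="500" y="150"/>
</floorplan>
'''

# Split the markup into tags and attributes so a language model can predict
# a floorplan one token at a time, just like ordinary text.
tokens = re.findall(r'</?\w+|\w+="[^"]*"|/?>', fml)
print(tokens[:8])
# ['<floorplan', 'w="900"', 'h="600"', '>', '<room', 'type="living"', 'x="0"', 'y="0"']
```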
AEP 1.5 XL: Open-Source Music Generation
AEP 1.5 XL is a new open-source music generator that can create full songs from a text description plus optional lyrics. Benchmarks suggest it rivals or beats some closed models like Udio and older Suno versions; in practice, it feels roughly on par with Suno v4 and approaching v5 quality.
The model runs on consumer hardware, is relatively fast (full songs in under a minute), and is straightforward to install with existing community tutorials.
DeepSeek Expert Mode and AI Agent Platforms
DeepSeek “Expert Mode” (Likely v4 Lite)
DeepSeek’s web platform now includes an “Expert Mode” that appears to be a stronger variant of its v3 model—possibly an early preview of DeepSeek v4 Lite. Users report noticeably better performance on logic puzzles, advanced math, coding, and multi-step reasoning tasks.
It’s currently free to use and seems to hallucinate less while giving more detailed, structured answers. DeepSeek has been rumored to be preparing a full v4 release, so this may be a public testbed for that next generation.
Skywork & SkyClaw: Cloud-Native AI Agents
Skywork is a cloud platform for running long-lived AI agents, with SkyClaw as its flagship autonomous agent. Instead of configuring local hardware and APIs, you get a dedicated virtual machine in the cloud that the agent can use to execute tasks 24/7.
You can control SkyClaw via Telegram, WhatsApp, or Discord, and use built-in “skills” for generating presentations, documents, images, spreadsheets, and even music. The Ultra membership bundles access to premium models like Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro, with included token allowances worth more than the subscription itself.