AI News: Huge Updates From Anthropic, OpenAI, Google, Perplexity, and More
It’s been another packed week in AI, with major updates from OpenAI, Anthropic, Google, Perplexity, and more. Coding agents got smarter, desktop apps became more powerful, and we saw fresh progress in speech, images, and even robotics.
Here’s a clear breakdown of what changed, why it matters, and where it might actually help you day to day.
OpenAI Codex Is Turning Into a True AI Super-App
OpenAI’s Codex desktop app took a big step toward becoming a full “AI super-app” rather than just a coding helper. It now blends code generation, UI control, image generation, and persistent workflows into one environment.
New capabilities inside Codex
Codex is no longer just a code editor with AI. It can now:
• Operate your computer in the background – It can see, click, and type with its own cursor, and multiple agents can work on your Mac in parallel without blocking you from using other apps.
• Generate images directly in the app – Using GPT Image 1.5, Codex can create mockups and graphics inside your project, instead of forcing you to jump out to ChatGPT or another tool.
• Remember preferences and past actions – It can learn from how you work and take on ongoing or repeating tasks, edging closer to a true AI agent.
• Use an in-app browser – You can browse, highlight parts of a page, and leave comments that Codex uses as context to update your code or design.
Real examples: from mockup to working app
In practice, Codex can now:
• Take a prompt like “Generate a mockup image of a website for a site that sells surfboards and tacos,” create the image, then build a working static site based on that mockup.
• Let you switch to “comment mode,” click on a section of the page, and write instructions like “Generate an image with surfboards and a taco truck for this background, but don’t cover the text” – and it will generate and apply the new image.
• Build and run a local desktop app (for example, a Connect 4 game for macOS), then take control of the app itself to play as the user and test the experience end-to-end.
The big shift: Codex is starting to look less like a smart IDE and more like an AI agent that can design, build, run, and test software on your machine with minimal hand-holding.
Anthropic’s Claude App and Claude Opus 4.7 Get Major Coding Upgrades
Anthropic also shipped notable updates on two fronts: the Claude desktop app (especially Claude Code) and a new flagship model for coding, Claude Opus 4.7.
Claude Code: smoother multi-project workflows
The Claude app has been redesigned to make its different modes easier to work with: Claude Chat, Claude Code, and Co-Work now sit in a clearer layout. The biggest practical improvement is for developers:
• Parallel sessions – You can kick off multiple Claude Code sessions at once, such as one per repo or one per feature in the same project, and switch between them as results come in.
• Pin and rearrange threads – Pin your most important sessions and drag them around so you can quickly bounce between ongoing tasks.
• Integrated terminal and file editor – You can run commands and edit files directly inside the Claude app, reducing the need to jump out to a separate terminal or editor.
• Faster diffs and better previews – View HTML, PDFs, and even local app servers inside the app, making it easier to see what Claude changed and how your app looks.
The direction is similar to OpenAI’s Codex: keep more of your coding workflow in one AI-powered environment.
Claude Opus 4.7: a new top-tier coding model
Anthropic also released Claude Opus 4.7, a new version of its high-end model with a strong focus on software engineering performance.
On the SWE-bench benchmarks (which simulate real-world software engineering tasks), Opus 4.7 lands roughly halfway between the previous public Opus 4.6 and Anthropic’s internal Mythos preview model. The key takeaways:
• Big jump in agentic coding – Opus 4.7 is significantly better at multi-step coding tasks, bug fixing, and working across large codebases.
• Improved instruction following – It’s better at understanding what you want from a single prompt, so you should need less prompt engineering and fewer back-and-forth iterations.
• Better multimodal and memory – It handles images and long-context reasoning more reliably, which helps when you’re feeding it diagrams, screenshots, or long project histories.
If you’re a developer using tools like Cursor, Claude Code, or other Claude-powered IDEs, Opus 4.7 is now one of the strongest options for serious coding work. For more background on Anthropic’s model lineup and safety work, you may also want to read this deeper dive into the Mythos model.
Google: Desktop Gemini Apps, Chrome Skills, and a New TTS Model
Google had a busy week too, with updates across desktop apps, Chrome, and speech and image generation.
Gemini desktop apps for Mac and Windows
Google’s AI experience is moving beyond the browser:
• Gemini desktop app for Windows – The Google app for desktop (with Gemini AI mode) is now available globally on Windows. It’s essentially Google Search plus Gemini in a dedicated app.
• Gemini app for Mac – Mac users now get a full Gemini desktop app as well. Anything you can do in Gemini on the web—chat, image generation with “Nano Banana,” video generation with Veo, music creation, deep research, guided learning—can now be done in the standalone app.
You can download the Mac app at gemini.google/mac, and it syncs all your existing chats and tools from the browser version.
Chrome “skills”: reusable AI slash commands
One of the more practical updates is coming to Chrome: Gemini skills inside the browser.
• When you write a useful prompt in Gemini in Chrome (for example, “Summarize this news article into 5 bullet points with key numbers highlighted”), you’ll be able to save it as a skill.
• Next time you’re on a page, you can trigger that skill with a slash command (like typing /summarize) or via a quick menu, and Gemini will run that prompt against the current page and any other tabs you select.
This is similar to Perplexity’s Comet assistant, where you can define custom commands (e.g., “news review,” “YouTube comments analysis”) that run on the current page. Google is essentially baking that workflow into Chrome itself.
Gemini 3.1 Flash TTS: more expressive text-to-speech
Google also released Gemini 3.1 Flash TTS, a new text-to-speech model available in Vertex AI, Google Vids, and Google AI Studio.
Key features include:
• Fine-grained control over delivery – You can use tags like <excited>, <amazed>, <whispers>, <panic>, <sighs>, and <laughs> to shape the tone and emotion of the voice.
• Multi-speaker conversations – Templates in AI Studio let you script back-and-forth dialogue between two speakers, similar to an auto-generated podcast or NotebookLM-style conversation (a short code sketch follows this list).
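To make the tag and multi-speaker ideas concrete, here is a minimal sketch of what calling the model might look like from Python, assuming it follows the same google-genai SDK pattern Google documents for earlier Gemini TTS models. The model ID is a guess based on the announced name, and the voice names are examples from Google’s prebuilt voice list, so treat this as an illustration rather than a confirmed API.

```python
# Minimal sketch, assuming the google-genai SDK pattern used for earlier
# Gemini TTS models. The model ID below is hypothetical; the voice names
# ("Kore", "Puck") are examples from Google's prebuilt voice list.
import wave
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Delivery tags from the announcement, embedded in a two-speaker script.
script = """\
Host: <excited> Welcome back! This week's AI news is packed. </excited>
Guest: <whispers> Honestly, I can barely keep up. </whispers> <laughs>
"""

response = client.models.generate_content(
    model="gemini-3.1-flash-tts",  # hypothetical ID based on the announced name
    contents=script,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(
                        speaker="Host",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                        ),
                    ),
                    types.SpeakerVoiceConfig(
                        speaker="Guest",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
                        ),
                    ),
                ]
            )
        ),
    ),
)

# Gemini TTS returns raw 24 kHz, 16-bit mono PCM; wrap it in a WAV container.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("dialogue.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```

If you only need a single narrator, the same SpeechConfig takes a plain voice_config with one prebuilt voice instead of the multi-speaker variant.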
It’s not perfect yet—some tags may be skipped or sound slightly off—but it’s already powerful for creating more natural-sounding narrations, voiceovers, and conversational content. If you’re exploring AI audio tools, this sits alongside other text-to-speech options in the space and is particularly convenient if you’re already using Google’s ecosystem.
Personalized images with Nano Banana
Google’s “Nano Banana” image model (used inside Gemini) now taps into your personal context—things like Gmail, Calendar, Drive, and Google Photos—when you’ve enabled personal intelligence.
That means you can ask it to generate images that better reflect your real life: your dog, your family, your hobbies, or your typical work environment. The quality of personalization will depend heavily on how much of your life is actually in that Google account, but the direction is clear: more tailored, context-aware image generation.
For a broader look at Google’s AI roadmap and upcoming models, you can also check out this guide to what’s expected from future Gemini and Veo releases.
Perplexity Personal Computer: Agents That Live on Your Machine
Perplexity expanded its agentic capabilities with a new feature called Personal Computer.
Previously, Perplexity’s “Computer” ran entirely in the cloud: you’d ask it to build something, and it would orchestrate tools on Perplexity’s servers. Now, Personal Computer brings that orchestration layer down to your own device.
• Runs on your local machine – The AI models still run in Perplexity’s cloud, but the agent can now access your local files, native apps, and connectors directly on your computer.
• Persistent workflows – A great use case is a dedicated machine (like a Mac mini) that stays on 24/7, where Perplexity can continuously work through tasks that require local access.
• End-to-end task handling – You could ask it to process your to-do list: it would read your notes, reason about how to complete each item, and then work across your files, iMessage, email, connected apps, and the web to actually get things done.
• Auditable and reversible actions – You can inspect what it did and undo changes, which is critical when an agent has deep access to your machine.
Conceptually, this is similar to OpenAI’s agentic experiments and Anthropic’s Co-Work, but powered by Perplexity’s own models and UX. It’s another sign that “AI agents that operate your computer” are quickly moving from demo to practical reality.
Canva, Open-Source Models, and Other Notable Releases
Canva AI 2.0: prompt-first design and automation
Canva teased a set of “AI 2.0” features that aim to make design and content creation much more prompt-driven:
• Prompt-to-campaign – You’ll be able to ask Canva to “Design a social media campaign to advertise our summer running shoe sale using the brief in Notion,” and it will pull from that brief and generate full campaign assets.
• Connect to your tools – Integrations with Slack, Notion, Gmail, and more will let Canva pull context directly from your existing workflows.
• Task scheduling – You can schedule AI tasks like “write a guidance document,” “generate a summary presentation,” or “create a summary report” to run automatically.
• Style learning – Canva will learn from your existing designs and adapt to your brand or personal style over time, which is especially useful for recurring content like YouTube thumbnails or social posts.
• Generate any element – Graphics, audio, and even 3D elements can be generated directly inside the editor.
An offline version of Canva is also on the way, along with new AI-powered education resources mapped to specific curricula, which should make the platform more useful in classrooms.
New open and specialized models
A few notable model releases landed this week as well:
• MiniMax M2.7 – An open model that delivers strong results on the SWE-bench Pro coding benchmark (56.22%), beating some earlier closed models like Claude Opus 4.6 and Gemini on that metric. However, its license prohibits commercial use, so it’s more for research and experimentation than for building commercial products.
• Qwen 3.6 35B A3B (Alibaba) – A 35B-parameter open-source model (the “A3B” follows Qwen’s naming convention for mixture-of-experts models with roughly 3B active parameters) with respectable performance on coding and reasoning benchmarks. It’s not at the level of Claude Opus 4.7, but it’s open, can be fine-tuned, and is likely runnable on a strong local GPU or custom cloud setup (see the sketch after this list).
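For a sense of what “runnable on a strong local GPU” looks like in practice, here is a minimal sketch using the standard Hugging Face transformers loading pattern. The repo ID is a placeholder borrowed from an earlier Qwen A3B release; the exact name for the model above may differ.

```python
# Minimal sketch: loading an open-weights Qwen model locally with Hugging Face
# transformers. The repo ID is a stand-in from an earlier Qwen A3B release;
# the exact repo for the model discussed above may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # requires accelerate; shards layers across available GPUs
    torch_dtype="auto",  # use the checkpoint's native precision
)

messages = [{"role": "user", "content": "Write a binary search function in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With device_map="auto", the weights are spread across whatever GPU (and, if needed, CPU) memory is available, which is usually the deciding factor in whether a ~35B model fits on a single workstation; quantized variants shrink that footprint further.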
These releases continue the trend of increasingly capable open models, giving developers more options for self-hosted or customized AI stacks.
OpenAI’s Rosalind: AI for life sciences
OpenAI introduced GPT Rosalind, a reasoning model tailored for biology, drug discovery, and translational medicine.
• It’s optimized for workflows across chemistry, protein engineering, genomics, and experimental design.
• Benchmarks show it outperforming OpenAI’s general-purpose models on scientific tasks and beating some competing models like Gemini 3.1 and Grok in the evaluated areas.
• Access is restricted to vetted scientists and researchers (“trusted access only”), reflecting the higher stakes and safety concerns around powerful bio-focused models.
While most people won’t touch Rosalind directly, it’s an important signal: AI is moving deeper into domains where it could materially accelerate scientific discovery, from new drugs to better materials.
Image and video tools: Microsoft, Midjourney, and DaVinci Resolve
Several visual tools also got upgrades:
• Microsoft MAI Image 2 Efficient – A faster, cheaper variant of Microsoft’s image model. It’s optimized for speed (around 13.7 seconds per image vs. ~19 seconds for Google’s Nano Banana Flash) and handles short text like labels and headlines well. For longer or more complex in-image text, Microsoft still recommends the original MAI Image 2.
• Midjourney v8.1 – A new version that brings back more of Midjourney’s “iconic” aesthetic, with native 2K HD rendering. It’s advertised as 3x faster and 3x cheaper than v8, which is impressive given v8 was already quick.
• DaVinci Resolve 21 AI features – Blackmagic’s video editor added AI-powered tools, including:
– AI IntelliSearch to search across your footage for specific objects, people, or spoken phrases, then surface matching clips in the media pool.
– A Face Age Transformer that can adjust an actor’s apparent age by adding or removing age-related features like wrinkles and facial fullness.
For video editors, IntelliSearch in particular could be a major time-saver when wrangling large amounts of B-roll and interview footage.
Weird and Wonderful: Shoe Company Turns to AI, and Robots Read To-Do Lists
Allbirds pivots from shoes to AI GPUs
In one of the strangest AI-adjacent stories of the week, shoe company Allbirds announced it is pivoting into AI infrastructure, rebranding as New Bird AI and acquiring high-performance GPU assets.
The market reaction was wild: the stock reportedly jumped around 600% on the news, even though the company has never turned a profit, has seen sales drop nearly 50%, and previously sold itself for a fraction of its IPO valuation.
For many observers, this looks like a textbook example of AI hype bleeding into public markets—fuel for those who argue we’re in a speculative AI bubble.
Boston Dynamics robot completes a written to-do list
On the more inspiring side, Boston Dynamics shared a demo of a robot that literally reads a to-do list from a whiteboard and then completes the tasks.
The list included items like:
• Put shoes by the front door onto the shoe rack
• Recycle cans in the living room
• Put clothes on the floor into the laundry basket
• Check mouse traps
The robot walks up, reads the whiteboard, interprets each instruction, and then executes the tasks: moving shoes, crushing and recycling cans, collecting clothes, and finally “taking itself for a walk.”
What’s notable isn’t just the physical agility—it’s the pipeline from natural language instructions on a whiteboard to autonomous real-world actions. It’s essentially the same idea as prompting an AI agent in Slack or a desktop app, but extended into the physical world.
Staying Sane in a Firehose of AI News
Weeks like this make it clear how fast the AI landscape is moving: coding agents that run your desktop, specialized scientific models, expressive speech synthesis, and robots that can read your to-do list are all arriving at once.
If you’re trying to stay productive rather than overwhelmed, the most practical approach is to focus on:
• One or two tools that directly improve your daily work (for many, that’s likely Codex, Claude Code, Gemini, or Perplexity).
• Key capabilities that keep showing up across platforms—agents that operate your computer, reusable prompts/skills, and better multimodal understanding.
Everything else is useful context, but you don’t need to chase every single release. The real value comes from picking a few of these tools and actually integrating them into your workflows.