Can Claude Opus 4.6 really do PhD‑level maths?

04 Jun 2026 15:10

A deep dive into how Anthropic’s Claude Opus 4.6 performs on PhD‑level maths, coding, and finance workflows—compared with ChatGPT Plus and Google Gemini. The verdict: highly capable, but held back by strict usage limits.

Can a $20/month AI model genuinely help with PhD‑level maths and serious research? Claude Opus 4.6 from Anthropic aims to do exactly that, and it’s going head‑to‑head with heavyweights like ChatGPT Plus and Google’s Gemini.

How strong is Claude Opus 4.6 at advanced maths?

For theoretical and higher‑level mathematics, Claude Opus 4.6 is surprisingly capable. It doesn’t just spit out final answers; it can walk through algebraic manipulations and multi‑step reasoning in a way that feels close to how a human mathematician would work through a problem.

On extended reasoning, its performance is roughly on par with OpenAI’s mid‑tier “thinking mode” models (in this comparison, GPT‑5.2 in thinking mode). It can:

• Reproduce non‑trivial algebraic derivations that a human previously worked out by hand
• Handle long symbolic computations with reasonable accuracy
• Explain intermediate steps instead of skipping straight to the result

In at least one case, it independently reproduced a very specific computation from a master’s‑level proof. That kind of niche, technical derivation is unlikely to be in its training data, which suggests genuine reasoning rather than memorization.

Where it falls short for pure maths

While Claude Opus 4.6 is strong, its absolute peak performance on maths seems slightly behind the best extended‑thinking setups from OpenAI and Google. Compared with something like GPT‑5.2 on its highest reasoning setting, the final insights and conclusions from Claude can feel a bit less sharp or less creative, even when the algebra is correct.

So if you’re chasing the single strongest model for cutting‑edge theoretical maths, Opus 4.6 is competitive but not obviously the best. That said, it’s still very capable and more than enough for most research‑level workflows.

The biggest downside: strict usage limits

The real problem with Claude Opus 4.6 isn’t its intelligence—it’s the limits.

In practice, it’s easy to burn through your daily quota in about 40 minutes of serious work, especially if you’re:

• Asking long, detailed questions (e.g., full LaTeX documents)
• Using extended reasoning on every query
• Running multiple conversations in parallel

Anthropic appears to divide your allowance into windows that refresh over the course of the day (for example, with more capacity becoming available again by the evening). But for someone doing intensive research, hitting the cap that quickly is a major workflow killer.
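To see why heavy use drains a fixed allowance so fast, here is a back‑of‑the‑envelope sketch in Python. Every number in it is a hypothetical placeholder, not Anthropic’s actual quota, and the model makes one loudly stated simplification: each turn of a chat re‑reads the attached document plus all earlier questions and answers. The function name `conversation_tokens` is invented for illustration.

```python
# Back-of-the-envelope estimate of how quickly a token allowance drains.
# All inputs are hypothetical placeholders, NOT Anthropic's real limits.

def conversation_tokens(doc_tokens: int, question_tokens: int,
                        reasoning_tokens: int, answer_tokens: int,
                        turns: int) -> int:
    """Total tokens processed by one chat, under the simplifying assumption
    that every turn re-reads the attached document plus all earlier turns."""
    history = doc_tokens  # e.g. a full LaTeX document pasted at the start
    total = 0
    for _ in range(turns):
        history += question_tokens  # the new question joins the context
        # Cost of this turn: input context + hidden reasoning + visible answer.
        total += history + reasoning_tokens + answer_tokens
        history += answer_tokens  # the answer is re-read on the next turn
    return total

# Hypothetical inputs: a 20k-token document, short questions, long reasoning.
short_chat = conversation_tokens(20_000, 500, 5_000, 1_500, turns=10)
long_chat = conversation_tokens(20_000, 500, 5_000, 1_500, turns=20)
three_parallel = 3 * short_chat

print(f"10-turn chat:        {short_chat:,} tokens")
print(f"20-turn chat:        {long_chat:,} tokens")
print(f"three 10-turn chats: {three_parallel:,} tokens")
```

With these made‑up inputs, the 10‑turn chat comes to 360,000 tokens and the 20‑turn chat to 920,000, so doubling a conversation’s length more than doubles its cost, and three parallel chats triple it again (1,080,000). Whatever the real quota is, long documents, extended reasoning on every query, and parallel chats compound rather than simply add up.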

By contrast, users report almost never hitting meaningful limits on ChatGPT Plus, even with multiple extended‑thinking chats open and large documents attached. Gemini Pro also tends to feel less restrictive in day‑to‑day use.

Why Claude often feels more “trustworthy”

One interesting strength of Claude Opus 4.6 is how it uses Python under the hood. When it decides to test an idea with code, it tends to:

• Write more thorough Python scripts
• Test them on a wider range of data
• Use those results to refine its final answer

This makes its responses feel more grounded and empirically checked, especially for numerical or simulation‑based questions. Other models also generate code, but often don’t run or test it as comprehensively.
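As an illustration of that kind of check (not Claude’s internal code), here is a minimal Python sketch that tests a conjectured identity on many random inputs before trusting it. The helper `check_identity`, the sampling ranges, and the tolerances are illustrative choices, and the two example formulas are our own.

```python
import math
import random

def check_identity(lhs, rhs, sampler, trials=10_000,
                   rel_tol=1e-9, abs_tol=1e-9, seed=0):
    """Evaluate lhs and rhs on many random inputs drawn by `sampler`.
    Return the first counterexample as (args, lhs_value, rhs_value),
    or None if every trial agreed within tolerance."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    for _ in range(trials):
        args = sampler(rng)
        left, right = lhs(*args), rhs(*args)
        if not math.isclose(left, right, rel_tol=rel_tol, abs_tol=abs_tol):
            return args, left, right
    return None

# True identity (triple-angle formula): sin(3x) = 3 sin(x) - 4 sin(x)^3
triple = check_identity(
    lambda x: math.sin(3 * x),
    lambda x: 3 * math.sin(x) - 4 * math.sin(x) ** 3,
    lambda rng: (rng.uniform(-100, 100),),
)

# Plausible-looking but false "identity": sin(x + y) = sin(x) + sin(y)
naive = check_identity(
    lambda x, y: math.sin(x + y),
    lambda x, y: math.sin(x) + math.sin(y),
    lambda rng: (rng.uniform(-10, 10), rng.uniform(-10, 10)),
)

print("triple-angle counterexample:", triple)
print("naive sum counterexample:", naive)
```

The triple‑angle formula survives all trials, while the naive sum formula is refuted almost immediately. Note the asymmetry: random testing can disprove an identity with a single counterexample, but passing thousands of trials only makes it plausible, not proven.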

There’s speculation that Anthropic’s infrastructure (Amazon is a major investor, and AWS a key compute provider) may make it easier to allocate compute for this kind of heavy tool use. Whatever the reason, the end result is answers that often feel more rigorously validated.

Claude Code: a standout for Python workflows

Beyond the main chat model, Claude Code is a major highlight. Integrated into tools like VS Code, it shines for people who:

• Use Python to test mathematical ideas
• Run simulations to check whether an equation or identity is likely to be true
• Want quick, clear explanations of what a script is doing
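A minimal sketch of the “run a simulation to see whether a formula is plausible” workflow, in plain Python. The example formula is our own illustration, not one from the article: the probability that a random shuffle of n items leaves no item in its original position is exactly the sum of (-1)^k / k! for k = 0..n, which tends to 1/e. The function names are hypothetical.

```python
import math
import random

def exact_derangement_probability(n: int) -> float:
    """Closed-form claim: P(no fixed point) = sum_{k=0}^{n} (-1)^k / k!."""
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

def simulate_derangement_probability(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate: shuffle 0..n-1 repeatedly and count the
    shuffles in which no value sits at its own index."""
    rng = random.Random(seed)
    items = list(range(n))
    hits = 0
    for _ in range(trials):
        rng.shuffle(items)  # uniform random permutation, whatever the start order
        if all(value != position for position, value in enumerate(items)):
            hits += 1
    return hits / trials

n, trials = 10, 100_000
exact = exact_derangement_probability(n)
estimate = simulate_derangement_probability(n, trials)
std_err = math.sqrt(exact * (1 - exact) / trials)  # binomial standard error

print(f"formula={exact:.5f}  simulated={estimate:.5f}  1/e={1 / math.e:.5f}")
print("consistent" if abs(estimate - exact) < 4 * std_err else "suspicious")
```

The design choice is to compare the simulated value against the formula using the simulation’s own standard error, so “close enough” has a concrete meaning. A mismatch of many standard errors is strong evidence the formula (or the code) is wrong; agreement is only supporting evidence.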

Compared with other coding assistants (such as GitHub Copilot or OpenAI’s Codex), Claude Code feels more intuitive and more “human” in how it explains concepts. For non‑professional coders using Python as a research tool, it can be a huge productivity boost.

The catch: using Claude Code eats into the same token budget. If you lean heavily on it for coding, you’ll hit your Opus 4.6 limits even faster, which again makes the subscription feel more constrained than rivals.

If you’re interested in how Claude Opus 4.6 performs in broader coding and UI design tasks, it’s worth reading a head‑to‑head comparison with OpenAI’s latest models, such as “ChatGPT 5.4 vs Claude Opus 4.6 for real-world coding and UI design.”

How it compares to ChatGPT Plus

When you put Claude Opus 4.6 up against ChatGPT Plus (with strong reasoning modes enabled), the trade‑off looks like this:

Claude Opus 4.6 pros:

• Very strong at step‑by‑step maths and symbolic reasoning
• Thorough Python‑backed checking of ideas
• Excellent coding assistant via Claude Code

Claude Opus 4.6 cons:

• Strict daily limits that can block serious research workflows
• Overall maths “insight” slightly behind the very best extended‑thinking setups from OpenAI
• Token usage from Claude Code quickly eats into your allowance

For someone doing intensive PhD‑level maths, the limits alone can make ChatGPT Plus feel like the more practical choice, even if Claude is roughly comparable in raw capability. Being able to run multiple long, parallel conversations all day without hitting a wall is a big deal.

This lines up with broader impressions that OpenAI’s top‑end models currently lead on raw capability, as explored in “OpenAI’s GPT‑5.4 Pro might now be the smartest AI model in the world.”

What about Google Gemini?

Gemini’s picture is mixed:

• Gemini Pro ($20 tier): Quite impressive for coding and general workflows, but not as strong on deep theoretical maths as the best extended‑thinking modes from OpenAI.
• Gemini Ultra / Deep Think: Expensive and often frustrating. In some cases, it simply refused to answer questions or reported that it was too busy, making the subscription feel like poor value.

Overall, Claude Opus 4.6 comes out ahead of Gemini for serious maths and research, but still trails OpenAI’s best on the balance of power and usability.

Using Claude for finance and investment thinking

Anthropic has leaned more into positioning Claude for domains like finance than for pure maths, and it does perform well there. A particularly useful pattern is using Claude (or any strong model) to stress‑test your investment ideas rather than to generate stock picks.

A practical workflow looks like this:

1. Write down your investment thesis and key assumptions about a company or asset.
2. Ask Claude to attack your thesis—“Here are my beliefs; explain why this might be a bad idea.”
3. Read through the counter‑arguments and see which ones genuinely weaken your conviction.
4. Refine your thesis or risk management based on those insights.

This kind of back‑and‑forth can help reduce emotional reactions to short‑term volatility and build more robust conviction. The key is that you remain the decision‑maker; the model is there to challenge you, not to replace your judgment.
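Steps 1 to 4 can be turned into a small reusable template. This is a minimal Python sketch that only builds the text you would paste into Claude (or any model) and tracks which assumptions survived your review; it does not call any API. The `Thesis` class, the function names, and the example company are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Thesis:
    """Step 1: an investment thesis plus the assumptions it rests on."""
    asset: str
    claim: str
    assumptions: list[str] = field(default_factory=list)

def build_critique_prompt(thesis: Thesis) -> str:
    """Step 2: ask the model to attack the thesis rather than endorse it."""
    lines = [
        f"Here is my investment thesis on {thesis.asset}: {thesis.claim}",
        "My key assumptions are:",
        *(f"{i}. {a}" for i, a in enumerate(thesis.assumptions, start=1)),
        "Explain why this might be a bad idea. For each assumption, give the "
        "strongest argument that it is wrong and what evidence would show it.",
    ]
    return "\n".join(lines)

def surviving_assumptions(thesis: Thesis, weakened: set[int]) -> list[str]:
    """Steps 3-4: after reading the critique yourself, keep only the
    assumptions (1-based numbers) you judge still hold."""
    return [a for i, a in enumerate(thesis.assumptions, start=1)
            if i not in weakened]

t = Thesis(
    asset="Example Corp (hypothetical)",
    claim="revenue growth will stay above 15% a year for five years",
    assumptions=[
        "demand for its core product keeps rising",
        "margins stay stable",
        "no major new competitor enters",
    ],
)
print(build_critique_prompt(t))
print(surviving_assumptions(t, weakened={2}))
```

The `weakened` set is filled in by you, not by the model, which keeps the judgment call with the person making the investment decision.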

Paid models vs free models: the gap is huge

One of the strongest takeaways is just how much better paid models are than their free counterparts. For many people, their entire opinion of AI is based on using free ChatGPT or free Gemini, which can be misleading.

In one simple test, an expert in English literature asked a high‑level essay question about a book. Comparing the free and paid versions of ChatGPT, the free model’s answer sounded like someone using the right buzzwords without real understanding, while the paid model sounded like a genuine expert.

The same pattern shows up in maths, coding, and finance: the paid models are dramatically more capable, consistent, and useful for real work.

Is Claude Opus 4.6 worth $20 a month?

For many people, yes—with caveats.

It’s worth it if:

• You do serious maths, coding, or research and can work within the limits
• You want a very strong coding assistant (Claude Code) and are okay with token caps
• You value thorough, Python‑backed reasoning

It’s frustrating if:

• You need to work for hours at a time on long, complex problems
• You rely on multiple parallel chats and large documents
• You’re used to the “effectively unlimited” feel of ChatGPT Plus

In terms of raw capability, Claude Opus 4.6 is excellent. But the strict usage limits make it less useful day‑to‑day than ChatGPT Plus for heavy research users, even if you slightly prefer Claude’s style or explanations.

Final thoughts

Claude Opus 4.6 proves that a $20/month AI can absolutely operate at PhD level in maths, assist with real research, and act as a powerful coding and finance companion. The main thing holding it back isn’t its intelligence; it’s how often you’re told you’ve hit your limit.

If Anthropic relaxed those limits, Claude Opus 4.6 would be a near‑perfect rival to ChatGPT Plus for advanced users. As it stands, many power users will still lean toward OpenAI for the simple reason that they can use it all day without worrying about running out of tokens.

Either way, the bigger picture is clear: the top paid models are vastly more capable than the free ones, and for anyone who values their time, $20 a month for this level of help in maths, coding, learning, or even salad‑dressing recipes can be an easy decision.