How to build self-improving Claude Code skills using Karpathy’s system

04 Jun 2026 00:37

Most Claude Code skills are just static markdown files that forget everything. This guide shows how to turn them into “super skills” that remember, connect to live data, and improve themselves over time using Karpathy-inspired principles, memory systems, and connectors.

Most people use Claude Code skills as simple, static helpers. They paste in a markdown file, run it a few times, then forget it exists. Meanwhile, the real power of Claude Code comes from skills that remember, learn, and get better every time you use them.

This guide walks you through how to build those “super skills” – reusable, self-improving workflows that plug into your data, keep long-term memory, and refine themselves over time using Andrej Karpathy’s mental models.

Utility skills vs. super skills

Not all skills are created equal. In Claude Code, it helps to think in terms of two broad categories: utility skills and super skills.

Utility skills are simple, single-purpose tools. One example is a Bitly skill that takes a URL and returns a shortened link. These are easy to set up and hard to mess up. They don’t need memory or complex logic; they just do one thing reliably.

Super skills are different. They are the core “superpowers” you rely on every day in your business – the equivalent of super strength, flight, and laser vision. A super skill might be a content strategy dashboard, a sponsorship reply system, or an insight/outlier analysis tool. These skills:

Listen to you and adapt to your context
Remember what happened in previous sessions
Connect to external tools and data sources
Self-evaluate and improve their own behavior over time

If you’ve been thinking about skills in the context of AI agents, it’s worth first understanding what skills really are and how to design them well. For a broader primer, check out this introduction to AI agent skills and how to build your first one.

Why most Claude skills underperform

Most skills people download or share suffer from the same core problems:

They’re just static markdown files. Someone writes a generic prompt, saves it as a skill, and never updates it. The skill never adapts to your business, your voice, or your strategy.
They lack strategic context. A “YouTube helper” skill that works for one creator won’t work for another without understanding niche, audience, goals, and constraints. Without that, it’s like a consultant giving advice without knowing your business.
They forget everything. You give feedback, tweak outputs, and then next session the skill behaves as if it never met you. There’s no persistent memory, so you’re stuck repeating yourself.
They don’t improve. The skill never scores its own output, never updates its own instructions, and never gets better with use.

Super skills fix all of this by combining better instructions, external tools, and a proper memory system.

Karpathy’s four principles for better skills

A key ingredient in these super skills comes from Andrej Karpathy’s mental model for working with code-generating models. His approach was turned into a small, high-impact Claude skill that became popular on GitHub.

The idea is simple: encode four guiding principles directly into your skill so Claude consistently behaves like a careful, thoughtful engineer instead of a rushed autocomplete. Those principles are:

Think before coding. The model should pause, reason, and clarify assumptions before writing or changing anything. This reduces wrong assumptions, hidden confusion, and missed trade-offs.
Simplicity first. Prefer the simplest solution that works. Avoid overengineering, unnecessary abstractions, and complex structures that make future changes harder.
Surgical changes. Only touch what you must. If you ask to change a button color, the model shouldn’t rewrite the entire page. This keeps edits safe and predictable.
Goal-driven execution. Keep the end goal in mind and verify success with tests or checks where possible.

The beauty of this pattern is that it’s short and reusable. You can embed these principles into any Claude Code skill – not just coding tasks – so the model behaves more deliberately whenever it runs the skill.
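As a rough illustration, here is a minimal Python sketch that writes the four principles into a skill file. It assumes the common layout of a SKILL.md file with `name` and `description` frontmatter; the skill name, the description, and the wording of each rule are placeholders made up for this example, not the text of the original GitHub skill.

```python
# The four principles as summarized in this guide (wording is illustrative).
PRINCIPLES = [
    ("Think before coding", "State assumptions and ask clarifying questions before changing anything."),
    ("Simplicity first", "Prefer the simplest solution that works; avoid speculative abstractions."),
    ("Surgical changes", "Touch only what the request requires."),
    ("Goal-driven execution", "Define success up front and verify it with tests or checks."),
]


def render_skill(name: str, description: str) -> str:
    """Return skill file text: YAML frontmatter followed by the numbered principles."""
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        "---",
        "",
        "## Working principles",
        "",
    ]
    for number, (title, rule) in enumerate(PRINCIPLES, start=1):
        lines.append(f"{number}. **{title}.** {rule}")
    return "\n".join(lines) + "\n"


# Hypothetical skill name and description, used only for this example.
print(render_skill("signal-dashboard", "Builds a daily HTML dashboard of AI news signals."))
```

Keeping the principles in one list means you can paste the same block into any other skill without retyping it.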

Step 1: Generate a proper skill spec (don’t handwrite it)

Most people write skills by hand: they open a markdown file, type some instructions, and hope for the best. Claude Code gives you a better option: use Claude itself to design the skill specification.

Instead of writing the skill directly, start with a meta-prompt like:

“I would like you to create a skill. Use your skill creator skill in Claude. My intention and outcome is: [describe what you want].”

Then describe your desired super skill in detail. For example, imagine building a “signal dashboard” that drives your content strategy. You might say:

You want to separate signal from noise across AI news, launches, and research
You want early awareness of important developments before they go mainstream
You want sentiment analysis and trend detection across multiple sources
You want a daily HTML dashboard that’s easy to scan and act on

Ask Claude to:

Clarify its understanding of your goals and constraints
Propose the tools and data sources it should use
Define the exact output format (for example, a clean HTML dashboard)

Claude’s built-in skill creator will then ask you smart follow-up questions about:

Topical scope (what topics count as “signal” for you?)
What “early” means in your context
Cadence (daily, weekly, on demand)
Signals you care about (launches, funding, research, policy, etc.)
Depth of analysis and sentiment
How memory and action layers should work

By iterating with Claude at this spec level first, you end up with a tailored, high-quality skill definition that’s far more powerful than a generic downloaded file. This is the foundation of a true super skill.
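One way to keep track of the answers you give during this back-and-forth is a small spec object. The sketch below is only an illustration: the `SkillSpec` class and its field names are invented for this example and are not part of Claude's skill creator.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSpec:
    """Hypothetical container for the decisions made while designing a skill."""
    goal: str
    topics: list[str] = field(default_factory=list)    # topical scope
    signals: list[str] = field(default_factory=list)   # launches, funding, research, ...
    cadence: str = "daily"                              # daily, weekly, on demand
    output_format: str = "html"

    def open_questions(self) -> list[str]:
        """Name the parts of the spec that still need an answer."""
        missing = []
        if not self.topics:
            missing.append("topical scope")
        if not self.signals:
            missing.append("signals you care about")
        return missing


spec = SkillSpec(goal="Separate signal from noise across AI news")
print(spec.open_questions())
```

If `open_questions()` still returns items, the spec is probably not ready to be turned into a skill yet.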

Step 2: Give your skill “eyes” with connectors

A super skill without external data is like a Ferrari running on fumes. To make skills genuinely useful, you need to connect them to the right sources of information.

Claude Code supports connectors – integrations that let skills pull data from services like Gmail, Figma, and more. The basic workflow is:

Click the + icon in Claude Code.
Select Connectors, then Manage connectors.
Use Browse connectors to search for built-in integrations (for example, Gmail, Figma, Spotify).

When designing your skill, ask Claude which sources count as primary data for your use case. For a signal dashboard, that might include:

Official blogs and changelogs from Anthropic, OpenAI, Google, etc.
Product Hunt for new tools
Newsletters, forums, or specific communities

Once you know where the best data lives, you can decide how to connect to it.

Using Firecrawl for web data

Websites are built for humans, not models. If your skill needs to pull structured information from the web, a tool like Firecrawl can help by turning messy pages into clean, token-efficient data.

To add Firecrawl as a custom connector:

In Claude Code, click the + icon and choose Add custom connector.
Name it (for example, firecrawl).
Paste the Firecrawl MCP server URL and include your API key from your Firecrawl dashboard.
Save and authenticate.

Now your skill can do things like:

“Use my Firecrawl connector to scan Product Hunt and find three interesting speech-to-text AI tools. Return a quick HTML summary.”

Claude will call Firecrawl, extract structured data, and then render it in the format you requested.
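To make the last step concrete, here is a minimal sketch of the rendering half only. It assumes the scraped results have already been reduced to a list of dictionaries with `name`, `url`, and `tagline` keys; that shape is a made-up example, not Firecrawl's actual response format.

```python
from html import escape


def render_summary(title: str, tools: list[dict]) -> str:
    """Render scraped tool entries as a small, safely escaped HTML list."""
    items = "\n".join(
        '  <li><a href="{url}">{name}</a>: {tagline}</li>'.format(
            url=escape(tool["url"], quote=True),
            name=escape(tool["name"]),
            tagline=escape(tool["tagline"]),
        )
        for tool in tools
    )
    return f"<h2>{escape(title)}</h2>\n<ul>\n{items}\n</ul>"


# Placeholder data standing in for whatever the connector returns.
sample = [
    {"name": "Example Tool", "url": "https://example.com", "tagline": "Speech-to-text for meetings"},
]
print(render_summary("Speech-to-text tools", sample))
```

Escaping every field matters here because scraped text can contain characters that would otherwise break the page.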

Using Zapier as a universal bridge

What if Claude doesn’t have a native connector for the tool you need? That’s where Zapier comes in. You can treat Zapier as a universal remote for the rest of your stack.

The flow looks like this:

In Zapier, create a Zap that connects to your target app (for example, Skool, a CRM, or another SaaS tool).
Generate a custom webhook or MCP-compatible endpoint.
Copy that URL.
In Claude Code, add a new custom connector (for example, named zapier) and paste the URL as the MCP server URL.

Now your super skill can reach into almost any tool in your stack, even if Claude doesn’t support it natively.
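If you go the webhook route, what travels over the wire is just a JSON POST. The sketch below builds such a request with Python's standard library without sending it. The URL and the payload fields are placeholders; your Zap defines the real ones.

```python
import json
from urllib.request import Request


def build_webhook_request(hook_url: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request for a webhook URL."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        hook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: copy the real one from your Zap.
req = build_webhook_request(
    "https://hooks.zapier.com/hooks/catch/EXAMPLE/EXAMPLE/",
    {"event": "new_signal", "title": "Example launch"},
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would send it; that call is left out
# so the sketch runs offline.
```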

Step 3: Build a real memory operating system

One of the biggest limitations of LLM-based workflows is memory. Models are great at pattern matching, but they forget past sessions unless you explicitly store and retrieve context.

A proper memory operating system solves this by combining three kinds of memory:

Bucket 1: Conversation memory

This is a log of important conversations you’ve had with Claude – strategy sessions, planning discussions, detailed breakdowns, and so on.

With the right skill, you can use a simple command (for example, /wrap-up) to:

Summarize the current session
Store that summary in long-term memory
Index it so it can be recalled later

Over time, this becomes a growing archive of your best thinking with Claude. Because you choose which sessions to wrap up, it stays curated rather than turning into one giant log of everything.
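The three wrap-up steps can be sketched in a few lines of Python. This is a minimal illustration that assumes a JSON Lines file as the store and a topic list as the index; the function names and the file layout are invented for this example, and a real /wrap-up skill would have Claude write the summary.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def wrap_up(log_path: Path, summary: str, topics: list[str]) -> dict:
    """Append one session summary, tagged with topics, to a JSON Lines log."""
    entry = {
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "summary": summary,
        "topics": sorted({topic.lower() for topic in topics}),
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def recall(log_path: Path, topic: str) -> list[str]:
    """Return the summaries of every saved session tagged with the topic."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as handle:
        entries = [json.loads(line) for line in handle if line.strip()]
    return [entry["summary"] for entry in entries if topic.lower() in entry["topics"]]
```

For example, `wrap_up(Path("memory.jsonl"), "Agreed on new sponsorship tiers", ["Pricing"])` stores a session, and `recall(Path("memory.jsonl"), "pricing")` brings it back later.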

Bucket 2: Knowledge base (long-term memory)

This is your foundational, relatively immutable knowledge: books, courses, your own content, expert frameworks, and reference material.

Examples include:

Transcripts of your own videos or posts
Books or playbooks from experts you follow
Documentation and SOPs for your business

This data usually lives in a vector database like Pinecone or in a structured note system. When your skills run, they can query this knowledge base to answer questions in your style and with your preferred strategies.
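To show what "query this knowledge base" means mechanically, here is a toy retrieval sketch. It swaps in plain word counts and cosine similarity for a real embedding model and vector database, so treat it as a stand-in for the idea rather than how Pinecone works internally. The file names and contents are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda name: cosine(query_vec, embed(docs[name])), reverse=True)
    return ranked[:k]


# Hypothetical knowledge base entries.
docs = {
    "hooks.md": "Write video hooks that open with the surprising result",
    "pricing.md": "Sponsorship pricing tiers and negotiation rules",
    "sop.md": "Checklist for publishing the weekly newsletter",
}
print(top_k("sponsorship pricing rules", docs, k=1))
```

Only the best-matching documents get passed to the model, which is what keeps a large knowledge base from flooding the context window.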

Bucket 3: Current profile and strategy

This is the part most people miss. Your current strategy, focus, and priorities change over time – so they shouldn’t be buried in a static, long-term index.

Instead, you keep this as a single, mutable markdown file that Claude reads every session. It might include:

Your current business focus
Key projects in progress
Audience segments you care about right now
Constraints and non-negotiables

When combined, these three buckets let your super skills:

Remember what you’ve discussed before (Bucket 1)
Leverage deep, expert-level knowledge (Bucket 2)
Stay aligned with what you’re working on today (Bucket 3)
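As a sketch of how the three buckets might come together at the start of a session, the function below stitches them into one block of context. The function names and section headings are assumptions made for illustration; the inputs are plain strings, however you choose to store or retrieve them.

```python
def section(heading: str, body: str) -> str:
    """Format one context section, or return an empty string if there is no body."""
    return f"## {heading}\n{body}" if body else ""


def bullets(items: list[str]) -> str:
    """Turn a list of strings into a simple bulleted list."""
    return "\n".join(f"- {item}" for item in items)


def build_context(profile_md: str, sessions: list[str], snippets: list[str]) -> str:
    """Combine the three memory buckets, putting the current profile first."""
    parts = [
        section("Current profile and strategy", profile_md.strip()),  # Bucket 3
        section("Relevant past sessions", bullets(sessions)),         # Bucket 1
        section("Knowledge base excerpts", bullets(snippets)),        # Bucket 2
    ]
    return "\n\n".join(part for part in parts if part)


print(build_context(
    "Focus: grow the newsletter",
    ["Agreed on new sponsorship tiers"],
    ["Open every video with the result"],
))
```

The profile goes first because it is the one piece read every session, while the other two sections only appear when something relevant was recalled.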

Visualizing your memory OS

A good memory system doesn’t just store data – it also gives you a way to see and manage it. A memory OS skill can generate a dashboard that shows:

How many sessions you’ve wrapped up
What topics you’re focusing on
Customer insights and patterns
Activity heat maps (how often you’re adding to memory)

Commands like /strategy-awareness can render this as an HTML dashboard so you can quickly inspect what Claude “knows” about you and adjust it as needed.
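Here is a minimal sketch of the numbers behind such a dashboard. It assumes each saved session is a dictionary with an ISO `saved_at` timestamp and a `topics` list; that record shape and the function names are invented for this example.

```python
from collections import Counter
from html import escape


def memory_stats(entries: list[dict]) -> dict:
    """Count sessions, topic frequency, and sessions per calendar day."""
    return {
        "sessions": len(entries),
        "topics": Counter(topic for entry in entries for topic in entry["topics"]),
        "per_day": Counter(entry["saved_at"][:10] for entry in entries),  # YYYY-MM-DD prefix
    }


def render_dashboard(stats: dict) -> str:
    """Render the stats as a bare-bones HTML page fragment."""
    rows = "\n".join(
        f"<tr><td>{escape(topic)}</td><td>{count}</td></tr>"
        for topic, count in stats["topics"].most_common(5)
    )
    return (
        "<h1>Memory OS</h1>\n"
        f"<p>Sessions wrapped up: {stats['sessions']}</p>\n"
        f"<p>Active days: {len(stats['per_day'])}</p>\n"
        f"<table>\n{rows}\n</table>"
    )


# Placeholder records for illustration.
entries = [
    {"saved_at": "2026-06-01T10:00:00+00:00", "topics": ["pricing", "hooks"]},
    {"saved_at": "2026-06-01T12:00:00+00:00", "topics": ["pricing"]},
]
print(render_dashboard(memory_stats(entries)))
```

The per-day counts are the raw material for an activity heat map; this sketch only reports how many days had activity.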

Under the hood, you can implement this with Obsidian-based RAG or with a vector database like Pinecone. Obsidian works well while your knowledge base is a small set of short notes. As it grows, Pinecone tends to scale better and use fewer tokens per query, typically because a vector search returns only the most relevant chunks instead of whole files. Both approaches are valid; the important part is that your skills have a structured way to store and recall information.

Step 4: Add a refinement loop so skills improve themselves

Even with great instructions, connectors, and memory, a skill that never improves is leaving a lot of value on the table. The final pillar of a super skill is a built-in refinement loop.

The idea is simple:

You run the skill and get an output (for example, a daily signal dashboard).
You evaluate it: what worked, what didn’t, what felt off?
You give explicit feedback to the skill (too many items, wrong priorities, missing sources, etc.).
The model uses that feedback to update the skill’s own markdown file – its instructions, heuristics, or scoring rules.

Over time, the skill learns your preferences and becomes sharper:

It can adjust how it ranks or filters information
It can change how much detail to show
It can refine which sources it trusts most
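The last step of the loop, writing feedback back into the skill file, might look like the sketch below. In practice Claude makes this edit itself; the code only illustrates the shape of the change. The "Learned preferences" heading is an invented convention, and the sketch assumes that section is kept at the end of the file.

```python
SECTION = "## Learned preferences"


def apply_feedback(skill_md: str, feedback: str) -> str:
    """Append one feedback rule to the skill text, skipping exact duplicates."""
    rule = f"- {feedback.strip()}"
    if rule in skill_md.splitlines():
        return skill_md  # already recorded, nothing to change
    if SECTION not in skill_md:
        skill_md = skill_md.rstrip("\n") + f"\n\n{SECTION}\n"
    # Assumes SECTION is the final section, so appending lands inside it.
    return skill_md.rstrip("\n") + f"\n{rule}\n"


skill = "# Signal dashboard\nSummarize the day's AI news.\n"
skill = apply_feedback(skill, "Show at most 5 items")
skill = apply_feedback(skill, "Rank research above funding news")
print(skill)
```

Storing each piece of feedback as a short, explicit rule makes it easy to review what the skill has learned and to delete a rule that no longer applies.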

This is where the real compounding value appears. Instead of starting from scratch every session, your skills become more like experienced pilots who have flown this route hundreds of times before.

Putting it all together: your own super skill stack

When you combine all four pillars, you get a very different experience from a basic Claude skill:

Pillar 1 – Proper skill design: Use Claude’s skill creator to define clear goals, tools, and outputs instead of hand-writing vague prompts.
Pillar 2 – Connectors and data: Give your skills “eyes” with built-in connectors, Firecrawl for web data, and Zapier for everything else.
Pillar 3 – Memory OS: Store conversations, long-term knowledge, and current strategy in a structured way so skills can recall and adapt.
Pillar 4 – Refinement loop: Let skills grade their own outputs and update themselves based on your feedback.

Once you think in terms of reusable, evolving skills instead of one-off agents, you can build a real AI-powered workflow around your business. If you want to go deeper into this mindset, it pairs nicely with the idea of focusing on reusable skills instead of monolithic agents, as explored in this guide on building reusable AI skills.

The end result: Claude Code stops being a collection of random prompts and becomes a growing system of super skills that remember, learn, and compound value every time you use them.