10 practical ways to save tokens with GitHub Copilot’s new billing

16 Jun 2026

GitHub Copilot has moved to usage-based billing, which means every token now costs real money. Here are 10 practical, developer-friendly tactics to cut your Copilot token usage without losing productivity.

Because every token you send or receive now shows up on your bill, using Copilot the old way can make costs spike fast, especially across a whole team. The good news: a few smart configuration tweaks and small daily habits can dramatically cut your token usage without sacrificing code quality.

1. Turn on “code only” responses

Under the new billing model, output tokens are far more expensive than input tokens. That means long explanations from Copilot cost you a lot more than short, focused answers.

You can address this with a single instruction. In your repository’s .github/copilot-instructions.md file, add a line like:

code only, no explanation

This tells Copilot to return just the code, not paragraphs of commentary. Developers commonly see a 40–70% reduction in output size for coding tasks with this one change. Set it once and it applies to every request going forward.
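Here is a back-of-the-envelope Python model of why that matters. The per-million-token prices and token counts are made-up placeholders, not GitHub’s actual rates. The only assumption that matters is that output is priced several times higher than input.

```python
# Placeholder prices per million tokens (NOT GitHub's real rates).
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the placeholder prices above."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Same prompt, two response styles (illustrative token counts).
verbose = request_cost(input_tokens=1_500, output_tokens=1_200)
code_only = request_cost(input_tokens=1_500, output_tokens=500)
print(f"verbose ${verbose:.4f}, code only ${code_only:.4f}, "
      f"saved {1 - code_only / verbose:.0%}")
```

With these numbers, cutting the response from 1,200 to 500 tokens (about 58% less output) lowers the request cost by about 44%, even though the prompt did not change at all.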

2. Use terse, directive prompts

Most developers prompt like they’re writing an email: polite, wordy, full sentences, and lots of extra context. The model doesn’t need that ceremony. Every extra word is an input token you pay for.

Instead, switch to short, directive prompts—sometimes called “caveman speak.” For example:

Instead of: “Could you please refactor this function to make it more readable and add some basic error handling?”

Use: “Refactor for readability + add basic error handling.”

This style can cut your input tokens by 30–50% while giving the model everything it needs to do the job.
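You can sanity-check the difference with a rough estimator. This sketch uses the common rule of thumb of about four characters per token. Real tokenizers vary by model, so treat the numbers as ballpark figures.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, math.ceil(len(text) / 4))


verbose = ("Could you please refactor this function to make it more readable "
           "and add some basic error handling?")
terse = "Refactor for readability + add basic error handling."

v, t = estimate_tokens(verbose), estimate_tokens(terse)
print(f"verbose ~{v} tokens, terse ~{t} tokens, about {1 - t / v:.0%} fewer")
# prints: verbose ~25 tokens, terse ~13 tokens, about 48% fewer
```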

3. Slim down your copilot-instructions.md

Your copilot-instructions.md file isn’t free. It’s injected into every request Copilot sends. If you have 60 lines of instructions and 10 developers each making 100 requests a day, you’re paying for those 60 lines 1,000 times daily.

Over time, these files often bloat with LLM-generated boilerplate and redundant rules. To cut costs:

Keep only the essentials: tech stack, coding style rules, and output format.
Aim for under 20 lines total.
Delete generic, fluffy guidance like “be helpful” or “act like a senior engineer.”

Trimming this file is one of the highest-leverage ways to reduce tokens across an entire team.
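The 60-line, 10-developer example works out like this. The figure of roughly 12 tokens per instruction line is an assumption for illustration, so measure your own file to get a real number.

```python
def daily_instruction_tokens(lines: int, tokens_per_line: int,
                             developers: int, requests_per_dev: int) -> int:
    """Tokens spent per day re-sending the instructions file with every request."""
    return lines * tokens_per_line * developers * requests_per_dev


# Assumption: ~12 tokens per instruction line (varies with line length and tokenizer).
before = daily_instruction_tokens(lines=60, tokens_per_line=12,
                                  developers=10, requests_per_dev=100)
after = daily_instruction_tokens(lines=20, tokens_per_line=12,
                                 developers=10, requests_per_dev=100)
print(f"{before:,} -> {after:,} tokens/day ({before - after:,} saved)")
# prints: 720,000 -> 240,000 tokens/day (480,000 saved)
```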

4. Scope instructions with applyTo rules

By default, your instructions apply everywhere. That means your Playwright test rules load when you’re editing SQL, and your Python rules load when you’re working in TypeScript. All of that becomes extra system prompt tokens on every request.

You can avoid this by splitting your instructions into multiple .instructions.md files (for example, under .github/instructions/) and giving each one an applyTo glob in its front matter, so it only loads when you’re working on matching files. For example, one instructions file for tests, one for backend, one for frontend.

In mixed-language or large monorepos, this can significantly shrink the system prompt for each request and keep Copilot focused on the right conventions.
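A path-specific instructions file carries its scope in its front matter, for example applyTo: "**/*.py" at the top of a backend.instructions.md file. The Python sketch below only simulates the effect. The file names, globs, and token sizes are hypothetical, and fnmatchcase stands in for Copilot’s own glob matching. Its * also matches across directory separators, which is why the sketch can use *.py where a real applyTo pattern would typically be **/*.py.

```python
from fnmatch import fnmatchcase

# Hypothetical instruction files: (applyTo-style globs, approximate token size).
INSTRUCTION_FILES = {
    "tests.instructions.md": (["*.spec.ts"], 400),
    "backend.instructions.md": (["*.py", "*.sql"], 350),
    "frontend.instructions.md": (["*.tsx", "*.css"], 300),
}


def instructions_for(path: str) -> list[str]:
    """Instruction files whose globs match the file being edited."""
    return [
        name
        for name, (globs, _size) in INSTRUCTION_FILES.items()
        if any(fnmatchcase(path, glob) for glob in globs)
    ]


def prompt_overhead(path: str) -> int:
    """Instruction tokens added to a request when loading is scoped."""
    return sum(INSTRUCTION_FILES[name][1] for name in instructions_for(path))


unscoped = sum(size for _globs, size in INSTRUCTION_FILES.values())
print(instructions_for("db/migrations/001_init.sql"))
print(prompt_overhead("db/migrations/001_init.sql"), "vs", unscoped, "unscoped")
```

For a SQL migration only the backend file matches, so the request carries about 350 instruction tokens instead of 1,050 for all three files.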

5. Close unused editor tabs

Copilot automatically includes content from nearby open files as context. That’s powerful, but it’s also a hidden token sink.

If you have 15 tabs open—config files, logs, schemas, half-finished scripts—Copilot may send large chunks of those files with every prompt, even if they’re irrelevant to your current task.

Adopt a simple rule:

Keep a maximum of ~5 tabs open.
Only keep files open that are directly related to what you’re doing right now.

This small habit can quietly save thousands of tokens per day across a team.
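To get a feel for the scaling, here is a deliberately crude model. It assumes each open tab contributes up to a fixed cap of tokens. Copilot’s real context selection is smarter than that, and both the cap and the file sizes are hypothetical.

```python
CAP_PER_TAB = 500  # hypothetical max tokens pulled from any one open tab


def tab_context_tokens(tab_sizes: list[int], cap: int = CAP_PER_TAB) -> int:
    """Upper bound on context tokens drawn from open tabs under the cap model."""
    return sum(min(size, cap) for size in tab_sizes)


# Hypothetical token sizes of 15 open files; the first five are the ones you need.
fifteen_tabs = [1_200, 300, 800, 2_000, 150, 900, 600, 400,
                3_000, 250, 700, 1_100, 500, 350, 950]
five_tabs = fifteen_tabs[:5]
saving = tab_context_tokens(fifteen_tabs) - tab_context_tokens(five_tabs)
print(saving, "tokens per prompt")  # 4500 tokens per prompt
```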

6. Route work to the right model tier

Not all models cost the same. Frontier models like Claude Opus or the latest GPT versions are much more expensive per token than lighter models. Using them for trivial tasks—like renaming a variable or formatting a function—is measurable waste.

Think in tiers:

Light models (e.g., Haiku, GPT-4o-mini): great for simple questions, small refactors, and routine coding.
Heavy models (frontier models): reserve for complex refactors, deep debugging, architecture questions, or tricky reasoning.

By defaulting to the lightest model that reliably solves the task, you keep costs under control while still having the heavy hitters available when you truly need them.
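The tiering rule can be written down as a tiny router. The 1 to 5 complexity scale, the cutoff, and the prices are invented for illustration; in practice you apply the same rule by hand when picking a model in Copilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_complexity: int        # hardest task (1-5 scale) this tier handles reliably
    price_per_m_tokens: float  # placeholder price, not a real rate


# Ordered cheapest first so the router picks the lightest tier that fits.
TIERS = [
    ModelTier("light", max_complexity=2, price_per_m_tokens=0.5),
    ModelTier("heavy", max_complexity=5, price_per_m_tokens=15.0),
]


def route(complexity: int) -> ModelTier:
    """Return the cheapest tier whose ceiling covers the task's complexity."""
    for tier in TIERS:
        if complexity <= tier.max_complexity:
            return tier
    raise ValueError(f"no tier handles complexity {complexity}")


print(route(1).name)  # renaming a variable -> light
print(route(5).name)  # architecture question -> heavy
```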

7. Stop pasting entire files into chat

Pasting a whole file into Copilot chat is one of the most expensive habits in day-to-day development. A 500-line file can easily be around 2,000 input tokens—before Copilot even starts answering your question. Most of those lines are irrelevant to the specific issue you’re asking about.

Instead:

Use #file references if your environment supports them (e.g., #file:path/to/file.ts).
Highlight only the relevant block of code and ask inline chat or the chat view about that selection.

You’ll get nearly the same quality of answer with a fraction of the token cost.
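Here is the same idea in numbers, using the estimate above of about 4 tokens per line (500 lines ≈ 2,000 tokens). The excerpt helper and the synthetic file are illustrative only.

```python
TOKENS_PER_LINE = 4  # from the estimate above: 500 lines ≈ 2,000 tokens


def estimate_tokens(text: str) -> int:
    """Ballpark token count based on line count."""
    return len(text.splitlines()) * TOKENS_PER_LINE


def excerpt(source: str, start: int, end: int) -> str:
    """Lines start..end (1-based, inclusive): the block you would highlight."""
    return "\n".join(source.splitlines()[start - 1:end])


# Synthetic 500-line file standing in for a real module.
whole_file = "\n".join(f"line {i}" for i in range(1, 501))
snippet = excerpt(whole_file, 120, 140)
print(estimate_tokens(whole_file), "vs", estimate_tokens(snippet))  # 2000 vs 84
```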

8. Prefer ask mode over agent mode for simple questions

Copilot Chat offers two main interaction patterns: a simple “ask mode” and a more advanced “agent mode.” Agent mode can call tools, inspect your workspace, and take multiple steps, but that power comes with a lot of overhead.

Agent mode typically:

Loads tool definitions.
Pulls in broader workspace context.
Replays the full conversation history for every step it takes.

If you use agent mode for a quick question like “What does this function do?” or “Why is this test failing?”, you might spend up to 10x more tokens than necessary.

Simple rule: if your question should fit in a single response, use ask mode.
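A simplified model shows where the multiplier comes from. All token counts are hypothetical, and real agent mode may cache or summarize parts of the history, so read this as an illustration of the shape of the cost, not a measurement.

```python
def ask_tokens(prompt: int, context: int) -> int:
    """Ask mode: one request carrying the prompt and the attached context."""
    return prompt + context


def agent_tokens(prompt: int, context: int, tool_defs: int,
                 steps: int, tokens_per_step: int) -> int:
    """Agent mode: every step resends tool definitions, context, and history."""
    total = 0
    history = prompt
    for _ in range(steps):
        total += tool_defs + context + history
        history += tokens_per_step  # tool calls and results pile up in the history
    return total


ask = ask_tokens(prompt=50, context=500)
agent = agent_tokens(prompt=50, context=500, tool_defs=1_000,
                     steps=3, tokens_per_step=200)
print(ask, agent, f"{agent / ask:.1f}x")  # 550 5250 9.5x
```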

9. Start fresh conversations for new tasks

Every time you send a new message in a chat thread, Copilot resends the entire conversation history as context. That means:

Message 1 sends 1 unit of context.
Message 10 can send roughly 10 units of context, even if your new question is small.

Long-running chats quietly inflate your token usage. The fix is easy: start a new conversation for each distinct task.

In VS Code, you can use Ctrl+L (or the equivalent shortcut) to open a fresh chat. Think of it as a zero-cost context reset that keeps your prompts lean and focused.
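The growth is easy to model. This sketch counts only your own messages (in a real thread Copilot’s replies are resent too, so actual numbers grow faster) and compares one long thread against a fresh chat per task.

```python
def total_input_units(tasks: list[list[int]], new_chat_per_task: bool) -> int:
    """Sum of context resent across all messages, in abstract 'units'."""
    total = 0
    history = 0
    for task in tasks:
        if new_chat_per_task:
            history = 0  # fresh chat: nothing carried over from earlier tasks
        for message in task:
            history += message  # the thread now includes this message
            total += history    # and the whole thread is sent with it
    return total


tasks = [[1] * 5, [1] * 5]  # two distinct tasks, five one-unit messages each
print(total_input_units(tasks, new_chat_per_task=False))  # 55
print(total_input_units(tasks, new_chat_per_task=True))   # 30
```

Ten one-unit messages in a single thread cost 1 + 2 + ... + 10 = 55 units; splitting them into two five-message chats costs 15 + 15 = 30.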


10. Watch your usage dashboard

You can’t optimize what you don’t measure. Under usage-based billing, GitHub exposes real-time AI credit usage for both individuals and organizations.

From the dashboard you can:

See how quickly you’re burning through your monthly credits.
Spot unusual spikes in usage.
As an org admin, set per-user budgets and even block access once someone hits their limit.

If you’ve never checked this page, it’s worth doing right away: you may find a few power users or workflows that are driving most of the bill.
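If you copy daily credit totals out of the dashboard, a few lines of Python can flag the outliers for you. The sketch assumes the numbers are already in a dict keyed by date (how you get them there depends on your setup), and the sample values are made up.

```python
from statistics import mean, stdev


def usage_spikes(daily_credits: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Days whose usage exceeds mean + threshold * sample standard deviation."""
    values = list(daily_credits.values())  # stdev needs at least two days
    cutoff = mean(values) + threshold * stdev(values)
    return [day for day, used in daily_credits.items() if used > cutoff]


# Made-up sample: nine ordinary days and one spike.
sample = {f"2026-06-{day:02d}": 40.0 for day in range(1, 10)}
sample["2026-06-10"] = 200.0
print(usage_spikes(sample))  # ['2026-06-10']
```

A mean-plus-two-standard-deviations cutoff is only one simple heuristic; with just a few days of data, a fixed per-user budget is often the more practical guardrail.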