How to Reduce Claude Code Token Usage
Every interaction with Claude Code consumes tokens. Tokens are the fundamental unit Claude uses to process text — roughly one token per 3-4 characters of English text. Every prompt you send, every file Claude reads, every tool call, and every response Claude generates counts against both your rate limits and your session's context window. Using fewer tokens means you can do more within your limits, maintain higher response quality, and avoid hitting rate limit walls mid-task.
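The characters-per-token figure can be turned into a quick back-of-the-envelope estimator. This is a heuristic only — real tokenizers split on subword units, so actual counts vary with vocabulary and language — but it is useful for sanity-checking how much a prompt or pasted file will cost:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb."""
    return max(1, len(text) // 4)

# A short, focused prompt costs on the order of a dozen tokens...
print(estimate_tokens("Fix the login timeout in src/auth/login-handler.ts"))

# ...while pasting a 200-line file (~40 chars/line) costs on the order of thousands.
print(estimate_tokens("x" * 200 * 40))
```

Treat the output as an order-of-magnitude guide, not an exact count.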
Why tokens matter
Tokens affect your Claude Code experience in two distinct ways:
- Rate limits — Anthropic tracks your total token consumption across rolling 5-hour and 7-day windows. Once you exhaust your quota, you're throttled until the window resets. Every wasted token brings that wall closer. See How Claude Code Rate Limits Work for the full breakdown.
- Context window — Each session has a finite amount of context. As the window fills, each turn carries more history and more token cost. Anthropic documents auto-compaction once context exceeds 95% capacity. See Understanding Claude Code's Context Window for details.
Reducing token usage addresses both problems at once. You stretch your rate limit budget further and keep each session's context cleaner.
Tokens and cost
On the Pro plan ($20/month), you get a fixed token budget — there's no per-token billing, but hitting the limit means waiting for it to reset. On Max plans (5x at $100/month, 20x at $200/month), the budget is larger but still finite. Every token you save is time you get back. Efficient usage doesn't lower a bill — it prevents downtime.
Be specific with file references
The single biggest source of unnecessary token usage is letting Claude explore your codebase on its own. When you say something like "find the authentication logic and fix the bug," Claude will search directories, read multiple files, scan through irrelevant code, and burn thousands of tokens before it even starts on the actual fix.
Instead, point Claude directly at the files that matter:
- Bad: "Look at the auth code and fix the login timeout issue"
- Good: "Fix the login timeout issue in `src/auth/login-handler.ts` — the timeout on line 47 should be 30000ms instead of 5000ms"
The more precise your references, the fewer exploratory reads Claude needs. If you already know which files are involved, say so. If you know the approximate line numbers, include them. Every file Claude doesn't have to read is hundreds or thousands of tokens saved.
Name files by path in your prompt
When you mention a file by its path in your prompt, Claude reads it directly instead of searching for it. This is far more efficient than describing what you're looking for and letting Claude explore:
- Claude reads only the files you specify — no accidental reads of nearby files
- You skip the search step entirely, saving tool call overhead
- Claude can begin working immediately instead of spending turns navigating the filesystem
Name files liberally when you know which ones are relevant. Combine them: "refactor the User type in src/models/user.ts to include an email field, and update the handler in src/api/users.ts" gives Claude exactly what it needs with zero exploration overhead.
Keep prompts focused and concise
Your prompts are usually the smallest part of token consumption, but they set the tone for everything that follows. A sprawling, unfocused prompt leads to sprawling, unfocused work — which means more file reads, more tool calls, and more tokens.
State the task clearly in one or two sentences
Lead with what you want done. Follow with constraints or context only if Claude wouldn't know them from the code itself. Skip background explanations Claude doesn't need.
- Verbose: "I've been working on this project for a while and we have an authentication system that uses JWT tokens. The tokens are generated in the auth service. I noticed that sometimes when users log in, the token expiration is set wrong. Can you take a look at the auth code and figure out what's going on with the token expiration and fix it?"
- Concise: "In `src/auth/token.ts`, the JWT expiration is set to 1 hour on line 23. Change it to 24 hours."
The concise version saves tokens on both your prompt and Claude's response, and avoids unnecessary exploration.
One task per prompt when possible
Multi-part prompts ("do X, then Y, then Z") force Claude to hold all three tasks in context simultaneously. If the tasks are independent, send them as separate prompts — or better yet, separate sessions. Each focused interaction uses context more efficiently than one sprawling multi-task session.
Start fresh sessions for new tasks
This is one of the highest-impact habits you can build. When you finish a task and move to something unrelated, start a new session. Don't reuse the old one.
A session that has been working on a database migration carries thousands of tokens of migration-related file reads, schema definitions, and error outputs. If you pivot to "now fix the CSS on the settings page," all that database context is pure noise. Claude has to process it on every turn, wasting both your rate limit tokens and Claude's attention.
Fresh sessions cost nothing. They start with an empty context window and give Claude full focus on the new task. Resist the urge to keep one long-running session for an entire day of work.
Use /compact before auto-compaction triggers
Anthropic documents auto-compaction once context exceeds 95% capacity, but you don't have to wait for it. If you need to continue a session, run /compact manually at natural breakpoints — after completing a subtask, after a successful build, or when you notice the context getting crowded.
Manual compaction has advantages over waiting for the automatic trigger:
- You pick the timing — compact between tasks, not mid-thought
- You guide the summary — `/compact focus on the API refactor, discard the earlier debugging` tells Claude what to preserve
- You reclaim more space — compacting earlier gives you a much larger clean runway than waiting for the session to get extremely full
That said, starting a fresh session is almost always better than compacting if you're switching tasks. Reserve /compact for when you need to continue the same task but the context is getting heavy.
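A sketch of that workflow — the focus phrasing is an example, since /compact accepts free-form guidance after the command:

```
# After finishing the database migration subtask:
/compact Keep the migration plan and final schema; drop the debugging output.

# Continue the next subtask in a cleaner context.
```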
Batch related changes together
Every turn in a Claude Code conversation involves overhead: your prompt tokens, Claude's reasoning tokens, tool call tokens, and response tokens. Sending ten small one-line changes as ten separate prompts costs far more than describing all ten changes in a single well-structured prompt.
When you have multiple related changes — say, renaming a type and updating all its references — describe them together:
- Wasteful: Ten separate prompts, one for each file that references the old type name
- Efficient: "Rename the `UserProfile` type to `Account` in `src/types.ts` and update all imports and references across the codebase"
Claude handles batch operations well. A single prompt that describes the full scope of a change lets Claude plan and execute efficiently, reading each file once rather than re-establishing context on every turn.
Avoid pasting large code blocks
When you paste a 200-line code block into your prompt, those tokens go into context on top of any file reads Claude does. If Claude then reads the same file you just pasted, you've paid for it twice.
Instead of pasting code, reference it:
- Use `@`-mentions to point at files
- Reference specific line numbers or function names
- Paste only the minimal snippet needed to illustrate a problem — a 5-line excerpt, not the whole file
The exception is code that doesn't exist in your project yet — design sketches, examples from documentation, or code from another project. In those cases, pasting is the only option, but keep it as short as possible.
Use CLAUDE.md to avoid repeating instructions
If you find yourself typing the same instructions in every session — "use single quotes," "run tests before committing," "follow the existing naming pattern in utils/" — put them in a CLAUDE.md file at the root of your project.
Claude Code reads CLAUDE.md automatically at session start. This means your instructions are loaded once, concisely, instead of being repeated (and expanded on) across dozens of prompts. The token savings compound over time:
- Without CLAUDE.md: You type 50 tokens of instructions per session, across 20 sessions a day = 1,000 tokens/day of repeated instructions, plus Claude's acknowledgment of each
- With CLAUDE.md: The file is loaded once per session automatically. You never retype the instructions. Claude never wastes tokens confirming it understands them.
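The arithmetic behind that comparison, made explicit — the 50-token and 20-session figures are the illustrative assumptions from above, not measurements:

```python
# Illustrative figures, not measurements
tokens_per_instruction_block = 50  # instructions retyped at the start of each session
sessions_per_day = 20

daily_repeat_cost = tokens_per_instruction_block * sessions_per_day
print(daily_repeat_cost)  # 1000 tokens/day spent repeating the same instructions
```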
Keep your CLAUDE.md focused and concise. Every line in it is loaded into every session, so don't pad it with explanations Claude doesn't need. Direct instructions work best: "Run npm test before committing. Use TypeScript strict mode. Prefer named exports."
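A minimal project-level CLAUDE.md in that spirit — the specific commands are examples; substitute your project's actual ones:

```markdown
# CLAUDE.md
- Run `npm test` before committing.
- Use TypeScript strict mode.
- Prefer named exports.
- Follow the existing naming pattern in `utils/`.
```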
Project vs. user-level CLAUDE.md
You can place CLAUDE.md at the project root for project-specific instructions, or in ~/.claude/CLAUDE.md for global preferences that apply to all projects. Use the project-level file for things like build commands, testing frameworks, and architecture decisions. Use the global file for personal coding style preferences and workflow rules.
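The two locations side by side, with the split suggested above:

```
~/.claude/CLAUDE.md        # global: personal coding style, workflow rules
<project-root>/CLAUDE.md   # project: build commands, test framework, architecture
```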
Use .claudeignore to exclude irrelevant files
When Claude searches your project — looking for references, scanning for patterns, or exploring structure — it can stumble into directories that are large and irrelevant. Think node_modules, build output directories, generated files, test fixtures with large data files, or vendored dependencies.
A .claudeignore file works like .gitignore — it tells Claude Code which paths to skip. Create one at your project root:
```
node_modules/
dist/
build/
.next/
coverage/
*.min.js
*.generated.ts
```
Every file excluded from search results is a file Claude won't accidentally read. In a large project, this can save thousands of tokens per session by preventing Claude from wandering into generated code or dependency directories.
Track your usage
You can't optimize what you don't measure. AI Battery sits in your macOS menu bar and shows real-time token consumption, context window fullness, and rate limit status for your recent Claude Code sessions. It lets you spot patterns — which types of tasks burn through tokens fastest, when you're approaching limits, and whether your optimization efforts are actually working.
Pay attention to sessions that fill up unusually fast. They often reveal habits worth changing: a project missing a .claudeignore, a task that could be split into smaller sessions, or prompts that trigger unnecessary exploration.
See your token usage at a glance
AI Battery tracks token consumption, context health, and rate limit status for your Claude Code sessions — all from your menu bar. Free and open source.
Download AI Battery — Free
Quick reference
Here are the strategies ranked by impact, from highest to lowest token savings:
- Start fresh sessions for new tasks — eliminates all accumulated noise from the previous task
- Be specific with file references — prevents exploratory reads that can cost thousands of tokens each
- Use .claudeignore — stops Claude from reading irrelevant files during searches
- Use CLAUDE.md — replaces repeated instructions with a one-time load
- Batch related changes — reduces per-turn overhead across multiple edits
- Use /compact at natural breakpoints — reclaims context space before quality degrades
- Name files by path in prompts — Claude reads them directly without searching
- Keep prompts concise — sets the tone for focused, efficient responses
- Reference files instead of pasting code — avoids duplicate content in context
Common questions
How do I know how many tokens a session has used?
AI Battery displays token counts for your recent Claude Code sessions directly in your menu bar. It breaks down input tokens, output tokens, and cache usage so you can see exactly where your budget is going. Claude Code itself also shows token counts after each response if you look at the status line, but AI Battery makes the data persistent and visible at a glance across sessions.
Does using a smaller model or "fast mode" reduce token usage?
Claude Code's fast mode uses the same model with faster output — it does not switch to a smaller model or use fewer tokens. The token consumption is comparable. The strategies in this guide apply equally regardless of whether fast mode is enabled.
Is there a hard limit on tokens per session, or just per time window?
Both. Each session has a context window of roughly 200K tokens — that's the per-session ceiling. Separately, Anthropic enforces rate limits across rolling 5-hour and 7-day time windows that cap your total usage across all sessions. Reducing per-session usage helps with both limits. See How Claude Code Rate Limits Work for the full rate limit breakdown.
Will these strategies make Claude's responses worse?
The opposite. Most of these strategies — being specific, keeping context focused, starting fresh sessions — actively improve response quality. Claude performs better with a clean, focused context than with a noisy one full of irrelevant file reads. You're not cutting corners; you're removing distractions.