VS Code 1.118's Token Cuts Make Copilot's Billing Switch a Harder Sell
VS Code 1.118 shipped two days after GitHub announced usage-based billing, and the timing isn't a coincidence - it's a confession.
AnIntent Editorial
Photo by Harshit Katiyar on Unsplash
Two days. That's the gap between GitHub announcing it would replace Copilot's flat premium request units with per-token billing, and Microsoft shipping a VS Code release engineered to slash how many tokens an agent actually burns. The timing is too tight to be coincidence, and it reframes the entire GitHub Copilot usage-based billing debate: if the editor team is racing to cut token consumption by double digits, the new pricing model isn't as developer-friendly as the announcement post insists.
GitHub's case for the change is straightforward on paper. VS Code 1.118 shipped April 29, exactly two days after the April 27 billing announcement, and the official GitHub blog post frames the move as transparency: subscribers will see input tokens, output tokens, and cached tokens itemized rather than abstracted into PRUs. The base subscriptions stay the same. What changes is what happens when you exceed your included credits, and how quickly an agentic workflow can get you there.
The Editor Is Quietly Subsidizing the New Pricing
Read the VS Code 1.118 release notes and a pattern emerges that the marketing copy avoids. The team placed cache breakpoints at the end of the system prompt, the end of the tool definitions, and the end of the most recent tool turn. According to a technical breakdown by htek.dev, the result is that in long sessions over 93% of each request is served from cache rather than billed as fresh input. On Anthropic models, cached tokens in multi-turn sessions are billed at roughly a tenth of the fresh-input rate.
That 10x discount isn't a Microsoft invention. Anthropic's own pricing documentation confirms cache hits cost 10 percent of the standard input price. What VS Code 1.118 does is make sure Copilot actually uses that lever on every long agent loop, instead of paying full freight on context the model has already seen.
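The cache economics described above can be sketched in a few lines. This is illustrative arithmetic, not a billing formula: the $3-per-million input rate is an assumed Sonnet-class price, and the 93% figure is the cache reuse rate the article cites.

```python
# Sketch of cache-hit economics: cached tokens bill at 10% of fresh input.
# INPUT_PER_MTOK is an assumed base rate, not an official quote.

INPUT_PER_MTOK = 3.00      # assumed base input price, USD per 1M tokens
CACHE_READ_MULT = 0.10     # cache reads cost 10% of fresh input

def session_cost(context_tokens: int, turns: int, cache_hit_rate: float) -> float:
    """Cost of re-sending the same context every turn, with partial caching."""
    fresh = context_tokens * turns * (1 - cache_hit_rate)
    cached = context_tokens * turns * cache_hit_rate * CACHE_READ_MULT
    return (fresh + cached) * INPUT_PER_MTOK / 1_000_000

no_cache = session_cost(50_000, 40, cache_hit_rate=0.0)
vscode_1118 = session_cost(50_000, 40, cache_hit_rate=0.93)

print(f"no caching: ${no_cache:.2f}")
print(f"93% cache:  ${vscode_1118:.2f}")
```

At a 93% hit rate, a 40-turn session carrying 50K tokens of context drops from $6.00 of input to under a dollar, which is why breakpoint placement matters so much.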
The VS Code 1.118 token efficiency work goes further. As Visual Studio Magazine reported, the Tool Search Tool delivers up to 20% token savings; it is the default for Claude Sonnet 4.5+ and Opus 4.5+, and is rolling out to GPT-5.4 and GPT-5.5 via the Responses API. Smaller purpose-built models now handle codebase search and terminal execution instead of a frontier model, reducing per-task cost, and the agentic execution tool is capped at 10 terminal calls per invocation to prevent infinite loops and runaway token spend. Each of those changes is a guardrail against exactly the kind of bill shock that defines per-token billing.
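The 10-call terminal cap is the simplest of these guardrails to picture. A minimal sketch of that kind of limiter follows; the cap value is from the release notes, but the `run_agent_turn` and `ToolLoopExceeded` names are hypothetical illustrations, not VS Code internals.

```python
# Hypothetical sketch of a per-invocation terminal-call cap.
# MAX_TERMINAL_CALLS mirrors the 10-call limit in VS Code 1.118's notes.

MAX_TERMINAL_CALLS = 10

class ToolLoopExceeded(RuntimeError):
    """Raised when an agent tries to run past the terminal-call cap."""

def run_agent_turn(tool_calls):
    """Execute queued terminal calls, refusing to run past the cap."""
    executed = []
    for i, call in enumerate(tool_calls):
        if i >= MAX_TERMINAL_CALLS:
            raise ToolLoopExceeded(
                f"agent queued {len(tool_calls)} calls; cap is {MAX_TERMINAL_CALLS}"
            )
        executed.append(call())  # each call is a zero-arg callable here
    return executed
```

The point of a hard cap rather than a soft warning is that each terminal call can trigger another full model turn, so an unbounded loop compounds token spend, not just wall-clock time.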
Why This Order of Operations Matters
If the editor needed a 20 percent token cut, a 93 percent cache reuse rate, and a hard cap on terminal loops to make the math work, then the math without those changes was unfriendly. Subscribers on the previous PRU system never had to think about a runaway tool call. They paid a flat monthly fee and hit a request ceiling. The new system replaces a ceiling with a meter.
That shift exposes a problem the Anthropic pricing page makes explicit. Prompt caching bills at multipliers of the base input rate: cache write tokens are charged when content is first stored, and cache read tokens when a subsequent request retrieves it. A cache hit costs 10% of the standard input price, which means caching pays off after just one read for the 5-minute duration (1.25x write) or after two reads for the 1-hour duration (2x write). But caching is only cheap when the cache actually hits. A misplaced breakpoint, a refactor that invalidates a system prompt, a model swap mid-session - any of these can flip the economics.
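The break-even claims in that paragraph can be checked directly. In units of the base input price, cached content costs the write multiplier plus 0.1 per read, while uncached content costs 1.0 per send:

```python
# Break-even check for the quoted multipliers: write at 1.25x (5-min TTL)
# or 2x (1-hour TTL) of base input, reads at 0.1x, fresh sends at 1x.

def cached_cost(write_mult: float, reads: int) -> float:
    return write_mult + reads * 0.10   # units of base input price

def uncached_cost(reads: int) -> float:
    return 1.0 + reads                 # same content sent fresh each time

def break_even(write_mult: float) -> int:
    """Smallest number of reads at which caching beats re-sending."""
    reads = 1
    while cached_cost(write_mult, reads) >= uncached_cost(reads):
        reads += 1
    return reads

print(f"5-minute cache pays off after {break_even(1.25)} read(s)")
print(f"1-hour cache pays off after {break_even(2.0)} read(s)")
```

The flip side is just as mechanical: if the cache is invalidated before the break-even read count, you paid the write premium for nothing.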
The Cursor precedent is instructive here. In June 2025, Cursor replaced fixed "fast request" allotments with usage-based credit pools: the Pro plan ($20/month) granted $20 in credits, enough for roughly 225 Claude Sonnet requests. Agent-heavy workflows burned through that fast - one developer reported $350 in overages in a single week - and Cursor issued a public apology and refunds in July 2025 after the backlash. GitHub watched that happen in real time and is now running the same play, with better preparation.
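A quick back-of-envelope shows why those overages compounded. Taking the article's figures at face value, $20 of credits across roughly 225 Sonnet requests implies about nine cents per request, so a $350 weekly overage corresponds to thousands of extra requests - easy for an agent loop, absurd for a human typing:

```python
# Back-of-envelope on the Cursor figures cited above; illustrative only.
credits = 20.00            # Pro plan monthly credit pool, USD
included_requests = 225    # approximate Sonnet requests covered
overage = 350.00           # one developer's reported weekly overage, USD

per_request = credits / included_requests
overage_requests = overage / per_request

print(f"per request:      ${per_request:.3f}")
print(f"overage requests: {overage_requests:.0f}")
```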
The Subscriber Anger Is Already Visible
The community reaction tracks the Cursor pattern almost beat for beat. Per Visual Studio Magazine's reporting, the GitHub Copilot FAQ thread grew from 70 comments to 237 comments and 319 replies within days of the announcement. That isn't a niche complaint. It's the loudest signal a developer audience can send before voting with their renewals.
GitHub's mitigations are real, and worth naming. The billing announcement confirms a preview bill experience launching in early May so users can model their costs before the June 1 cutover, and annual subscribers keep their PRU allotment until renewal - though model multipliers for those holdouts increase on June 1 anyway. Base prices for Pro at $10 a month, Pro+ at $39, Business at $19 per user, and Enterprise at $39 per user are unchanged at announcement.
The asymmetry is in what "unchanged" actually buys. A flat $10 plan that capped you at 300 premium requests was predictable. A flat $10 plan where, starting June 2026, GitHub AI Credits meter every input, output, and cache token is only predictable if you know your token shape, and most developers don't. That's the whole reason VS Code's 10-call terminal cap exists.
The Code Review Bill Hides the Real Cost
The most underreported piece of the Copilot billing change sits in a separate changelog. Starting June 1, Copilot code review is billed two ways: AI Credits for the model tokens, plus GitHub Actions minutes drawn from your plan entitlement. Code reviews on public repositories still consume no Actions minutes. Code review runs on an agentic tool-calling architecture hosted on GitHub Actions infrastructure, and organizations can set budget spending limits to cap combined Actions and Copilot spend, GitHub confirmed in its changelog.
That's two meters running on one feature. Token billing for the model. Compute billing for the runner. A team running automated reviews across a private monorepo now has to forecast both, and the only escape hatch is the spending limit, which works by cutting off the service when it triggers. For engineering managers, this is the single biggest change buried in the announcement, and it deserves more attention than the headline price table.
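For a manager trying to forecast those two meters, the shape of the estimate is simple even if the inputs aren't. The sketch below uses placeholder rates - the blended token price and per-minute Actions rate are assumptions for illustration, not GitHub's published prices:

```python
# Hedged sketch of forecasting private-repo code review costs on two meters:
# AI Credits for model tokens plus Actions minutes for the runner.
# Both rates below are placeholder assumptions.

CREDIT_PER_MTOK = 3.00   # assumed blended token rate, USD per 1M tokens
ACTIONS_PER_MIN = 0.008  # assumed per-minute Actions rate, USD

def review_cost(reviews: int, tokens_per_review: int, minutes_per_review: float):
    """Return (token_bill, compute_bill) in USD for a month of reviews."""
    token_bill = reviews * tokens_per_review * CREDIT_PER_MTOK / 1_000_000
    compute_bill = reviews * minutes_per_review * ACTIONS_PER_MIN
    return token_bill, compute_bill

tokens, compute = review_cost(reviews=400, tokens_per_review=60_000,
                              minutes_per_review=3)
print(f"monthly token bill:   ${tokens:.2f}")
print(f"monthly Actions bill: ${compute:.2f}")
```

Even with made-up rates, the structure makes the point: the token meter dominates, but the Actions meter is nonzero on every run, and a spending limit has to be set against their sum.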
The Counterargument, Taken Seriously
The strongest defense of usage-based billing is that it's honest. PRUs were a fiction. A premium request that wrote a one-line fix cost the same as one that ran a 47-iteration agent loop, and Microsoft was eating the difference until it couldn't. Per-token billing maps cost to consumption, which is how every other AI API in production works.
That argument has weight. Anthropic's own pricing structure proves the model can be tuned for caching-heavy workloads. Anthropic's prompt caching gives a 90% discount on cached input tokens. If your agent has a system prompt, project context, or CLAUDE.md file that repeats on every call, caching means you pay full price once and 10% on every subsequent call. For a 5K-token system prompt sent across 200 calls, caching reduces the cost from 1M tokens (5K x 200) to 105K effective tokens (5K full + 5K x 199 x 0.1). That's an 89.5% reduction on the system prompt portion alone. A well-instrumented Copilot session can be cheaper under the new model than the old one.
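The system-prompt arithmetic in that paragraph checks out, and it's worth seeing in one place:

```python
# The worked example from the paragraph above: a 5K-token system prompt
# repeated across 200 calls, with cache reads at 10% of base input price.
prompt = 5_000
calls = 200

uncached = prompt * calls                      # every call pays full price
cached = prompt + prompt * (calls - 1) * 0.10  # first call full, rest at 10%
savings = 1 - cached / uncached

print(f"uncached: {uncached:,} effective tokens")
print(f"cached:   {cached:,.0f} effective tokens")
print(f"savings:  {savings:.1%}")
```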
The problem is who has to do the instrumenting. Under PRUs, GitHub absorbed the variance. Under credits, the developer absorbs it, and the only protection is a client-side editor that ships token-saving features faster than the average user can audit them. The shift isn't dishonest. It's a cost transfer dressed as transparency.
The Practical Read for the Next Sixty Days
For most individual subscribers, the math probably still works at $10 a month, especially if they upgrade VS Code promptly and let semantic indexing - now available in all workspaces, not just GitHub and Azure DevOps repos - narrow what the model has to load. Light users were never the target of the change.
The pressure point is the heavy agentic user. "GitHub Copilot cost agentic workflows" is the search query that will spike on June 1, because that's the cohort where token consumption can climb fastest. Anyone running multi-file refactors, automated reviews on private repos, or long debugging sessions with frontier models should be watching the preview bill in May, not waiting for the first invoice in July. For a fuller breakdown of which tier survives the transition best, the Copilot plans comparison walks through the entitlements line by line.
The broader pattern is worth noting in the AI Tools coverage on this site and across the wider Developer Tools beat. When a vendor ships an editor optimization the same week it changes a billing model, the optimization is the real product announcement. GitHub gets a cleaner P&L. Microsoft gets a stickier editor. Developers get a meter, a preview bill, and a 10-call cap that wasn't there a month ago. Whether that's a fair trade depends entirely on whether you trust the next VS Code release to keep cutting tokens as fast as your agents learn to spend them.
Frequently Asked Questions
When does the billing change take effect?
GitHub announced the change on April 27, 2026, with the new GitHub AI Credits system replacing premium request units on June 1, 2026. Annual plan subscribers keep their PRU allotment until their plan expires, though model multipliers increase on June 1 regardless.
Are the base subscription prices changing?
No. The base prices stay the same at announcement: Copilot Pro at $10 per month, Pro+ at $39 per month, Business at $19 per user per month, and Enterprise at $39 per user per month. What changes is how overages are calculated once included credits are spent.
How does VS Code 1.118 cut token consumption?
The release places cache breakpoints at the end of the system prompt, end of tools, and end of the most recent tool turn, allowing over 93 percent of long-session requests to reuse cached context. The Tool Search Tool also delivers up to 20 percent token savings on Claude Sonnet 4.5 and newer Anthropic models.
Will public repository code reviews start costing Actions minutes?
No. Public repository code reviews still consume zero GitHub Actions minutes. Private repository code reviews will be billed two ways starting June 1: AI Credits for the model tokens plus GitHub Actions minutes drawn from your plan entitlement.
Can subscribers estimate their costs before the cutover?
Yes. GitHub is launching a preview bill experience in early May so subscribers can project their usage costs before the transition. Organizations can also set budget spending limits to cap combined AI Credits and Actions spend automatically.