How did Claude Opus 4.7 score on the Vals AI Finance Agent benchmark?

Claude Opus 4.7 leads the Vals AI Finance Agent benchmark at 64.4%, with Claude Sonnet 4.6 second at 63.3% and Muse Spark third at 60.6% as of the May 4, 2026 update. The benchmark uses 537 expert-authored questions testing SEC-filing analyst work with tool access.

Who are the partners in Anthropic's $1.5 billion finance joint venture?

Anthropic announced a $1.5 billion joint venture with Blackstone, Hellman & Friedman, and Goldman Sachs to create a new enterprise AI services company. The JV is aimed at embedding AI in mid-market firms, while Anthropic continues offering self-service tools to the largest institutions.

Which data providers connect to Anthropic's finance agents?

Anthropic added connectors to Dun & Bradstreet, Fiscal AI, Financial Modeling Prep, Guidepoint, IBISWorld, SS&C IntraLinks, Third Bridge, and Verisk, plus a Moody's MCP app covering credit data on more than 600 million companies. These provide governed data access inside the agent templates.

How accurate was Claude on AIG's insurance claims tasks?

AIG's CEO disclosed at the May 5 briefing that Claude scored 88% as accurate as a human expert on insurance claims out of the box. Anthropic used reinforcement learning specific to finance topics to train the agents, and that training does not include any client data.


Anthropic's Finance Agent Bet Is the Smarter Enterprise Play

What are the ten Anthropic finance agent templates?

Anthropic released ten ready-to-run agent templates covering tasks such as pitchbook building, KYC screening, and month-end close, aimed at banking, insurance, asset management, and financial technology professionals. Each template ships as a plugin in Claude Cowork and Claude Code, and as a cookbook for Claude Managed Agents.

Anthropic shipped ten finance agents and a $1.5B JV the same day OpenAI partnered with PwC. The structural choice tells you who is winning Wall Street.

AnIntent Editorial

May 7, 2026


Photo by Daniel Brzdęk on Unsplash

On the same May day OpenAI announced a forecasting and treasury partnership with PwC, Anthropic walked into a closed-door New York briefing with Jamie Dimon on stage and unveiled ten production-ready finance AI agents, a Microsoft 365 integration, a $1.5 billion joint venture with three of the most powerful capital allocators on Wall Street, and a benchmark-topping model. The split-screen was not a coincidence. It is the clearest signal yet that the enterprise AI race is no longer about who has the best chatbot; it is about who controls the workflow plumbing inside regulated industries. Anthropic's bet is the smarter one, and the structure of that bet is the reason why.

The Distribution Problem OpenAI Has Not Solved

The headline numbers explain why Anthropic had to act now. According to Winbuzzer, Microsoft Copilot held 38.6% of enterprise AI usage share in February 2026, with OpenAI at 25.7%, while Anthropic's tool-use and workflows API climbed from 0% in January to 5.7% the following month. That is a steep slope, but it is also a small base. Without owning a desktop suite the way Microsoft does, Anthropic needs a different distribution wedge.

It found one inside Excel. Anthropic's announcement confirms Claude now works across Microsoft Excel, PowerPoint, and Word through Claude add-ins for Microsoft 365, with Outlook support coming soon, and context carries automatically between those applications once the add-ins are installed. That last detail matters more than the integrations themselves. A pitchbook draft that survives the trip from a Word memo to a PowerPoint deck without re-prompting is the kind of friction reduction that banking analysts will actually notice at 2 a.m.

The ten templates cover the unglamorous core of the industry, with tasks such as pitchbook building, KYC screening, and month-end close, according to Anthropic. Each template ships as a plugin in Claude Cowork and Claude Code, and as a cookbook for Claude Managed Agents. These are not demos. They are the spreadsheet-and-PDF tasks that consume an associate's first three years.

A $1.5 Billion JV Is Not a Marketing Move

The most underrated piece of the announcement is structural. Fortune reported that Anthropic announced a $1.5 billion joint venture with Blackstone, Hellman & Friedman, and Goldman Sachs to create a new enterprise AI services company, with the strategy splitting cleanly into two tracks: self-service tools for the largest institutions, and a private equity-backed JV for mid-market embedding.

Read that again. Anthropic is letting JPMorgan and Goldman build their own stacks on Claude while partnering with private equity to do the integration grunt work for everyone smaller. That is a recognition that mid-market banks, regional insurers, and asset managers under $50 billion in AUM cannot hire McKinsey to wire up an agent platform. They need someone who already owns them. Blackstone and H&F own a lot of them.

OpenAI's PwC tie-up, reported by PYMNTS the same day and focused on forecasting, planning, reporting, procurement, payments, and treasury, is the conventional answer: rent a Big Four consultancy and let it sell the model into existing relationships. The day before, OpenAI raised $4 billion to accelerate business AI adoption. That is a lot of money to spend out-marketing a competitor that just bought equity-aligned distribution into the entire mid-market.

The Benchmark Number Is Not the Whole Story

Claude Opus 4.7 currently leads the Vals AI Finance Agent benchmark at 64.4%, with Claude Sonnet 4.6 second at 63.3% and Muse Spark third at 60.6% as of the May 4 update. That is a real lead in a real benchmark - Vals built it from 537 expert-authored questions developed with banks, hedge funds, and Stanford researchers, and it tests actual SEC-filing analyst work with EDGAR access, document parsing, and retrieval tools.

But benchmarks plateau. Independent analysis from benchmark researcher Benjamin Prigent notes a roughly 60% ceiling appearing across FinanceQA, Vals AI, and MultiFinBen, with models stalling on reasoning, forecasting, and multi-step analysis even as they ace simple retrieval. A 1.1-point lead at 64% is not a moat. The moat is what you build while the model is briefly ahead.
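To make the size of that lead concrete, here is a short Python sketch that computes the leader's margin from the three May 4 scores quoted above. The dictionary and the `leader_margins` name are illustrative only and are not part of any Vals AI tooling.

```python
# Vals AI Finance Agent scores (%) as of the May 4, 2026 update, as quoted in this article.
SCORES = {
    "Claude Opus 4.7": 64.4,
    "Claude Sonnet 4.6": 63.3,
    "Muse Spark": 60.6,
}


def leader_margins(scores: dict[str, float]) -> dict[str, float]:
    """Return the top model's margin, in percentage points, over every other model."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    _, leader_score = ranked[0]
    # round() hides binary floating-point noise such as 1.1000000000000014
    return {name: round(leader_score - score, 1) for name, score in ranked[1:]}


if __name__ == "__main__":
    for name, gap in leader_margins(SCORES).items():
        print(f"Lead over {name}: {gap} points")
```

Run as a script, it reports a 1.1-point gap to Claude Sonnet 4.6 and a 3.8-point gap to third-place Muse Spark.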

This is where the agent template architecture earns its keep. Each template packages three components - skills as domain instructions, connectors for governed data access, and subagents that are specialist Claude models for sub-tasks, per Anthropic. The connector list reads like a Bloomberg terminal autopsy: Dun & Bradstreet, Fiscal AI, Financial Modeling Prep, Guidepoint, IBISWorld, SS&C IntraLinks, Third Bridge, Verisk, plus a Moody's MCP app covering credit data on more than 600 million companies. That is the workflow Anthropic is renting out, not the model.
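Anthropic describes each template as three components: skills, connectors, and subagents. As a purely hypothetical sketch (the class names, fields, and scope strings are invented for illustration and are not Anthropic's plugin or Managed Agents format), that composition, and one way "governed data access" could work in practice, might be modeled in Python like this:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """Domain instructions the agent follows (hypothetical structure)."""
    name: str
    instructions: str


@dataclass(frozen=True)
class Connector:
    """Governed access to one data provider; scope names are invented."""
    provider: str
    allowed_scopes: frozenset[str]


@dataclass(frozen=True)
class Subagent:
    """A specialist model assigned to one sub-task."""
    role: str


@dataclass
class AgentTemplate:
    name: str
    skills: list[Skill] = field(default_factory=list)
    connectors: list[Connector] = field(default_factory=list)
    subagents: list[Subagent] = field(default_factory=list)

    def can_access(self, provider: str, scope: str) -> bool:
        # Deny by default: a read is allowed only if some connector grants that scope.
        return any(
            c.provider == provider and scope in c.allowed_scopes
            for c in self.connectors
        )


# Hypothetical KYC screening template wired to a single data provider.
kyc = AgentTemplate(
    name="KYC screening",
    skills=[Skill("kyc-policy", "Verify identity and beneficial ownership.")],
    connectors=[Connector("Dun & Bradstreet", frozenset({"company_profile"}))],
    subagents=[Subagent("sanctions-list reviewer")],
)

print(kyc.can_access("Dun & Bradstreet", "company_profile"))
print(kyc.can_access("Moody's", "credit_rating"))
```

In this sketch, the KYC template can read a Dun & Bradstreet company profile but is refused a Moody's credit lookup, because data access is declared per template rather than granted to the model at large.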

Wall Street's Data Vendors Already Got the Memo

The market reaction told its own story. Bloomberg reported that FactSet shares fell as much as 8.1% on the day of the announcement, Morningstar erased earlier gains to fall more than 3%, and S&P Global and Moody's also saw sharp selling pressure. Equity investors do not move that fast on a typical AI press release. They moved because the ten agents target professionals across banking, insurance, asset management, and financial technology - exactly the seats those data vendors sell into.

The irony is that Moody's is also a connector partner. The data businesses being repriced today are the same ones being plugged in as ingredients tomorrow. That is a preview of how this market reorganizes: the model layer commoditizes the analyst, the analyst's tools commoditize the data vendor, and the data vendor either becomes an MCP endpoint or becomes irrelevant.

The Counterargument: Lock-In Is a Real Cost

The strongest case against Anthropic's approach is also the most accurate one. Winbuzzer flagged that Claude Managed Agents may move orchestration logic into Anthropic's model layer, a design choice that simplifies rollouts but deepens lock-in concerns. A risk officer at a custody bank reading that sentence should pause.

Moving orchestration into the model means the prompts, tool-routing logic, and decision boundaries that define an agent's behavior live inside Anthropic's infrastructure rather than the customer's. Switching providers stops being a model swap and becomes a workflow rebuild. For a sector that spent the last decade fighting cloud lock-in clauses, this is a regression. It is also, almost certainly, why Anthropic Chief Commercial Officer Paul Smith framed the roadmap as a "staircase of autonomy" tracing finance AI from research assistance to full autonomy, per Fortune. Each step up the staircase is a step deeper into the stack.

The defense is that the alternative is worse. JPMorgan, in the same Fortune report, described organizational absorption - not the technology itself - as the harder challenge. Building bespoke orchestration in-house has a multi-year cost. Lock-in is a tax; absence is a wall.

Accuracy Numbers Banks Will Actually Quote in Board Decks

Two data points in the announcement matter more than the rest combined. AIG's CEO disclosed that Claude scored 88% as accurate as a human expert on insurance claims out of the box, Fortune reported. And Anthropic's chief economist Peter McCrory cited the company's proprietary Anthropic Economic Index showing AI is now used for at least a quarter of tasks in roughly half of all U.S. occupations.

The AIG number is the one that will be in every insurance CIO's deck by next quarter. "88% out of the box" is a phrase a board can act on. It is also, deliberately, a starting point - Anthropic used reinforcement learning specific to finance topics to train the agents, and that training does not include any client data, per Yahoo Finance. Customers can fine-tune from there without surrendering their book.

This is the part where the AI Tools story diverges from the AI Safety story. A claims-handling agent that is 88% as accurate as a human is not safe enough to deploy unsupervised. It is, however, more than accurate enough to triage the queue, draft the response, and hand the edge cases to a human. The economics of that split are punishing for headcount and excellent for margin.
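The split described here, where the agent triages and drafts and a human takes the edge cases, can be sketched as a simple routing rule. This is a hypothetical illustration, not AIG's or Anthropic's workflow: the confidence score, the edge-case flag, and the 0.9 threshold are all assumptions. The 88% figure is an accuracy comparison against human experts, not a confidence value this code consumes.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    agent_confidence: float  # hypothetical self-reported score, 0.0 to 1.0
    is_edge_case: bool       # hypothetical flag, e.g. disputed coverage


def route(claim: Claim, threshold: float = 0.9) -> str:
    """Send edge cases and low-confidence claims straight to a person.

    Routine claims still land in a human review queue: the agent only drafts.
    """
    if claim.is_edge_case or claim.agent_confidence < threshold:
        return "human_adjuster"
    return "agent_draft_for_review"


def triage(claims: list[Claim]) -> dict[str, list[str]]:
    queues: dict[str, list[str]] = {"agent_draft_for_review": [], "human_adjuster": []}
    for claim in claims:
        queues[route(claim)].append(claim.claim_id)
    return queues


queue = [Claim("C-1", 0.95, False), Claim("C-2", 0.70, False), Claim("C-3", 0.99, True)]
print(triage(queue))
```

Nothing in this sketch acts without a person: even the agent's queue is marked for review, which matches the point that an 88%-accurate claims agent should not run unsupervised.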

Why the Two-Track Strategy Wins

Anthropic CFO Krishna Rao's line that "Enterprise demand for Claude is significantly outpacing any single delivery model" was reported by PYMNTS, and it is the most honest sentence in any of the briefings. For enterprises, a single delivery model is roughly what OpenAI is offering: alongside ChatGPT for individuals and the API for developers, its answer for the Fortune 500 is a Big Four consultancy.

Anthropic's two-track structure - direct self-service for the giants who can build, plus a private-equity JV for the mid-market that cannot - fits how financial services actually buys software. The top fifty institutions want platforms they control. The next two thousand want a vendor who shows up. Goldman, Visa, Citi, and AIG are already on the customer roster, according to Yahoo Finance, with Nicholas Lin, head of Anthropic's financial services product work, saying Claude would develop "vertical-specific intelligence" in finance.

The sleeper detail is that Anthropic says the finance agents can handle complex, multi-hour tasks like deal closings with a full audit log, per Winbuzzer. Audit logs are what compliance officers approve. Audit logs are what regulators subpoena. Whichever vendor solves the audit-log problem first wins the regulated-industry decade, and that is the prize, not the chatbot.

The Bet, Stated Plainly

Claude Opus 4.7's 1.1-point lead on a finance benchmark will not survive the next OpenAI release. The ten templates and their connector graph will. The Microsoft 365 add-ins will. The Blackstone-H&F-Goldman JV will still be selling Claude into mid-market insurers in 2028 regardless of which model is briefly on top. Anthropic spent the May 5 briefing building distribution that compounds. OpenAI spent it announcing a partnership.

That is the smarter enterprise play. Not because the model is better, but because the model is the part everyone keeps copying.
