What Claude Opus 4.7 and Sonnet 4.6 Changed About Building Document AI for Clients

I run a custom software company. We build production AI systems for businesses: the kind of work where a client shows up with a painful, document-heavy process and asks whether we can automate it without breaking the parts that actually need a human. For the last couple of years, my honest answer has been “mostly yes, with caveats.” After spending real time with Claude Opus 4.7 and Sonnet 4.6, the answer is closer to “yes, and the caveats are small enough to quote against.”

I want to write down why, because I think a lot of founders and engineering leaders evaluating this space are still operating on a mental model that’s one or two model generations out of date. The economics of document AI have shifted, and it’s worth understanding what specifically shifted.

The pattern we keep seeing

Almost every client conversation that ends in a signed contract starts the same way. Someone on their team (an appraiser, an underwriter, an adjuster, a paralegal, a clinical reviewer) spends one to three days per file reading a long PDF and producing a structured artifact from it. A summary, a memo, a filled-out template, a report back to a bank or a regulator. The source is messy: text, tables, photos, scanned pages, the occasional handwritten note. The output is standardized. The work in between is mostly extraction and reformatting, with a thin layer of judgment at the end that genuinely requires a licensed human.

Every industry has some version of this. We’ve built variants of it (property valuation summaries for commercial lenders, structured extractions from loan packages, report generation from inspection data), and what I’ve learned is that the bottleneck was never the workflow design. It was always the model. Either the model couldn’t read the source reliably, or it couldn’t produce output structured enough to feed a template, or it couldn’t hold the whole document in its head, or it hallucinated values when it was uncertain. You’d solve one of those and trip over another.

The Claude 4.6 and 4.7 generation is the first time I’ve felt like all four problems got solved in the same release cycle.

What actually changed, from a builder’s perspective

I’ll skip the marketing summary and talk about the four things that actually change how we architect these systems.

Vision got good enough that we stopped pre-processing as aggressively. Opus 4.7 is the first Claude model with high-resolution image support, up to 2576 pixels on the long edge (3.75MP), versus the previous 1568px limit. That doesn’t sound like a big number until you’ve tried to read a dense comparables table or a construction inspector’s field photos with an older model. Small text, stamp marks, handwritten annotations, footnote references inside packed tables: all the stuff that used to force us into a hybrid pipeline where OCR and layout tools did the heavy lifting and the LLM did only interpretation. On Opus 4.7 we can hand the page image directly to the model and get back a faithful reading, with bounding-box-level localization improvements that matter for anything involving figures and tables. For a founder, that’s fewer moving parts, fewer vendor dependencies, and a simpler system to support in production.
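
To make that concrete, here is roughly what the direct hand-off looks like with the Anthropic Python SDK. Treat it as a sketch rather than our production code, and treat the model ID as a placeholder for whatever the real Opus 4.7 identifier turns out to be:

```python
import base64
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-7"  # placeholder ID; check the models list for the real one

def read_page(png_path: str, instruction: str) -> str:
    # No OCR, no layout tool: the full-resolution page image goes straight in.
    with open(png_path, "rb") as f:
        page_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": page_b64}},
                {"type": "text", "text": instruction},
            ],
        }],
    )
    return response.content[0].text
```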

The 1M-token context window became usable at normal prices. Both Sonnet 4.6 and Opus 4.7 support a 1M-token context window, and on Opus 4.7 it’s at standard API pricing with no long-context premium. A full commercial appraisal can run 150+ pages. A loan package with exhibits can run several hundred. A contract with schedules and annexes, similar. Before this generation, we’d chunk aggressively, run extraction per chunk, and reconcile at the end, and every seam was a place for errors to creep in. Now the entire source document plus the extraction prompt plus the output schema fits in a single call. The simplification that creates in the code is hard to overstate. Our error surface collapsed.
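
One side effect: the pre-flight check becomes trivial. A sketch using the token-counting endpoint in the Python SDK, with the model ID again a placeholder:

```python
import anthropic

client = anthropic.Anthropic()

def fits_in_one_call(document_text: str, prompt: str, window: int = 1_000_000) -> bool:
    # Count tokens server-side before committing to a single-call extraction.
    count = client.messages.count_tokens(
        model="claude-opus-4-7",  # placeholder model ID
        messages=[{"role": "user", "content": prompt + "\n\n" + document_text}],
    )
    # Leave headroom for the response and any schema overhead.
    return count.input_tokens < window - 32_000
```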

Structured outputs with schema guarantees eliminated a whole category of defensive code. Structured outputs compile your JSON schema into a grammar that constrains the model’s output so it literally cannot emit a response that violates your schema. It is now generally available for Sonnet 4.5, Opus 4.5, and Haiku 4.5, and Opus 4.7 inherits it. Before this, every serious extraction system I’ve built had retry loops, JSON-repair logic, and a validator layer that existed purely to catch the model occasionally wrapping its output in markdown or adding a friendly preamble. That layer is gone from our stack now. I cannot tell you how much of a relief it is to delete it.
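
Mechanically, it looks like this. The beta flag and the output_format shape below follow the structured-outputs beta as I write this, so verify the exact parameter names against current docs; the model ID is a placeholder and the schema is a toy slice of a real one:

```python
import json
import anthropic

client = anthropic.Anthropic()

# A slice of an appraisal-summary schema. Nullable types give the model a
# legal way to say "not present" without violating the grammar.
SCHEMA = {
    "type": "object",
    "properties": {
        "property_address": {"type": "string"},
        "appraised_value_usd": {"type": ["number", "null"]},
        "effective_date": {"type": ["string", "null"], "description": "ISO 8601 date"},
    },
    "required": ["property_address", "appraised_value_usd", "effective_date"],
    "additionalProperties": False,
}

response = client.beta.messages.create(
    model="claude-opus-4-7",                  # placeholder model ID
    max_tokens=1024,
    betas=["structured-outputs-2025-11-13"],  # beta flag as of this writing
    output_format={"type": "json_schema", "schema": SCHEMA},
    messages=[{"role": "user", "content": "Extract these fields from the appraisal below.\n\n<document text>"}],
)

data = json.loads(response.content[0].text)  # parses every time; that's the guarantee
```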

The model got more rigorous, and more honest about uncertainty. Opus 4.7 pays precise attention to instructions and devises ways to verify its own outputs before reporting back. It also interprets prompts more literally: it won’t silently generalize one instruction to another, and it won’t infer requests you didn’t make. For document extraction specifically, this shows up as the model returning a clean “not present in the source” instead of a confidently wrong value when a field is missing or ambiguous. From a risk standpoint, when you’re selling into regulated industries where a hallucinated number can cost your client real money, this behavioral shift is worth more than any benchmark number.
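
We lean into that behavior rather than fighting it: the schema makes absence a first-class value, the prompt forbids guessing, and the pipeline measures how often the model declines. A minimal sketch; the wording and the threshold are our conventions, not anything model-mandated:

```python
EXTRACTION_RULES = """\
For every field, return the value exactly as it appears in the source.
If a field is missing, illegible, or ambiguous, return null.
Never infer or compute a value the document does not state."""

def null_rate(record: dict) -> float:
    # Share of fields the model declined to fill. We use this as a cheap
    # routing signal: a high rate sends the file to a human before generation.
    return sum(v is None for v in record.values()) / max(len(record), 1)

null_rate({"property_address": "1 Main St", "appraised_value_usd": None})  # -> 0.5
```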

How we build these systems

The architecture we keep landing on has three stages, and it’s worth naming them because the model choice per stage is where a lot of the unit economics live.

Stage one: analyze. The source PDF comes in, we extract text and layout, and we hand the model a combination of text plus page images where layout matters. This is the stage that benefits most from the vision and long-context improvements.
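
A sketch of that stage, assuming PyMuPDF for text extraction and page rendering; the “where layout matters” heuristic here is illustrative, not our tuned version:

```python
import fitz  # PyMuPDF

def analyze_pages(pdf_path: str, dpi: int = 200) -> list[dict]:
    doc = fitz.open(pdf_path)
    pages = []
    for page in doc:
        text = page.get_text()
        # Illustrative heuristic: render an image when the page is probably a
        # scan (little extractable text) or carries embedded images and photos.
        needs_image = len(text.strip()) < 200 or bool(page.get_images())
        png = page.get_pixmap(dpi=dpi).tobytes("png") if needs_image else None
        pages.append({"number": page.number + 1, "text": text, "image_png": png})
    return pages
```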

Stage two: extract. This is the core of the system. We define a schema covering every field the downstream artifact needs, and we run a prompt-driven extraction against the analyzed document using structured outputs, as in the sketch above. The output is guaranteed-valid JSON. This is where Opus 4.7 earns its keep; accuracy on this step dominates everything else.

Stage three: generate. The validated JSON becomes merge data for a templated Word document, rendered with docxtpl or a similar Jinja-style Word template engine. This stage is pure code: no model involvement needed, because by the time we get here, we have trusted, schema-compliant data. A human reviewer then opens the generated document and either signs off or edits before it goes back to the end client.
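
The whole stage is roughly this. The docxtpl calls are the library’s actual API; the template path and field names are made up for illustration:

```python
from docxtpl import DocxTemplate

def generate_report(template_path: str, extracted: dict, out_path: str) -> None:
    # The template is a normal .docx with Jinja placeholders like {{ property_address }}.
    doc = DocxTemplate(template_path)
    doc.render(extracted)  # pure merge; no model call anywhere in this stage
    doc.save(out_path)

generate_report("templates/appraisal_summary.docx",
                {"property_address": "1 Main St", "appraised_value_usd": 4_200_000},
                "out/summary_draft.docx")
```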

The reviewer stays in the loop by design. We don’t sell our clients on “AI replaces your licensed professional.” We sell them on “your licensed professional stops doing the mechanical 80% of the work.” That framing matters commercially and ethically, and it’s the framing the current model generation finally makes deliverable.

On model mixing: we use Opus 4.7 for the extraction stage where accuracy dominates, and Sonnet 4.6 for lighter-touch stages such as document type classification, quality checks, and short validation passes. Sonnet 4.6 is priced at $3/$15 per million tokens versus Opus 4.7 at $5/$25, and being able to route by stage rather than running the most expensive model everywhere is the difference between a system that pencils out at high volume and one that doesn’t.
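
The routing itself is boring code, which is the point. Prices are the per-million-token figures above; the model IDs are placeholders:

```python
# (input price, output price) per million tokens, from the figures above.
PRICES = {"claude-opus-4-7": (5.00, 25.00), "claude-sonnet-4-6": (3.00, 15.00)}

STAGE_MODEL = {
    "classify": "claude-sonnet-4-6",  # cheap and accurate enough
    "extract":  "claude-opus-4-7",    # accuracy dominates; pay for it here
    "validate": "claude-sonnet-4-6",
}

def stage_cost(stage: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[STAGE_MODEL[stage]]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

Run the arithmetic on a 150-page appraisal (call it 150k input tokens and 5k output): the Opus extraction stage comes to about $0.88, and the Sonnet passes around it cost cents. That is what “pencils out at high volume” means in practice.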

Where this pattern fits

I’ve had enough intro calls in the last few months to say with confidence that this pattern travels across industries. The clients change, not the shape of the problem.

Banking and lending. Loan application packages, collateral reviews, credit memos, compliance summaries. Underwriters currently spend a material chunk of their week on extraction that doesn’t require underwriting judgment.

Commercial real estate. Appraisal summarization, property condition assessments, broker opinion packages, lease abstraction. Dense, long, visually rich source documents that currently get summarized by hand.

Insurance. Claims files, first-notice-of-loss packets, statements of value for commercial property, medical records for bodily injury review. Every adjuster I’ve spoken to describes the same “read the PDF, fill the form, then do the actual work” pattern.

Healthcare and life sciences. Prior authorization packets, clinical trial documentation, discharge summaries, explanation of benefits forms. The human-in-the-loop-on-the-final-artifact model is already how these industries operate, which makes this adoption pattern feel native rather than imposed.

Legal. Contract review, due diligence, regulatory filings, discovery summarization. The 1M context window is the unlock here: full agreements plus exhibits in a single call.

Construction and engineering. Inspection reports, environmental assessments, as-built drawings, bid packages. The high-resolution vision matters more here than anywhere else because the drawings and photos are first-class content, not attachments.

Tax and public sector. Grant applications, tax filings, permit packages. Fixed schemas, variable source documents, template-driven outputs. Textbook fit.

I expect that within the next year, most mid-market firms in these industries will have at least piloted something shaped like this. The question for founders in my position isn’t whether the opportunity is real. It’s whether you’re building reusable components across verticals or rebuilding the same pipeline from scratch every time. We’ve been leaning hard toward the former.

What I’d tell a founder evaluating this space today

A few things worth saying plainly.

Structured outputs guarantee schema conformance, not factual correctness. The JSON will always parse. The values still need evaluation for high-stakes fields, and a review step with your client’s licensed professional is non-negotiable in most regulated industries. Don’t let a demo convince you otherwise.
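
What that evaluation layer looks like in practice is unglamorous: business-rule checks on parsed values, before anything reaches the reviewer. A sketch, with field names and thresholds that are illustrative rather than industry guidance:

```python
def business_checks(record: dict) -> list[str]:
    # Schema conformance is handled upstream; these catch plausible-but-wrong
    # values that parse fine but should never reach a generated document.
    problems = []
    value = record.get("appraised_value_usd")
    if value is not None and not (10_000 <= value <= 1_000_000_000):
        problems.append(f"appraised_value_usd outside plausible range: {value}")
    if record.get("effective_date") is None:
        problems.append("effective_date missing from source; flag for reviewer")
    return problems
```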

Opus 4.7’s new tokenizer uses roughly 1x to 1.35x as many tokens as previous models on the same text. Your cost per document will not be a straight line from the pricing page. Test with real documents and budget conservatively.
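
The budgeting math is one line, but it’s worth writing down. The 1.35x is the top of the range above, not a measured constant, so replace the estimate with counts from your own documents as soon as you have them:

```python
def worst_case_input_cost(old_model_tokens: int, price_per_mtok: float,
                          tokenizer_multiplier: float = 1.35) -> float:
    # Budget against the top of the 1x-1.35x range quoted above.
    return old_model_tokens * tokenizer_multiplier * price_per_mtok / 1_000_000

worst_case_input_cost(150_000, 5.00)  # ~$1.01, versus $0.75 at the old token count
```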

The more literal instruction-following in Opus 4.7 is a net win for extraction pipelines, but prompts that quietly relied on the model “reading between the lines” may need rewriting. Budget prompt engineering time during migration, not just integration time.

And the meta-point: the moat in this space isn’t the model. The model is a commodity that gets better every quarter on a schedule we don’t control. The moat is the domain layer: the industry-specific schemas, extraction prompts, validation rules, and templates that turn a general-purpose model into a production system for a specific industry. That’s where the engineering investment goes, and that’s what compounds.

The part worth repeating

Document understanding was “almost there” for two model generations. With Opus 4.7 and Sonnet 4.6 it crossed the line from useful assistant to reliable pipeline component. For a custom software company, that changes what you can credibly promise on a statement of work. A year ago, “we’ll automate 80% of this workflow in a defensible way” was an aspiration. Today it’s a deliverable.

If your business touches any process that looks like “read a long PDF, produce a structured artifact, hand it to a professional for sign-off,” we should probably talk.