What We Learned When We Sent AI Agents to Buy Insurance

We sent five AI agents to buy the same home insurance policy. Not five copies of the same chatbot, but five fundamentally different agent types, from ones that browse websites like a person to ones that read source code to ones that call APIs directly. Same task. Same day. What came back was… not what we expected.

Five Agents, Five Answers

Same policy. Same day. One insurer.

Agent Type: Result
Browser-Assistant: €13.36
Browser-Automation: €11.62
Conversational Assistant: €7.90
HTTP/API Agent: REFUSED
Coding Agent: REFUSED

Legend: Verified, Estimated, Refused

Same policy, same insurer, same day. Five agent types returned five different answers.

Finding One: 42% off. Not as a discount, but as a pricing error no one flagged.

One agent quoted €7.90 a month. The real price? €13.67. That's not a rounding error. That's 42% off, presented to the customer with a detailed methodology explaining exactly how it arrived at the number. Carefully reasoned. Confidently wrong. And your brand has no way to know it happened, because it happened inside a conversation you'll never see.
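The 42% figure can be checked directly from the two prices quoted above. The snippet below is only a worked calculation; the function name is ours, and it assumes the shortfall is measured against the verified price.

```python
# Figures taken from the finding above (EUR per month).
quoted = 7.90     # what the agent told the customer
verified = 13.67  # the real price

def relative_shortfall(quoted: float, verified: float) -> float:
    """Shortfall of the quoted price, as a fraction of the verified price."""
    return (verified - quoted) / verified

shortfall = relative_shortfall(quoted, verified)
print(f"Quoted price is {shortfall:.0%} below the verified price")
```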

Finding Two: The most responsible agents sent your customers to Check24.

Two agents refused to quote anything at all. Their logic was actually admirable: "We can't verify access to the pricing system, so giving you a number would be irresponsible." Hard to argue with that. Except here's what they said next: "Try a comparison portal instead."

Not because your brand failed. Because the agent couldn't find a machine-readable path to your pricing, so it sent the customer to someone who has one. The agent did exactly the right thing. An intermediary got the business.

For international readers: Check24 is Germany's dominant comparison portal. Every insurance executive in the DACH region knows it and fears the commission costs. Substitute whichever comparison aggregator your market dreads most. The dynamic is universal.

Finding Three: Two agents got the right answer. They still disagreed.

All we wanted was the truth. One verified quote. That's the whole point of sending an agent to a calculator: skip the guesswork, get the real number.

Two browser agents both made it through. Both navigated the same insurer's calculator, entered an identical profile, reached a real quote. And yet: €11.62 versus €13.36. A 15% gap on the same product. Each agent made slightly different assumptions about details the persona didn't specify. Age, deductible, payment frequency. One triggered a youth discount the other missed entirely. Neither was wrong, exactly. Both were right-ish. And a customer comparing those two numbers would have no idea they came from the same source.

Oh, and each one took about fifteen minutes to get there. Fifteen minutes of clicking, waiting, filling in forms, navigating dropdowns. To retrieve a number that the insurer's system could serve in milliseconds, if anything asked for it in the right way.

The problem isn't just that some agents fail. It's that even the ones that succeed tell different stories about your brand, and they burn fifteen minutes doing it.
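For readers who want to check the €11.62 versus €13.36 gap: the size of the percentage depends on which quote you treat as the baseline. The calculation below assumes the 15% figure is measured against the lower quote; measured against the higher one, the same gap comes out smaller.

```python
# The two verified quotes from the same calculator (EUR per month).
low, high = 11.62, 13.36

gap = high - low
gap_vs_low = gap / low    # how much more the higher quote costs
gap_vs_high = gap / high  # how much less the lower quote costs

print(f"Absolute gap: EUR {gap:.2f}")
print(f"Relative to the lower quote: {gap_vs_low:.1%}")
print(f"Relative to the higher quote: {gap_vs_high:.1%}")
```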

Finding Four: The path already exists. No agent found it.

After watching agents spend fifteen minutes clicking through insurance calculators, we wanted to test the other end of the spectrum. What happens when the infrastructure is already in place? Not partially, not theoretically. Textbook-ready: a structured API, machine-readable product data, real-time inventory, instant checkout capability. Everything the insurance industry hasn't built yet.

We found it at an e-commerce brand. The platform provides a structured endpoint for every store out of the box. No clicking through menus, no parsing screenshots. Direct data access in milliseconds.

Not one of the five agents found it.

Every single one visited the website instead, burned through tokens, clicked through pages, and spent minutes doing what could have taken milliseconds. The result was the same as insurance: slow, inconsistent, fragile. Not because the infrastructure was missing, but because no agent knew it was there.

The bottleneck isn't building. It's being found. And right now, the discovery layer between agent-ready brands and the agents trying to reach them simply doesn't exist.
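To make "direct data access" concrete, here is a minimal sketch of what an agent does once it has a machine-readable product record in hand. The payload and every field name in it are illustrative assumptions, not the schema of the platform we tested or of any real one.

```python
import json
from decimal import Decimal

# Hypothetical payload: field names are illustrative assumptions only.
RAW_RESPONSE = """
{
  "products": [
    {"id": "sku-001", "title": "Example item", "price": "24.90",
     "currency": "EUR", "in_stock": true},
    {"id": "sku-002", "title": "Another item", "price": "39.00",
     "currency": "EUR", "in_stock": false}
  ]
}
"""

def available_offers(raw: str) -> list[tuple[str, Decimal, str]]:
    """Return (title, price, currency) for every product that is in stock."""
    data = json.loads(raw)
    return [
        (p["title"], Decimal(p["price"]), p["currency"])
        for p in data["products"]
        if p["in_stock"]
    ]

offers = available_offers(RAW_RESPONSE)
for title, price, currency in offers:
    print(f"{title}: {price} {currency}")
```

No screenshots, no dropdowns, no guessing at form fields: one parse, and the agent has a price it can stand behind.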

Finding Five: What worked last month broke this month.

It's a moving target. The models change, the agent harnesses change, the capabilities shift. We've seen success rates drop by more than half in a single version update. Not because the brand changed anything, but because the model did. What you fix for a browser agent breaks for a coding agent. What works on today's model confuses its successor next month.

This made something very clear to us: agent-ready is not a checkbox. To think agent-first, you have to treat it as a discipline. The same kind of continuous measurement and design that user experience demanded when it grew from a nice-to-have into a permanent function. Except faster, because the ground moves on someone else's schedule.

We're experimenting with new ways to solve this. Redesigning the funnel stage by stage. Making every step simpler, more direct, more machine-readable. If web design gave us "don't make me think," the agent era demands something sharper: don't make me compute.
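One way to picture the continuous measurement described above is a regression check that compares success rates from one model version to the next. This is a rough sketch, not our production tooling: the class, the version labels, the run counts, and the 50% threshold are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunSummary:
    model_version: str  # hypothetical label
    attempts: int
    successes: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

def regressions(history: list[RunSummary],
                max_relative_drop: float = 0.5) -> list[tuple[str, str, float]]:
    """Flag consecutive versions whose success rate fell by more than the threshold."""
    flagged = []
    for prev, curr in zip(history, history[1:]):
        if prev.success_rate == 0:
            continue  # nothing left to lose, so no relative drop to compute
        drop = (prev.success_rate - curr.success_rate) / prev.success_rate
        if drop > max_relative_drop:
            flagged.append((prev.model_version, curr.model_version, drop))
    return flagged

# Illustrative numbers only, not measured results.
history = [
    RunSummary("model-v1", attempts=20, successes=16),
    RunSummary("model-v2", attempts=20, successes=15),
    RunSummary("model-v3", attempts=20, successes=6),
]
flagged = regressions(history)
for old, new, drop in flagged:
    print(f"{old} -> {new}: success rate fell by {drop:.0%}")
```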

Finding Six: Efficiency is the new loyalty.

Agents don't have brand preferences. They have cost functions. Every screenshot a browser agent takes, every dropdown it navigates, every form field it guesses at is tokens burned. We watched agents take twenty-plus screenshots to complete a three-click flow and still get it wrong.

The agent that wins your customer's task won't be the one that tries hardest. It'll be the one that finds the cheapest path to a verified answer. Efficiency is the selection mechanism. The brand that offers a direct, agent-native path to its proprietary data doesn't just make the agent's job easier. It becomes the default answer.

The brands still forcing agents through human interfaces will watch their competitors get recommended instead. Not because the competitor is better. Because the competitor is cheaper to reach.
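The "cheapest path to a verified answer" idea can be sketched as a simple selection rule: discard any path that cannot produce a verified answer, then take the one with the lowest estimated cost. Real agents do not expose their decision logic this neatly; the class, the path names, and the token estimates below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Path:
    name: str
    estimated_tokens: int  # illustrative estimate, not a measurement
    verified: bool         # can this path yield a verified answer?

def cheapest_verified(paths: list[Path]) -> Optional[Path]:
    """Pick the lowest-cost path among those that can produce a verified answer."""
    candidates = [p for p in paths if p.verified]
    return min(candidates, key=lambda p: p.estimated_tokens, default=None)

paths = [
    Path("browse the human website", estimated_tokens=60_000, verified=True),
    Path("answer from general knowledge", estimated_tokens=1_500, verified=False),
    Path("call a structured endpoint", estimated_tokens=800, verified=True),
]
best = cheapest_verified(paths)
print(best.name if best else "no verified path available")
```

Under these assumed numbers the structured endpoint wins, and the cheap but unverifiable guess never enters the comparison.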


Conclusion: This entire article covers one stage. There are five.

Everything you just read is about getting a quote. Agents are already comparing and beginning to transact. The full arc from finding your brand to completing a purchase is five stages deep, and it's happening now.

The way brands are discovered, evaluated, and transacted with is being rewritten by machines talking to machines. There is no going back to a world where a human always reads your landing page.

Your brand already has an agent experience. You just haven't measured it yet.

We've spent the past year and a half on one question: what happens when AI agents meet real brands? We've been sending agent fleets across insurance, e-commerce, financial services, and industrial sectors, tracking what holds up and what breaks with every model update.

Everything we've learned is going into the DAX 40 Agent Readiness Index: the first systematic, ongoing measurement of how AI agents actually experience Germany's largest brands.

If you want to see where yours stands before the Index goes public, reach out.

hyperize.ai →

Marc Seefelder is co-founder of Hyperize, an Agent Experience (AX) platform that measures, engineers, and improves how AI agents find, cite, and interact with brands. Hyperize is building the DAX 40 Agent Readiness Index. It is a venture of MING Labs, a GenAI experience engineering firm.

Method

Five agent types were dispatched to complete an identical home insurance quote task across three German insurers: a browser-assistant (AI-guided web browsing), a browser-automation agent (programmatic web navigation), a terminal/coding agent (source code and API probing), an HTTP/API agent (direct endpoint interaction), and a conversational assistant (synthesis from available sources). All agents received the same persona and task specification. Results were cross-verified against manual calculator runs. This study is part of Hyperize's ongoing Agent Readiness measurement programme across insurance, e-commerce, financial services, and industrial sectors.