Agent Search Optimization: Why Latency and JSON-LD Accuracy Are the New SEO

Agent search optimization and SEO are not the same discipline. The cleanest way to see the difference is to start with a fact that punctures this post's own headline. Google states on the record that you need no special schema.org structured data, no AI text files, and no new markup to appear in AI Overviews or AI Mode. Its documentation says plainly that "there are no additional requirements to appear in AI Overviews or AI Mode, nor other special optimizations necessary," and that "you don't need to create new machine readable files, AI text files, or markup to appear in these features" (Google Search Central, AI features and your website). So the naive claim that "JSON-LD is the new SEO" is false by Google's own words. The real shift is not one markup tactic replacing another. It is the move from optimizing a page a human reads to maintaining a fact source a machine fetches. Latency and JSON-LD accuracy matter intensely, but as necessary conditions of that larger shift, not as the whole of it.

01

The reader became a software client

Traditional SEO optimized for a human at the end of the chain. Agent search inserts a software client in the middle. A retrieval agent fetches your page, parses it, and decides in real time whether your content is usable inside an answer. That single substitution changes the physics of three things at once, and only one of them is genuinely new.

The first is latency. For a human, page speed is a graded ranking factor and a comfort variable. For a fetcher operating under a timeout, latency behaves more like a binary inclusion gate. If your server-rendered response does not arrive inside the window, you are not ranked lower. You are simply absent from the candidate set for that answer. Crawl economics reinforce this. Googlebot has long operated within a crawl budget, and live agent fetchers enforce timeouts of their own. The major vendors do not publish exact timeout thresholds, so the honest framing is directional: past the cutoff, slow is not penalized. Slow is excluded.
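A toy model makes the difference concrete. It rests on loud assumptions. The 2,000 ms budget is hypothetical because vendors do not publish theirs. The `graded_score` penalty is invented for illustration and is not taken from any search engine. Only the shape matters here. A graded factor ranks the slow page lower, while a timeout gate removes it.

```python
from dataclasses import dataclass

# Hypothetical budget: real agent timeouts are unpublished, so this value is illustrative only.
FETCH_BUDGET_MS = 2000


@dataclass
class Candidate:
    url: str
    relevance: float  # 0..1, how well the page answers the query
    latency_ms: int   # time until the server-rendered response arrives


def graded_score(c: Candidate) -> float:
    """Human-search style (invented penalty): slower pages lose score but stay in the list."""
    penalty = min(c.latency_ms / 10_000, 0.5)  # cap the penalty at 50 percent
    return c.relevance * (1 - penalty)


def gated_candidates(candidates: list[Candidate], budget_ms: int = FETCH_BUDGET_MS) -> list[Candidate]:
    """Agent-fetch style: anything slower than the budget never enters the candidate set."""
    return [c for c in candidates if c.latency_ms <= budget_ms]


pages = [
    Candidate("https://example.com/fast", relevance=0.7, latency_ms=400),
    Candidate("https://example.com/slow-but-best", relevance=0.95, latency_ms=3500),
]

ranked = sorted(pages, key=graded_score, reverse=True)
print([c.url for c in ranked])                    # both pages, slow one second
print([c.url for c in gated_candidates(pages)])   # only the fast page
```

The slow page has the highest relevance. Under the graded model it still appears, in second place. Under the gate it never reaches the ranking step.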

02

The bar moved from "ranks well" to "is correct"

The second change is the one most teams underestimate. Classic SEO rewards relevance and authority. Agent search adds a harder requirement: correctness. When a generative engine lifts your stated price, your opening hours, or a product specification into a synthesized answer, an error does not cost you a click. It gets repeated as a confident, sourced statement that you appear to have authored. A stale fact becomes a hallucination with your name on it.

There is a second-order risk too. Google's structured-data guidelines require markup to describe content that is visible on the page. JSON-LD that contradicts the page, or claims more than the page supports, can therefore cost you rich-result eligibility instead of earning it. Accuracy, then, is not a quality-of-life nicety. It decides whether you are cited correctly, cited wrongly, or dropped.
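A pre-publish check can catch the most common kind of drift: a price that changed on the page but not in the markup. This is a minimal sketch that uses only Python's standard library and assumes a Product page with a single `offers.price`. The page, the product name, and the `price_mismatches` helper are all hypothetical.

```python
import json
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects JSON-LD blocks and the human-visible text of a page."""

    def __init__(self):
        super().__init__()
        self.json_ld: list[dict] = []
        self.visible: list[str] = []
        self._in_json_ld = False
        self._skip_depth = 0  # inside <script>/<style> content that readers never see
        self._buffer = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.json_ld.append(json.loads(self._buffer))
            self._in_json_ld = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer += data
        elif not self._skip_depth and data.strip():
            self.visible.append(data.strip())


def price_mismatches(html: str) -> list[str]:
    """Return JSON-LD offer prices that never appear in the visible text."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    visible_text = " ".join(reader.visible)
    problems = []
    for block in reader.json_ld:
        offers = block.get("offers", {})
        price = offers.get("price") if isinstance(offers, dict) else None
        if price is not None and str(price) not in visible_text:
            problems.append(f"{block.get('name', '?')}: markup says {price}, page does not")
    return problems


page = """
<html><body>
  <h1>Trail Runner 2</h1>
  <p>Price: 129.00 EUR</p>
  <script type="application/ld+json">
  {"@context": "https://schema.org", "@type": "Product", "name": "Trail Runner 2",
   "offers": {"@type": "Offer", "price": "119.00", "priceCurrency": "EUR"}}
  </script>
</body></html>
"""
print(price_mismatches(page))  # ['Trail Runner 2: markup says 119.00, page does not']
```

The substring test is deliberately naive. A real check would normalize formats (129 versus 129.00, decimal commas) and would cover every fact you mark up, not only price.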

This is also where the field overclaims. The widely repeated idea that answer engines weight JSON-LD above visible prose during a live fetch is not established by any primary vendor documentation. It is lore. What is documented is narrower and more useful: server-side-rendered structured data is the reliable path, because behavior across agents varies and some fetch modes do not execute client-side JavaScript. Render your facts on the server, and you remove the question entirely.
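You can see the difference from the fetcher's side by reading raw HTML with no JavaScript engine, which is roughly what a non-rendering fetch mode receives. This sketch assumes that each JSON-LD block is a single JSON object rather than an array or `@graph`. The bakery markup is hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects JSON-LD blocks exactly as they appear in raw HTML; no JavaScript is executed."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._capturing = False
        self._buffer = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capturing, self._buffer = True, ""

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.blocks.append(json.loads(self._buffer))
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self._buffer += data


def types_visible_without_js(raw_html: str) -> list[str]:
    """Return the @type of every JSON-LD block present in the server response itself."""
    collector = JsonLdCollector()
    collector.feed(raw_html)
    collector.close()
    return [block.get("@type", "?") for block in collector.blocks]


server_rendered = """<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness", "name": "Example Bakery"}
</script>"""

# Same facts, but the JSON-LD tag only exists after a browser runs this script.
client_rendered = """<script>
  const data = {"@context": "https://schema.org", "@type": "LocalBusiness", "name": "Example Bakery"};
  const tag = document.createElement("script");
  tag.type = "application/ld+json";
  tag.textContent = JSON.stringify(data);
  document.head.appendChild(tag);
</script>"""

print(types_visible_without_js(server_rendered))  # ['LocalBusiness']
print(types_visible_without_js(client_rendered))  # []
```

Both pages carry the same facts. A fetcher that skips JavaScript sees them only in the server-rendered version.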

03

What actually moves visibility in generative engines

If Google says markup is not the lever, what is? The peer-reviewed origin of this field gives a concrete answer. The KDD 2024 paper "GEO: Generative Engine Optimization" by Aggarwal and colleagues introduced the GEO-bench benchmark and reported visibility gains of up to 40 percent in generative-engine responses (arXiv, Aggarwal et al., KDD 2024). The methods behind those gains were not schema tricks. They were additions of citations, quotations, and statistics, which amount to statistically rich, well-cited prose. The gain is method-specific and domain-specific, not a universal number you can promise a client. Every method the paper tested changes the content, not the markup.

This sits comfortably beside Google's position. Structured data still earns rich results, and Google still recommends JSON-LD over Microdata and RDFa as the format that is easiest to implement and maintain at scale (Google Search Central, Intro to structured data). But rich-result eligibility and AI-feature eligibility are two different doors. JSON-LD opens the first. Correct, retrievable, well-evidenced content opens the second. Treating them as one door is how teams end up polishing markup while the agent quotes a third-party aggregator instead.
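The most reliable way to keep JSON-LD accurate is to never hand-edit it. This minimal sketch assumes a hypothetical canonical fact record and uses the schema.org `LocalBusiness` and `OpeningHoursSpecification` types. The server renders both the visible hours line and the markup from the same dictionary, so the two cannot disagree. It is an illustration, not a production renderer.

```python
import json
from html import escape

# Hypothetical canonical record: the single place these facts are edited.
FACTS = {
    "name": "Example Bakery",
    "telephone": "+44 20 7946 0000",
    "street": "1 Example Street",
    "city": "London",
    "postcode": "N1 0AA",
    "hours": {
        "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
        "opens": "08:00",
        "closes": "17:00",
    },
}


def to_json_ld(facts: dict) -> dict:
    """Map the canonical record onto schema.org LocalBusiness vocabulary."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": facts["name"],
        "telephone": facts["telephone"],
        "address": {
            "@type": "PostalAddress",
            "streetAddress": facts["street"],
            "addressLocality": facts["city"],
            "postalCode": facts["postcode"],
        },
        "openingHoursSpecification": [{
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": facts["hours"]["days"],
            "opens": facts["hours"]["opens"],
            "closes": facts["hours"]["closes"],
        }],
    }


def render_page(facts: dict) -> str:
    """Server-side render: visible text and markup both come from the same record."""
    hours = facts["hours"]
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    markup = json.dumps(to_json_ld(facts), indent=2).replace("</", "<\\/")
    return (
        f"<h1>{escape(facts['name'])}</h1>\n"
        f"<p>Open {hours['days'][0]} to {hours['days'][-1]}, {hours['opens']} to {hours['closes']}</p>\n"
        f'<script type="application/ld+json">\n{markup}\n</script>'
    )


print(render_page(FACTS))
```

When the hours change, you edit one record and the page and the markup change together. JSON allows `\/` inside strings, so the escape leaves the data intact for any parser.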

04

The standards are still forming, so do not bet on folklore

A governance reframe requires knowing which rules are real and which are proposals dressed as rules. The Robots Exclusion Protocol is now standardized as RFC 9309 (2022), but it governs access only by allowing or disallowing URI paths (IETF RFC 9309). It cannot negotiate a data schema, verify the identity of a fetching agent, or enforce attribution. It is a blunt instrument for a job that now needs precision.

The proposals filling that gap are not yet standards. llms.txt, proposed by Jeremy Howard in September 2024, defines a markdown file at a site's root to provide LLM-friendly context. Its own specification states it is for inference time when a user seeks assistance, and explicitly not for search indexing like sitemaps nor for access control like robots.txt (llms.txt specification). No major search engine or LLM vendor has published primary documentation confirming they natively consume it.
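If you publish one anyway as a hedge, the format is small. This sketch follows the structure the llms.txt proposal describes: an H1 name, a blockquote summary, then H2 sections of markdown links with optional notes. The business, the URLs, and the `build_llms_txt` helper are hypothetical. Nothing here makes any agent read the file.

```python
def build_llms_txt(name: str, summary: str, sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble an llms.txt body: H1 name, blockquote summary, then H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is a markdown link, optionally followed by ": note".
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    "Example Bakery",
    "Neighbourhood bakery in London. Canonical hours, prices, and allergen data live at the links below.",
    {
        "Facts": [
            ("Opening hours", "https://example.com/hours.md", "updated whenever hours change"),
            ("Price list", "https://example.com/prices.md", ""),
        ],
        # The proposal treats a section named "Optional" as skippable when context is short.
        "Optional": [
            ("Our story", "https://example.com/about.md", "background, safe to skip"),
        ],
    },
)
print(llms_txt)
```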

Meanwhile the actual schema vocabulary keeps advancing. Schema.org shipped version 30.0 on 19 March 2026, adding a Credential class, an Error class, and equivalence exports for GS1, Dublin Core, and Open Graph, plus EU Digital Product Passport examples (schema.org Releases). The lesson is to build on documented behavior and ratified vocabulary, and to treat community proposals as optional hedges, not obligations.

05

Crawl became a governed, paid, two-sided market

The third change is the genuinely new one, and it is why agent-readiness is a governance question rather than a marketing tactic. On 1 July 2025, a date it branded Content Independence Day, Cloudflare changed its default to block AI crawlers unless they pay creators for content, and launched Pay Per Crawl. In the same announcement Cloudflare reported that earning referral traffic from AI engines is far harder than it was from the Google of old: roughly 750 times harder via OpenAI and roughly 30,000 times harder via Anthropic (Cloudflare Blog, Content Independence Day).

The crawler landscape has also fragmented by purpose, and each purpose now has to be governed separately. OpenAI documents three distinct agents. OAI-SearchBot surfaces sites in ChatGPT search. GPTBot is used for model training. ChatGPT-User handles user-initiated live browsing. OpenAI states that ChatGPT-User "is not used for crawling the web in an automatic fashion," and that because its actions are user-initiated, "robots.txt rules may not apply" (OpenAI Developer Docs, Bots and crawlers). Read those two facts together and the implication is sharp. You can block a training crawler at the edge, or make it pay, while a live user-initiated agent fetches the same pages anyway through a channel your robots.txt may not govern. Access is no longer a single on-off switch. It is a market with different rules per purpose, and you are now a participant whether or not you decided to be.
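In robots.txt, per-purpose rules are expressed by naming each user agent in its own group. This sketch uses Python's `urllib.robotparser` with a hypothetical policy that refuses training, allows search surfacing, and keeps `/account/` private. Two caveats apply. First, the standard-library parser applies the first matching rule rather than RFC 9309's longest-match precedence. Second, following OpenAI's documentation, a `True` for `ChatGPT-User` only reports what the file says, not what a user-initiated fetch will honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: no training, search surfacing allowed, /account/ private for everyone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /account/

User-agent: *
Disallow: /account/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    print(agent, parser.can_fetch(agent, "https://example.com/menu"))
# GPTBot False
# OAI-SearchBot True
# ChatGPT-User True   (falls through to the * group; robots.txt may not bind it anyway)
```

The file can state different rules for each purpose. It cannot verify who is asking or enforce the answer, and that gap is why the decision now sits with governance rather than markup.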

06

Where Origin Pi fits

Put the three changes together and the headline resolves into a defensible thesis. Agent-readiness is not "add JSON-LD." It is publishing a fast, correct, server-rendered, access-controlled version of your business truth, so an agent cites you instead of hallucinating you or sourcing an aggregator that got your facts wrong. Latency and JSON-LD accuracy are necessary conditions of that. They are not the whole of it. The whole of it is governance: one source of business truth, rendered for machines, with access decided on purpose rather than by accident.

This is the layer Origin Pi builds. Our flagship, Cerebrum, is the governed business brain that holds a company's canonical facts and serves them to agents under control, so the version a machine fetches is the version you authored. On the demand side, our agent-readiness work makes those facts fast, correct, and server-rendered for the fetchers that now act as the first reader. On the visibility side, our marketing AI practice treats generative-engine presence as a governed publishing problem, not a markup hack. The audience changed from a human to a machine. The discipline that follows is governance, not folklore.


Questions

Common questions.

What is the difference between agent search optimization and SEO?

SEO optimizes a page for a human reader at the end of the chain. Agent search optimization prepares a fact source for a software client in the middle of the chain, a retrieval agent that fetches, parses, and decides in real time whether your content is usable inside a synthesized answer. The practical consequences are that latency behaves like a binary inclusion gate rather than a graded ranking factor, correctness matters as much as relevance because errors get repeated as sourced statements, and crawl access is now a governed, paid market rather than a single on-off switch.

Do I need special JSON-LD or schema.org markup to appear in Google AI Overviews?

No. Google states in its documentation that there are no additional requirements to appear in AI Overviews or AI Mode, and that you do not need new machine-readable files, AI text files, or special schema.org structured data. A page must simply be indexed and eligible to show in Google Search with a snippet. JSON-LD still earns rich results, which is a separate eligibility door from AI features, and Google still recommends JSON-LD as the preferred structured-data format for that purpose.

Does JSON-LD get weighted higher than visible text by answer engines?

There is no primary vendor documentation establishing that live retrieval weights JSON-LD above visible prose. That claim is field lore, not a documented fact. What is supported is narrower and more actionable: server-side-rendered structured data is the reliable path, because agent behavior varies and some fetch modes do not execute client-side JavaScript. Rendering your facts on the server removes the uncertainty entirely.

What actually improves visibility in generative engines?

The peer-reviewed GEO paper accepted at KDD 2024 reported visibility gains of up to 40 percent in generative-engine responses from methods such as adding citations, quotations, and statistics, meaning statistically rich, well-cited prose rather than markup tricks. That gain is method-specific and domain-specific, not a universal number, and it sits alongside Google's position that no special markup is required for AI features.

Should I add an llms.txt file to my site?

You can, but treat it as an optional hedge rather than an obligation. llms.txt is a community proposal from September 2024, not a ratified standard. Its own specification states it is meant for inference time when a user seeks assistance, and explicitly not for search indexing like sitemaps nor for access control like robots.txt. No major search engine or LLM vendor has published primary documentation confirming native consumption, so do not build a strategy on it.

How did AI crawling become a governance problem rather than a marketing tactic?

On 1 July 2025, Cloudflare changed its default to block AI crawlers unless they pay creators, and launched Pay Per Crawl. It reported that earning referral traffic from AI engines is far harder than it was from the Google of old. At the same time, crawlers fragmented by purpose. OpenAI documents separate bots for ChatGPT search, model training, and user-initiated live browsing, and notes that the user-initiated agent may not be bound by robots.txt. Access is now decided per purpose in a paid, two-sided market, which makes it a governance question.


Building the agent-ready layer for your business? Send a note. Real reply, no funnel.
