Agent Search Optimization: Why Latency and JSON-LD Accuracy Are the New SEO
Google says no special schema is required to appear in AI answers. So why does agent-readiness still come down to a fast, correct, server-rendered version of your business truth? Because the audience changed from a human reader to a software client, and that changes everything except the markup.
By Siddharth Surana, Founder & CEO / 7 min read
Agent search optimization and SEO are not the same discipline. The cleanest way to see the difference is to start with a fact that punctures this post's own headline. Google states on the record that you need no special schema.org structured data, no AI text files, and no new markup to appear in AI Overviews or AI Mode. Its documentation says plainly that "there are no additional requirements to appear in AI Overviews or AI Mode, nor other special optimizations necessary," and that "you don't need to create new machine readable files, AI text files, or markup to appear in these features" (Google Search Central, AI features and your website). So the naive claim that "JSON-LD is the new SEO" is false by Google's own words. The real shift is not one markup tactic replacing another. It is the move from optimizing a page a human reads to maintaining a fact source a machine fetches. Latency and JSON-LD accuracy matter intensely, but as necessary conditions of that larger shift, not as the whole of it.
The reader became a software client
Traditional SEO optimized for a human at the end of the chain. Agent search inserts a software client in the middle. A retrieval agent fetches your page, parses it, and decides in real time whether your content is usable inside an answer. That single substitution changes the physics of three things at once, and only one of them is genuinely new.
The first is latency. For a human, page speed is a graded ranking factor and a comfort variable. For a fetcher operating under a timeout, latency behaves closer to a binary inclusion gate. If your server-rendered response does not arrive inside the window, you are not ranked lower. You are simply absent from the candidate set for that answer. Crawl economics reinforce this. Googlebot has long operated within a crawl budget, and live agent fetchers enforce timeouts of their own. The major vendors do not publish the exact millisecond thresholds, so the honest framing is directional. Slow is not penalized. Slow is excluded.
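A minimal sketch of that difference, in Python. The page latencies and the 2,000 ms timeout are invented for illustration, since vendors do not publish real thresholds, and `Page`, `candidate_set`, and `graded_order` are hypothetical names rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    latency_ms: float  # time until the server-rendered response is complete


def graded_order(pages):
    """Graded view: every page stays in play; slower pages only sort lower."""
    return [p.url for p in sorted(pages, key=lambda p: p.latency_ms)]


def candidate_set(pages, timeout_ms):
    """Gate view: a page that misses the fetcher's window is simply absent."""
    return [p.url for p in pages if p.latency_ms <= timeout_ms]


pages = [
    Page("https://example.com/fast", 180.0),
    Page("https://example.com/ok", 900.0),
    Page("https://example.com/slow", 4200.0),
]

TIMEOUT_MS = 2000.0  # assumption for illustration only

print(graded_order(pages))               # all three URLs, fastest first
print(candidate_set(pages, TIMEOUT_MS))  # the slow page is not in the list at all
```

Under the graded model the slow page still appears, just last. Under the gate model it never reaches the stage where ranking happens.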
The bar moved from "ranks well" to "is correct"
The second change is the one most teams underestimate. Classic SEO rewards relevance and authority. Agent search adds a harder requirement: correctness. When a generative engine lifts your stated price, your opening hours, or a product specification into a synthesized answer, an error does not cost you a click. It gets repeated as a confident, sourced statement that you appear to have authored. A stale fact becomes a hallucination with your name on it.
There is a second-order risk too. Structured data that contradicts the visible page, or that overstates eligibility, undermines your rich-result eligibility rather than helping it. So accuracy is not a quality-of-life nicety. It is the difference between being cited correctly, being cited wrongly, or being dropped.
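One way to catch that contradiction before an agent does is to compare the JSON-LD against the visible text of the same HTML. The sketch below uses only the Python standard library. It assumes each JSON-LD block is a single object with one `offers` object, and it uses a naive substring match on the price. The sample page and the `PageFacts` and `mismatched_offer_price` names are hypothetical.

```python
import json
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collects parsed JSON-LD blocks and visible text from raw HTML."""

    def __init__(self):
        super().__init__()
        self.jsonld = []        # parsed JSON-LD objects
        self.text = []          # visible text fragments
        self._hidden = None     # "script" or "style" while inside one
        self._is_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden = tag
            self._is_jsonld = (
                tag == "script"
                and dict(attrs).get("type") == "application/ld+json"
            )
            self._buf = []

    def handle_endtag(self, tag):
        if tag == self._hidden:
            if self._is_jsonld:
                self.jsonld.append(json.loads("".join(self._buf)))
            self._hidden = None
            self._is_jsonld = False

    def handle_data(self, data):
        # Script and style bodies are buffered; everything else is visible text.
        if self._hidden:
            self._buf.append(data)
        else:
            self.text.append(data)


def mismatched_offer_price(page_html):
    """Return the JSON-LD offer price if the visible text lacks it, else None."""
    parser = PageFacts()
    parser.feed(page_html)
    parser.close()
    visible = " ".join(parser.text)
    for block in parser.jsonld:
        offers = block.get("offers") if isinstance(block, dict) else None
        if isinstance(offers, dict):
            price = str(offers.get("price", ""))
            if price and price not in visible:
                return price
    return None


page_html = """<html><body>
<h1>Widget</h1><p>Price: $24.00</p>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script>
</body></html>"""

print(mismatched_offer_price(page_html))  # the markup says 19.99, the page says 24.00
```

A check like this belongs in the publishing pipeline, so the mismatch fails a build instead of surfacing in someone else's answer.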
This is also where the field overclaims. The widely repeated idea that answer engines weight JSON-LD above visible prose during a live fetch is not established by any primary vendor documentation. It is lore. What is documented is narrower and more useful: server-side-rendered structured data is the reliable path, because behavior across agents varies and some fetch modes do not execute client-side JavaScript. Render your facts on the server, and you remove the question entirely.
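As a rough illustration of rendering facts on the server, the sketch below serializes one canonical fact record into a JSON-LD script tag and into the visible HTML in the same response, so no client-side JavaScript is needed to see either. The business record and the `render_jsonld` and `render_page` helpers are hypothetical and not tied to any framework.

```python
import json
from html import escape


def render_jsonld(facts):
    """Serialize facts into a JSON-LD <script> tag for the server response."""
    payload = json.dumps(facts, ensure_ascii=False, sort_keys=True)
    # Escape "<" so a value containing "</script>" cannot close the tag early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


def render_page(facts):
    """Build markup and visible text from the same record so they cannot drift."""
    name = escape(facts["name"])
    hours = escape(facts["openingHours"])
    return (
        f"<html><head>{render_jsonld(facts)}</head>"
        f"<body><h1>{name}</h1><p>Open {hours}</p></body></html>"
    )


# Hypothetical canonical record for illustration.
facts = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bakery",
    "openingHours": "Mo-Fr 08:00-17:00",
}

print(render_page(facts))
```

The design choice worth copying is the single source: the markup and the prose are both derived from `facts`, so a stale value cannot live in one and not the other.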
What actually moves visibility in generative engines
If markup is not the lever Google describes, what is? The peer-reviewed origin of this field gives a concrete answer. The KDD 2024 paper "GEO: Generative Engine Optimization" by Aggarwal and colleagues introduced the GEO-bench benchmark and reported visibility gains of up to 40 percent in generative-engine responses (arXiv, Aggarwal et al., KDD 2024). The methods that produced those gains were not schema tricks. They were changes to the writing itself: adding citations, quotations, and statistics, which yields statistically rich, well-cited prose. The gain is method-specific and domain-specific, not a universal number you can promise a client, and the levers the paper tests are changes to the writing, not the markup.
This sits comfortably beside Google's position. Structured data still earns rich results, and Google still recommends JSON-LD over Microdata and RDFa as the format that is easiest to implement and maintain at scale (Google Search Central, Intro to structured data). But rich-results eligibility and AI-features eligibility are two different doors. JSON-LD opens the first. Correct, retrievable, well-evidenced content opens the second. Treating them as one thing is how teams end up polishing markup while the agent quotes a third-party aggregator instead.
The standards are still forming, so do not bet on folklore
A governance reframe requires knowing which rules are real and which are proposals dressed as rules. The Robots Exclusion Protocol is now standardized as RFC 9309 (2022), but it governs access only by allowing or disallowing URI paths (IETF RFC 9309). It cannot negotiate a data schema, verify the identity of a fetching agent, or enforce attribution. It is a blunt instrument for a job that now needs precision.
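To see how little the protocol can express, here is a short sketch using `urllib.robotparser` from the Python standard library. The robots.txt content and the `ExampleBot` agent name are made up for illustration. Python's parser may not match RFC 9309 in every edge case, so treat it as an approximation of how a compliant crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the whole vocabulary is agents and URI paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The only question the file can answer: may this agent fetch this path?
public_ok = rp.can_fetch("ExampleBot", "https://example.com/pricing")
private_ok = rp.can_fetch("ExampleBot", "https://example.com/private/report")

print(public_ok, private_ok)
```

Nothing in that file can state which schema you serve, prove who is actually fetching, or require attribution. It answers one yes-or-no question per agent and path.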
The proposals filling that gap are not yet standards. llms.txt, proposed by Jeremy Howard in September 2024, defines a markdown file at a site's root to provide LLM-friendly context. Its own specification states it is for inference time when a user seeks assistance, and explicitly not for search indexing like sitemaps nor for access control like robots.txt (llms.txt specification). No major search engine or LLM vendor has published primary documentation confirming they natively consume it.
Meanwhile the actual schema vocabulary keeps advancing. Schema.org shipped version 30.0 on 19 March 2026, adding a Credential class, an Error class, and equivalence exports for GS1, Dublin Core, and Open Graph, plus EU Digital Product Passport examples (schema.org Releases). The lesson is to build on documented behavior and ratified vocabulary, and to treat community proposals as optional hedges, not obligations.
Crawl became a governed, paid, two-sided market
The third change is the genuinely new one, and it is why agent-readiness is a governance question rather than a marketing tactic. On 1 July 2025, a date it branded Content Independence Day, Cloudflare changed its default to block AI crawlers unless they pay creators for content, and launched Pay Per Crawl. In the same announcement Cloudflare reported that earning referral traffic from AI engines is far harder than from the old Google: roughly 750 times harder via OpenAI and roughly 30,000 times harder via Anthropic (Cloudflare Blog, Content Independence Day).
The crawler landscape also fragmented into purposes you now have to govern separately. OpenAI documents three distinct agents. OAI-SearchBot surfaces sites in ChatGPT search. GPTBot is used for model training. ChatGPT-User handles user-initiated live browsing. OpenAI states that ChatGPT-User "is not used for crawling the web in an automatic fashion," and that because its actions are user-initiated, "robots.txt rules may not apply" (OpenAI Developer Docs, Bots and crawlers). Read those two facts together and the implication is sharp. You can be paid-blocked at the edge for training, while a live user-initiated agent fetches you anyway, on a path your robots.txt does not govern. Access is no longer a single on-off switch. It is a market with different rules per purpose, and you are now a participant whether or not you decided to be.
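A per-purpose policy can be sketched as a small table that is compiled into robots.txt and then checked with the standard library. The policy below, which allows search surfacing and refuses training, is a hypothetical example and not a recommendation. `build_robots_txt` and `may_fetch` are illustrative helpers, and the user-agent tokens are the three named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one decision per documented purpose.
POLICY = {
    "OAI-SearchBot": True,   # surfacing in ChatGPT search: allowed
    "GPTBot": False,         # model training: refused
}


def build_robots_txt(policy):
    """Compile the policy table into one robots.txt group per agent."""
    lines = []
    for agent, allowed in policy.items():
        lines += [f"User-agent: {agent}", "Allow: /" if allowed else "Disallow: /", ""]
    return "\n".join(lines)


def may_fetch(policy, agent, url="https://example.com/pricing"):
    """Ask what the generated robots.txt says for this agent and URL."""
    rp = RobotFileParser()
    rp.parse(build_robots_txt(policy).splitlines())
    return rp.can_fetch(agent, url)


for agent in ("OAI-SearchBot", "GPTBot", "ChatGPT-User"):
    print(agent, may_fetch(POLICY, agent))
```

ChatGPT-User has no group in this file, so the parser reports it as allowed by default. Given OpenAI's statement that robots.txt rules may not apply to user-initiated fetches, adding a group for it would not settle the question either, which is why that path has to be governed somewhere other than this file.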
Where Origin Pi fits
Put the three changes together and the headline resolves into a defensible thesis. Agent-readiness is not "add JSON-LD." It is publishing a fast, correct, server-rendered, access-controlled version of your business truth, so an agent cites you instead of hallucinating you or sourcing an aggregator that got your facts wrong. Latency and JSON-LD accuracy are necessary conditions of that. They are not the whole of it. The whole of it is governance: one source of business truth, rendered for machines, with access decided on purpose rather than by accident.
This is the layer Origin Pi builds. Our flagship, Cerebrum, is the governed business brain that holds a company's canonical facts and serves them to agents under control, so the version a machine fetches is the version you authored. On the demand side, our agent-readiness work makes those facts fast, correct, and server-rendered for the fetchers that now act as the first reader. On the visibility side, our marketing AI practice treats generative-engine presence as a governed publishing problem, not a markup hack. The audience changed from a human to a machine. The discipline that follows is governance, not folklore.
Sources
- Google states no special schema, AI text files, or markup is required to appear in AI Overviews or AI Mode; a page must simply be indexed and snippet-eligible.
- Google recommends JSON-LD as the preferred structured-data format for rich results, ahead of Microdata and RDFa; rich-results eligibility is distinct from AI-features eligibility.
- The KDD 2024 GEO paper by Aggarwal et al. introduced GEO-bench and reported visibility gains of up to 40 percent from methods such as adding citations, quotations, and statistics; the gain is method- and domain-specific.
- schema.org released version 30.0 on 19 March 2026, adding a Credential class, an Error class with errorCode, and equivalence exports for GS1, Dublin Core, and Open Graph plus EU Digital Product Passport examples.
- llms.txt is a community proposal published 3 September 2024, meant for inference time and explicitly not for search indexing or access control; no major vendor documents native consumption.
- On 1 July 2025, Cloudflare defaulted to blocking AI crawlers without compensation and launched Pay Per Crawl, reporting AI referral traffic is roughly 750x harder via OpenAI and 30,000x harder via Anthropic than the old Google.
- OpenAI documents OAI-SearchBot (ChatGPT search), GPTBot (training), and ChatGPT-User (user-initiated live browsing), noting robots.txt rules may not apply to the user-initiated agent.
- The Robots Exclusion Protocol is standardized as RFC 9309 (2022) and governs access only by allowing or disallowing URI paths; it cannot negotiate schemas, verify agent identity, or enforce attribution.
Common questions.
What is the difference between agent search optimization and SEO?
Do I need special JSON-LD or schema.org markup to appear in Google AI Overviews?
Does JSON-LD get weighted higher than visible text by answer engines?
What actually improves visibility in generative engines?
Should I add an llms.txt file to my site?
How did AI crawling become a governance problem rather than a marketing tactic?
Work with Origin Pi.
Building the agent-ready layer for your business? Send a note. Real reply, no funnel.