If you’ve been anywhere near digital marketing in the past year, you’ve been asked some variation of this question: “How do we get our site cited in ChatGPT?” or “What do we need to do to appear in AI Overviews?” or my personal favourite, “Can you make us rank in Perplexity?”
It’s the question of the moment in SEO. And I get why: we’re watching search behaviour fundamentally shift. People are asking questions to AI systems instead of Google. Businesses are panicking about losing visibility. Agencies are scrambling to offer “AI optimisation” services. Everyone wants to know how to win in this new landscape.
Here’s the uncomfortable truth I need to get out of the way early: we don’t have definitive causal evidence for most AI optimisation tactics.
I know that’s disappointing. You probably clicked on this article hoping for a checklist or a formula. But anyone selling you a guaranteed method for getting cited by LLMs is either lying to you or doesn’t understand the systems they’re claiming to optimise for. Or, far more likely, both. I mean, it isn’t like I can pretend to fully understand the ins and outs of systems put together over years by a load of nerds managed by tech bros. I’m sure there’s plenty in this article that will, in time, prove to be wrong, or will become outdated as LLMs and “AI” overviews change.
What we do have are correlation studies, observational research, some indirect effects we can measure, and a lot of educated guesses based on how these systems appear to work. That’s not nothing, it’s actually quite useful, but it’s not the same as having a proven playbook.
This article is going to be long (sorry). I’m going to walk through what we actually know with evidence, what we suspect based on reasonable inference, and what’s just industry speculation dressed up as strategy. I’ll cover schema markup, the relationship between organic rankings and AI citations, query fan-out (which is probably the most important concept here, and has some interesting overlaps with the now-traditional SEO concept of Topical Authority, which even then doesn’t always stand up to detailed, long-term scrutiny), link building, technical SEO, and content strategy.
I’m also going to be honest about where the evidence is thin, where I’m making semi-educated guesses (just for Charlotte, I’m going to mention that I have an MA in Innovative and Experimental Literature, so my educated guesses come with a certificate and the smell of stale Guinness), and where we’re all just working in the dark. Because that honesty, that admission that we don’t have all the answers yet, is more valuable than confident-sounding nonsense.
Let’s start with the biggest misconception I keep encountering.
What We Actually Know (And What We Don’t)
The Evidence Problem
Here’s something that should concern anyone offering “LLM optimisation” services: there are no definitive, peer-reviewed, causal studies demonstrating that any specific optimisation tactic directly increases LLM citations.
None.
What we have instead are:
- Correlation studies showing what types of pages tend to be cited more often
- Observational research identifying patterns in AI-generated responses
- Industry experiments with mixed and often contradictory results
- A lot of marketing content making claims that aren’t supported by evidence
This matters because you’re potentially making significant decisions about your content strategy, technical infrastructure, and resource allocation based on speculation rather than proof. I’m not saying the speculation is worthless, some of it is quite well-reasoned, but let’s at least be clear about what we’re working with.
The research that does exist is largely observational. Studies examining which pages get cited in AI Overviews, or what characteristics those pages share, or how citation patterns correlate with traditional ranking factors. This can tell us “pages with X tend to be cited more often,” but it can’t tell us “X causes increased citations.”
The distinction is crucial. If schema-rich pages are cited more frequently (and they are, in some studies), is that because the schema itself influences citation, or because sites that implement schema tend to be higher quality overall? Similarly, are the sites with stronger backlink profiles mentioned more because of their strong backlink profiles, or because they tend to be larger, better-known brands? We don’t actually know.
The Indirect Effect Reality
The other thing you need to understand before we get into specifics is that, in most cases, LLMs don’t read your site directly.
When ChatGPT or Claude or Perplexity cites a source, they’re not crawling the web themselves. They’re relying on retrieval layers (search indices, retrieval-augmented generation (RAG) systems, curated corpora) to fetch documents first. The LLM then processes those retrieved documents and decides what to cite.
This has enormous implications for optimisation strategy. It means that traditional SEO factors, the things that determine whether your content gets indexed, retrieved, and surfaced by search systems, still matter enormously. Possibly more than any “AI-specific” optimisation you might do.
Your goal isn’t really to “optimise for LLMs.” Your goal is to make your content retrievable, clearly structured, and authoritative enough to be selected from the retrieval layer and cited in the synthesis step.
Think of it this way: if your page isn’t crawlable, isn’t indexed, or ranks on page 47 for relevant queries, no amount of “AI optimisation” will help. You’ve already lost. The LLM will never even see your content because it won’t make it through the retrieval layer.
Right, with those foundations established, let’s look at specific tactics and what the evidence actually tells us.
Schema and Structured Data: The Overhyped Solution
I’m going to start with schema because it’s probably the most overhyped aspect of “AI optimisation” right now. Every second article I read claims that structured data is the key to LLM citations. Some go further and suggest specific schema types will guarantee visibility in AI-generated answers.
It’s nonsense. Well-meaning nonsense in some cases, cynical marketing in others, but nonsense either way.
What the Evidence Actually Shows
Let’s be really clear about what we can and cannot say with evidence:
What we can say:
- There is no definitive, causal evidence that implementing schema directly increases LLM citations
- Most LLMs are trained primarily on unstructured text, not HTML or schema markup
- LLMs without live retrieval don’t parse or “see” schema at inference time
- This means schema cannot directly influence citations unless the system uses a retrieval layer
What we suspect (with some supporting correlation):
- Systems using retrieval-augmented generation may benefit indirectly from structured data
- Schema can help search indexes and retrievers better understand content
- This could influence what documents are retrieved and later cited
- Observational studies show correlation between schema-rich pages and higher citation frequency
But here’s the critical bit: those observational studies show correlation, not causation. Schema-rich pages tend to be cited more often, yes. But they also tend to be:
- Higher quality overall
- Better maintained
- More authoritative
- Clearer in structure
- Written by people who understand technical SEO
This is called a selection effect. Sites that invest in proper schema implementation also tend to invest in everything else that makes content citable. Separating the schema effect from the “generally higher quality” effect is nearly impossible with observational data.
Industry experiments on this have been mixed at best. I’ve run tests where implementing schema made no measurable difference to AI citations. Other SEOs report marginal improvements. No one has produced a controlled study showing strong, isolated schema effects. (If you have, please send it to me, I’d genuinely love to see it.)
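The selection effect is easy to see in a toy model. The sketch below simulates pages where an underlying “quality” variable drives both schema adoption and citation likelihood, while schema itself has zero causal effect; the numbers and functional forms are invented purely for illustration, not drawn from any real dataset.

```python
import random

random.seed(42)

def simulate_pages(n=10_000):
    """Simulate pages where underlying quality drives BOTH schema
    adoption and citation; schema itself has zero causal effect."""
    pages = []
    for _ in range(n):
        quality = random.random()
        # Higher-quality sites are more likely to invest in schema...
        has_schema = random.random() < quality
        # ...and more likely to be cited. Note schema never appears here.
        cited = random.random() < quality ** 2
        pages.append((has_schema, cited))
    return pages

def citation_rate(pages, schema_flag):
    group = [cited for has_schema, cited in pages if has_schema == schema_flag]
    return sum(group) / len(group)

pages = simulate_pages()
with_schema = citation_rate(pages, True)
without_schema = citation_rate(pages, False)
print(f"cited with schema:    {with_schema:.2f}")
print(f"cited without schema: {without_schema:.2f}")
```

Run it and the schema-rich pages are cited markedly more often, despite schema doing nothing at all in the model. That’s exactly the trap observational citation studies can fall into.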
What Schema Actually Does
So if schema doesn’t guarantee LLM citations, why implement it? Because it does other valuable things:
- Improves machine interpretability for search engines, this is explicitly stated by Google and other search engines
- Reduces ambiguity and extraction errors, when systems do try to extract information, clear structure helps
- Supports entity recognition, helping search engines understand who and what you are
- Future-proofs your content, as AI systems become more sophisticated, structured data may become more important
- Can improve traditional search features, rich results, knowledge panels, etc.
These are all legitimate reasons to implement schema. Just don’t do it expecting it to suddenly get you cited by ChatGPT, because that’s not what the evidence supports.
The Practical Stance on Schema
Here’s my actual recommendation: implement schema for the right reasons.
Focus on:
- Article schema for content pieces (signals content type and date)
- Organization schema for entity clarity (who you are)
- Person schema for authorship (builds author authority)
- FAQ schema if you have genuine FAQs (might help with retrieval, no guarantees)
- Product schema for e-commerce (more for traditional search than AI citations)
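For reference, here’s roughly what a sensible, minimal implementation of the Article/Person/Organization combination looks like as JSON-LD. The names and URLs below are placeholders, and building it in Python is just a convenient way to template it; hand-written JSON-LD in the page head works identically.

```python
import json

# All names and URLs below are placeholders, not a real organisation.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Does Schema Help With AI Citations?",
    "datePublished": "2025-01-15",
    "dateModified": "2025-02-01",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "url": "https://example.com/about/jane",
    },
    "publisher": {
        "@type": "Organization",
        "name": "Example Agency",
        "url": "https://example.com",
    },
}

# Emit as a JSON-LD script block ready to drop into the page head.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(article_schema, indent=2)
    + "\n</script>"
)
print(snippet)
```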
Don’t bother with:
- Implementing every possible schema type because you read it helps with AI
- Complex nested schema that provides minimal clarity
- Schema for the sake of schema
And please, for the love of all that’s holy, don’t pay someone a fortune to implement schema as an “AI optimisation strategy” unless they’re also fixing your fundamental SEO issues, improving your content quality, and building your authority. Because those things will have far more impact.
Bottom line on schema: It’s an enabler, not a driver. Worth doing, but not for the reasons most people think.
Organic Rankings and LLM Citations: The Correlation
Let’s move on to something we have better data on: the relationship between traditional organic rankings and AI citations.
What the Studies Actually Show
Multiple studies from 2024 have examined which pages get cited in AI-generated responses, particularly Google’s AI Overviews. The findings are interesting:
- Strong correlation between high organic rankings and AI citations, pages that rank well organically are more likely to be cited
- 94% of AI Overviews cite at least one source from the top 20 organic results, this is a remarkably high overlap
- Positions 1–3 account for a disproportionate share of citations, the top spots still matter
- Moderate positive correlation between ranking in the top 10 and being cited (correlation coefficient around 0.347 in one study)
This all sounds good, right? If you rank well, you’re more likely to be cited. Traditional SEO still matters. Crisis averted.
But here’s where it gets interesting:
- 46–52% of AI citations come from pages outside the top organic results, that’s nearly half
- Pages ranking outside the top 100 still get cited, not common, but it happens
- Different AI systems show different patterns, what works for AI Overviews may not work for Perplexity or ChatGPT
What This Actually Means in Practice
The relationship between rankings and citations is real, but it’s not deterministic. High rankings increase your likelihood of being cited, but they don’t guarantee it. And more importantly, you can be cited without ranking particularly well for the primary query.
Why? Because of query fan-out, which we’ll get to in a moment. But the short version is: AI systems often answer questions by retrieving information for multiple related sub-queries, not just the original query. Your page might rank poorly for “what is schema markup” but rank well for “does schema help with SEO,” and get cited when someone asks the first question because the AI system expanded the query to include the second one.
This is actually good news if you understand it correctly. It means you don’t need to rank #1 for highly competitive terms to appear in AI-generated answers. You need to be authoritative, retrievable, and clear across a topic area.
But it’s also a trap if you misunderstand it. Some people interpret this as “rankings don’t matter for AI, so we can ignore traditional SEO.” That’s catastrophically wrong. Rankings still matter enormously, they’re just not the only thing that matters, and they’re not deterministic.
Bottom line on rankings: Traditional SEO fundamentals still apply. High rankings help significantly. But comprehensive topical coverage matters more than #1 rankings for single keywords.
Right, let’s talk about query fan-out, because this is probably the single most important concept for understanding how AI search actually works.
Query Fan-Out: The Actual Game-Changer
If you take nothing else from this article, understand query fan-out. It’s the key to understanding why AI search behaves differently from traditional search, and why your content strategy needs to evolve.
What Query Fan-Out Actually Is
Query fan-out is a retrieval technique where a single user query is expanded into multiple related sub-queries. Rather than searching for just what the user asked, the system:
- Decomposes the query into component questions
- Identifies related concepts and clarifications
- Generates multiple sub-queries representing different facets of the intent
- Retrieves information for each sub-query
- Synthesises the results into a comprehensive answer
Here’s a simple example. If someone asks “How do I improve my SEO?”, a traditional search engine might return pages about “improving SEO.” A query fan-out system might decompose this into:
- What is SEO?
- What are the main SEO ranking factors?
- How do I do keyword research?
- What is technical SEO?
- How do I build links?
- What is content optimisation?
- How long does SEO take?
The system then retrieves information for each of these sub-queries and builds an answer that addresses the original question comprehensively. Different parts of the answer might cite different sources, each retrieved for different sub-queries.
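The mechanics above can be sketched in a few lines. In the toy pipeline below, the hardcoded decomposition table stands in for what a real system would generate with an LLM, and “retrieval” is just word overlap over a four-document corpus; every name in it is invented. The point is the shape: one query fans out into several sub-queries, and each facet can end up citing a different page.

```python
# Tiny stand-in corpus: page name -> the words that page covers.
CORPUS = {
    "seo-fundamentals": "what is seo search engine optimisation basics ranking factors",
    "technical-seo-guide": "technical seo crawling indexing site speed core web vitals",
    "link-building-guide": "how to build links backlinks digital pr editorial",
    "keyword-research": "keyword research search volume intent tools",
}

# Stand-in for the LLM-generated decomposition of the user's query.
FAN_OUT = {
    "how do i improve my seo?": [
        "what is seo",
        "technical seo",
        "how to build links",
        "keyword research",
    ],
}

def retrieve(sub_query: str) -> str:
    """Return the corpus doc with the most word overlap with the sub-query."""
    words = set(sub_query.lower().split())
    return max(CORPUS, key=lambda doc: len(words & set(CORPUS[doc].split())))

def answer_with_citations(query: str) -> dict:
    """Retrieve one source per sub-query; each facet may cite a different page."""
    return {sq: retrieve(sq) for sq in FAN_OUT[query.lower()]}

citations = answer_with_citations("How do I improve my SEO?")
for sub_query, source in citations.items():
    print(f"{sub_query!r} -> {source}")
```

Notice that the broad “how do I improve my SEO” query never cites a single broad page: all four citations go to pages that answer one specific sub-intent well.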
How This Changes Everything
Query fan-out fundamentally alters what “comprehensive content” means and how you should think about content strategy.
In traditional SEO, you might create one page targeting “how to improve SEO” and optimise it for that specific query and close variations. You’d try to rank that page as highly as possible for that term.
In a fan-out world, that same query might pull information from:
- Your SEO fundamentals page
- Your technical SEO guide
- Your content strategy article
- Your link building guide
- Your keyword research tutorial
Each is retrieved because it comprehensively answers a specific sub-intent related to the original query. Your “how to improve SEO” page might not even be cited if it’s too broad or if other pages better address the specific sub-questions.
This is why I see sites with mediocre rankings for primary queries getting cited frequently in AI answers. They’re not winning on the primary query, they’re winning on the sub-queries generated through fan-out.
The Important Limitations
Before you get too excited about this, there are some important limitations to understand:
You can’t see the sub-queries. Major AI systems don’t publish what sub-queries they generate. It’s internal to their retrieval pipeline. You can approximate it using LLMs yourself (ask ChatGPT to “decompose this query into sub-questions”), but you can’t know exactly what any particular system is doing.
It only affects retrieval-enabled systems. Static LLMs without live web retrieval (like base ChatGPT without plugins) don’t do this in the same way. They work from their training data, not from live retrieval.
Fan-out patterns vary by system. Google AI Overviews, Perplexity, Bing Chat, and other systems likely use different decomposition strategies. There’s no universal pattern to optimise for.
You can’t game it. Because you can’t see the sub-queries and because they’re dynamically generated based on the original query, you can’t reverse-engineer a formula for appearing in fan-out results.
What You Can Do About It
So if you can’t see the sub-queries and can’t game the system, what’s the point of understanding fan-out?
The point is it changes how you think about content comprehensiveness. Rather than trying to rank one page for one query, you need to think about covering an intent space thoroughly. This means:
- Creating content that addresses distinct sub-questions within a topic
- Ensuring each piece can stand alone and answer a specific question clearly
- Structuring content so individual sections can be retrieved independently
- Building topical authority across a subject area, not just for individual keywords
We’ll get into the practical implementation of this in the content strategy section. For now, just understand that fan-out is why “comprehensive topical coverage” has become the watchword of modern SEO. It’s not just marketing speak, it’s a response to a fundamental change in how retrieval systems work.
Bottom line on fan-out: It rewards intent coverage across a topic area. You need to be authoritative on a subject, not just rank for a keyword.
Link Building for LLM Citations
Right, let’s talk about links. Because despite all the changes in search, link building remains one of the most important (and most misunderstood) aspects of digital strategy.
The Core Truth About Links and LLMs
First, the uncomfortable reality: backlinks do not directly signal LLMs.
LLMs don’t read PageRank. They don’t interpret anchor text. They don’t parse link graphs. When an LLM decides whether to cite your content, it’s not looking at how many backlinks you have or what your Domain Authority is. (I mean, there’s an argument to be made that they’re trained on sources that historically considered those factors, but that’s pretty indirect.)
So do links matter for LLM citations? Yes, but indirectly.
Here’s how: most LLM systems that cite sources rely on retrieval layers. Those retrieval layers are usually powered by search indices. And search indices definitely care about backlinks. Backlinks are one of the primary signals search engines use to determine authority, trustworthiness, and which pages deserve to rank.
Therefore, backlinks influence which pages are eligible to be retrieved and cited. They’re a prerequisite, not a direct ranking factor for LLM citations. They help you get into the retrieval pool. Once you’re in that pool, other factors determine whether you actually get cited.
This distinction matters for strategy. You can’t skip link building and expect AI systems to discover and cite your content. But pouring resources into manipulative link schemes won’t directly increase AI citations either.
Evidence-Based Link Hierarchy
So what types of links actually matter? Based on available evidence and understanding of how retrieval systems work, here’s the hierarchy:
Editorial / Naturally Earned Links (Highest Value)
These are links placed voluntarily by third parties because your content is genuinely valuable or authoritative.
Why they matter:
- Strongest quality signal in SEO research
- Consistently correlate with higher organic rankings
- Increase retrieval likelihood in AI search systems
- Frequently come with contextual brand mentions
- Support entity recognition
For LLM citability:
- Strong indirect boost through authority and retrieval
- High trust signal
- Often appear on the same types of authoritative sites AI systems prefer
Practical approach: Create genuinely useful content, conduct original research, develop unique tools or resources, and let the links come to you. I know that sounds idealistic, but it’s genuinely the most effective long-term strategy.
Digital PR / Media Coverage (Very High Value)
Links and mentions from news outlets, journalists, and media publications.
Why they matter:
- High-authority endorsements from trusted sources
- These domains are overrepresented in AI citations
- Brand mentions matter even when links are nofollow or absent
- Reinforce real-world credibility
For LLM citability:
- Strong citation correlation (news sites are frequently cited)
- Brand/entity reinforcement in AI’s training data and retrieval systems
- Trusted source bias (AI systems prefer citing recognised publications)
Practical approach: Invest in proper digital PR. Respond to journalist requests, pitch newsworthy angles, provide expert commentary. This is one area where spending money typically provides good ROI, assuming you work with competent PR professionals.
Guest Posts (Moderate, Conditional Value)
Contributing articles to other sites in your industry.
Why they matter (conditionally):
- Can provide value when published on relevant, authoritative sites
- Help with topical association
- But: effectiveness depends heavily on editorial standards and quality
Why they often don’t:
- Search engines discount large-scale or low-quality guest posting
- Weaker than earned editorial links
- Generally seen as lower trust signal
For LLM citability:
- Helpful for topical association if done well
- Limited authority signal unless truly high-quality
- Secondary to editorial and PR links
Practical approach: Guest post sparingly and only on genuinely authoritative, relevant sites. Focus on sites you’d be proud to be associated with. If you’re approaching it as a “link building tactic” rather than a genuine opportunity to reach a new audience, you’re probably doing it wrong.
Niche Edits / Link Inserts (Variable Value)
Adding links into existing content on third-party sites.
Why they can help:
- Can work when placed in authoritative, topically aligned content
- Contextual relevance is key
Why they often don’t:
- Quality varies wildly
- Poorly executed inserts are algorithmically devalued
- From an AI perspective, only the surrounding content and source authority matter
For LLM citability:
- Can help retrieval if contextually strong
- Risky if low-quality
- Inferior to natural editorial links
Practical approach: Use sparingly and only with high-quality placements. The link needs to make sense in context and add value to the existing content. If it looks like an awkward insertion (which it usually does), it won’t help much.
Directories, Resource Pages, Listings (Low-Moderate Value)
Business directories, resource page links, industry listings.
Why they matter (slightly):
- Help with discovery and indexing
- Useful in local or niche-specific contexts
For LLM citability:
- Minor indirect benefit
- Rarely cited sources themselves
- More about ensuring you’re findable
Practical approach: Get the basics done (Google Business Profile, relevant industry directories), but don’t spend significant time or money here.
The Brand Mention Shift
Here’s something interesting that’s emerged from AI citation studies: brand mentions correlate more strongly with LLM citations than raw backlink counts.
This makes sense when you understand how these systems work. LLMs rely heavily on entity recognition and co-occurrence patterns in their training data. If your brand is mentioned frequently in trusted contexts, even without links, you’re building the kind of authority signals these systems recognise.
This is a significant shift from traditional link-focused SEO. An unlinked mention in a Guardian article might be more valuable for AI citability than a dozen links from mid-tier blogs. The mention itself builds entity authority and associative relevance.
Practically, this means:
- Focus on getting mentioned, not just linked
- Brand awareness campaigns have SEO value now (shocking, I know)
- Quality and context of mentions matters more than quantity
- Social media and PR mentions contribute to your overall authority profile
What Doesn’t Work (And Isn’t Supported by Evidence)
Let me save you some time and money by listing what we have no evidence for:
- No proof that any backlink type is directly read or preferred by LLMs
- No evidence that anchor text optimisation affects AI citations
- No support for link volume outperforming relevance + authority
- No indication that manipulative link tactics help AI visibility
- No special “AI-focused” link building strategies that work differently
If someone tries to sell you on any of these, they’re either speculating or lying. Possibly both.
Practical Link Building Strategy for 2025
Given everything we know (and don’t know), here’s what a sensible link building approach looks like:
- Prioritise editorial links and PR mentions over scalable tactics
- Focus on being referenced by trusted publications, not just linked
- Use guest posts sparingly and only on authoritative, relevant sites
- Build entity authority through consistent brand presence
- Remember: links are infrastructure for retrieval, not a direct citation lever
This might sound similar to traditional white-hat SEO advice. That’s because it is. The fundamentals haven’t changed as much as people think, we’re just more conscious now of why they matter.
Bottom line on links: Editorial links and brand mentions from authoritative sources. Everything else is secondary. No shortcuts, no tricks, just build genuine authority.
Right, let’s talk about technical SEO, because this is where a lot of misconceptions live.
Technical SEO for LLM Citations
I’m going to start this section with the conclusion, because it might save you reading the rest: There is no separate “technical SEO for LLMs.” There is only good technical SEO, applied in a world where retrieval and citation matter more than blue links.
If that disappoints you, I’m sorry. But it’s important to understand that LLMs aren’t some completely separate ecosystem requiring entirely new technical approaches. They rely on the same retrieval infrastructure that search engines use, which means the same technical fundamentals apply.
The Fundamental Principle
Let me repeat this because it’s crucial: LLMs don’t crawl the web themselves.
They rely on retrieval layers: search indices, RAG systems, curated corpora. Technical SEO determines whether your content makes it into those retrieval layers. If a page isn’t crawlable, isn’t indexed, or can’t be properly parsed, it cannot be cited. Full stop.
This means technical SEO isn’t optional. It’s not something you can skip while focusing on “AI-specific optimisations.” It’s the foundation that makes everything else possible.
Crawlability & Indexability (Non-Negotiable)
Your pages must be crawlable by search engines. This is table stakes.
What matters:
- No robots.txt blocks on important content
- No accidental noindex tags
- Clean internal linking structure
- Proper XML sitemaps
- No orphaned pages
Why it matters for LLM citations: If search engines can’t crawl and index your content, retrieval systems can’t access it. No indexing = no retrieval = no citations. The chain breaks at the first step.
What to do: Run regular technical audits, fix crawl errors, ensure all important content is discoverable through internal links, monitor Google Search Console for indexing issues. The same things you should have been doing for traditional SEO.
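The robots.txt part of that audit is trivial to script with the standard library. This sketch only checks crawl rules (it won’t catch noindex tags or orphaned pages), and the rules and URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pages you expect to be cited must, at minimum, be crawlable.
important_pages = [
    "https://example.com/blog/how-to-improve-seo",
    "https://example.com/private/draft-post",   # accidentally blocked
]

for url in important_pages:
    ok = parser.can_fetch("*", url)
    print(f"{'OK     ' if ok else 'BLOCKED'} {url}")
```

In practice you’d fetch the live robots.txt with `RobotFileParser.set_url()` and `read()`, and feed it your sitemap URLs instead of a hardcoded list.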
Site Speed & Performance
Fast-loading pages are crawled more efficiently and more frequently.
What matters:
- Core Web Vitals (especially LCP, CLS, and INP, which replaced FID in 2024)
- Server response times
- Efficient rendering
- Minimal JavaScript blocking
Why it matters for LLM citations:
- Improves crawl efficiency (more content discovered)
- Reduces retrieval friction
- Slow or unstable pages risk partial rendering or skipped indexing
What to do: Optimise images, implement proper caching, minimise render-blocking resources, use a CDN. Again, standard performance optimisation. Nothing AI-specific here.
Content Parsability & Machine Readability
This is where things get slightly different, not because LLMs need something special, but because passage-level retrieval makes structure more important.
What matters:
- Clean, semantic HTML
- Logical heading hierarchy (H1, H2, H3 used properly)
- Lists, tables, and structured content where appropriate
- Minimal heavy JavaScript rendering
- Clear content segmentation
Why it matters for LLM citations:
- Retrieval systems often work at the passage or chunk level
- Well-structured content is easier to extract accurately
- Clear segmentation allows different sections to be retrieved independently
- Excessive JavaScript can prevent proper content extraction
What to do: Use semantic HTML5 elements, implement proper heading hierarchy, break content into clear sections with descriptive headings, ensure your content is readable even with JavaScript disabled. Think about how a chunk of your content would read in isolation, particularly the opening paragraph below your H1, which does disproportionate work for both readers and retrieval systems.
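A useful way to pressure-test this is to chunk your own pages the way a passage-level retriever roughly might: split on headings and look at each chunk in isolation. The sketch below is a simplification (real systems use their own segmentation logic), but it makes the “would this section stand alone?” question concrete.

```python
from html.parser import HTMLParser

class PassageChunker(HTMLParser):
    """Split a page into (heading, text) chunks, roughly the way a
    passage-level retriever might segment it. A simplification: real
    retrieval systems use their own chunking logic."""

    def __init__(self):
        super().__init__()
        self.chunks = []          # list of [heading, text] pairs
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = True
            self.chunks.append(["", ""])

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if not self.chunks:
            return  # ignore text before the first heading
        if self._in_heading:
            self.chunks[-1][0] += data
        else:
            self.chunks[-1][1] += data

html_doc = """
<h1>Improving SEO</h1>
<p>An overview of the basics.</p>
<h2>Technical SEO</h2>
<p>Crawlability and speed come first.</p>
<h2>Link Building</h2>
<p>Earned editorial links matter most.</p>
"""

chunker = PassageChunker()
chunker.feed(html_doc)
for heading, text in chunker.chunks:
    print(f"[{heading.strip()}] {text.strip()}")
```

If a chunk makes no sense without the paragraphs above it, that’s a sign the section leans too heavily on surrounding context.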
Structured Data & Metadata (Supportive, Not Required)
We covered schema earlier, but it’s worth repeating in the technical context.
What matters:
- Schema for entity clarity (Organization, Person)
- Article schema for content pieces
- Clear, descriptive meta titles and descriptions
- Author attribution
- Publication dates
Why it matters for LLM citations:
- Helps retrieval systems understand content semantics
- Clear metadata improves retrieval confidence
- Supports entity recognition
- But: these are supporting signals, not deterministic factors
What to do: Implement relevant schema types, write clear meta descriptions, ensure proper authorship attribution. Don’t go overboard trying to implement every possible schema type.
Content Accessibility & Format
HTML is king. Other formats are harder to retrieve and extract from.
What matters:
- Text-based HTML content
- Transcripts for video/audio content
- Alt text for images (for context, not just accessibility)
- Clear language and direct answers
- Content not hidden behind logins or paywalls
Why it matters for LLM citations:
- Text is easiest to retrieve and extract
- Clear, direct language improves passage selection
- Accessible content = retrievable content
What to do: Favour HTML pages over PDFs when possible, add transcripts to video content, ensure key information isn’t locked behind interactions or authentication. Make your content as accessible and extractable as possible.
What’s Different from Traditional SEO (Sort Of)
There are a few shifts in emphasis worth noting:
Ranking does not equal citation. A page doesn’t need to rank #1 to be cited. It needs to be retrievable and useful for a specific sub-query. This changes your priority calculus slightly, it’s less about dominating one primary keyword and more about thorough topical coverage.
Passage-level relevance matters more. Because retrieval systems often work at the chunk level, how you structure individual sections matters. Each section should be clear and self-contained enough to be understood independently.
Trust, clarity, and attribution are weighted heavily. AI systems are risk-averse. They favour sources that appear authoritative, well-maintained, and clearly attributed. Ambiguity hurts you more in an AI citation context than it does in traditional search.
But these aren’t really new technical requirements. They’re traditional SEO best practices with slightly different emphasis.
What’s NOT Supported by Evidence
Let me be really clear about what you don’t need to do:
- No special “LLM-only” technical optimisations exist
- No AI-specific crawl directives override search indexing
- No separate site architectures needed for LLMs
- No hidden technical signals unique to AI citation systems
- No magic tricks or shortcuts
If someone tries to sell you “technical AI SEO” that involves anything radically different from good traditional technical SEO, be very sceptical.
Practical Technical SEO Checklist
Here’s what actually matters:
Foundational (Do these first):
- All important pages are crawlable and indexed
- Site passes Core Web Vitals
- Clean, semantic HTML structure
- Proper heading hierarchy
- Mobile-friendly and responsive
- HTTPS throughout
- XML sitemap submitted
Content Structure (Focus here):
- Clear section breaks with descriptive headings
- Important information not buried in JavaScript
- Lists and tables for structured information
- Each section can be understood independently
- Direct answers to questions
- Clear, concise language
Metadata & Attribution (Polish):
- Relevant schema implemented
- Clear author attribution
- Publication/update dates visible
- Descriptive meta information
- Entity information complete
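To make the metadata items concrete, here’s a minimal sketch of a JSON-LD block that covers schema, author attribution, and visible dates in one structure. It uses schema.org’s Article type; the names, dates, and organisation are placeholders, not recommended values:

```python
import json

# Illustrative JSON-LD using schema.org's Article type.
# All values are placeholders; swap in your real page details.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Complete Guide to Technical SEO",
    "author": {"@type": "Person", "name": "Jane Example"},
    "datePublished": "2025-01-15",
    "dateModified": "2025-06-01",
    "publisher": {"@type": "Organization", "name": "Example Agency"},
}

# Embed the printed JSON inside a <script type="application/ld+json">
# tag in your page's <head>.
print(json.dumps(article_schema, indent=2))
```

The point isn’t the Python, it’s that one small, valid block can tick three checklist items at once.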
That’s it. That’s the technical checklist. It’s not exciting. It won’t make a good “10 Technical Tricks for AI Search” listicle. But it’s what the evidence actually supports.
Bottom line on technical SEO: The fundamentals matter more than ever. Make your content crawlable, fast, clearly structured, and easily extractable. Everything else is speculation.
Right, now let’s get to what might be the most important (and most misunderstood) part: content strategy.
Content Strategy: Comprehensiveness vs Cannibalisation
This is where everything comes together, and where I see the most confusion. People understand that “comprehensive content” matters, but they struggle with what that actually means in practice. More importantly, they’re worried about keyword cannibalisation: if I create lots of comprehensive content on related topics, won’t my pages compete with each other?
It’s a fair concern. And the answer is: it depends on whether you understand the difference between traditional cannibalisation and the kind of comprehensiveness that fan-out rewards.
The Core Shift in Content Strategy
Traditional SEO was about ranking pages for keywords. You’d identify a keyword, create a page for it, optimise that page, build links to it, and measure success by where it ranked for that specific term.
LLM-driven systems care about answering questions. They retrieve content for multiple sub-queries, synthesise information from multiple sources, and cite whatever best answers each component of the query.
Success in this environment isn’t just about ranking, it’s about being retrievable, usable, and citable across an intent space.
This is what people mean when they talk about Generative Engine Optimisation (GEO) or Answer Engine Optimisation (AEO). It’s not fundamentally different from good SEO, it’s good SEO with a different mental model. You’re optimising for comprehensive coverage of a topic, not just rankings for individual keywords.
How Query Fan-Out Changes Content Design
Remember query fan-out? It’s why content strategy has to evolve.
When a single user query expands into multiple sub-queries, your content gets evaluated against all of those sub-queries, not just the primary one. A system might decompose “How do I improve my site’s SEO?” into:
- What are the main ranking factors?
- How do I fix technical SEO issues?
- What is content optimisation?
- How important are backlinks?
- How long does SEO take?
- What tools do I need?
If your content only addresses the broad question without covering the sub-intents, you’re less likely to be retrieved for any of the component parts. But if you have distinct, clear answers to each sub-question (whether on one page or multiple pages), you have multiple opportunities to be retrieved and cited.
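To make fan-out less abstract, here’s a toy sketch of the retrieval shape: sub-queries matched against individual page sections rather than whole pages. The scoring is crude word overlap (real systems use embeddings and many more signals), but it shows why a distinct section per sub-intent gives you multiple chances to be retrieved:

```python
import re

def tokens(text: str) -> set[str]:
    # Lowercase word tokens, punctuation stripped.
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query: str, chunk: str) -> float:
    # Fraction of query words that appear in the chunk.
    q = tokens(query)
    return len(q & tokens(chunk)) / len(q)

# Two distinct, self-contained sections from one hypothetical page.
sections = {
    "Main ranking factors": "The main ranking factors are content relevance, backlinks, and technical health.",
    "How long SEO takes": "SEO usually takes three to six months to show meaningful results.",
}

# Two sub-queries a fan-out system might generate from one broad question.
sub_queries = [
    "What are the main ranking factors?",
    "How long does SEO take?",
]

# Each sub-query retrieves whichever section overlaps it most.
for q in sub_queries:
    best = max(sections, key=lambda title: score(q, sections[title]))
    print(q, "->", best)
```

One page, two sections, two separate retrieval wins. A page that only addressed the broad question would score weakly against both sub-queries.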
This is why “comprehensive content” has become such a focus. You’re not trying to rank for more keywords, you’re trying to cover the intent space thoroughly enough that you match various sub-queries generated through fan-out.
What “Comprehensive” Actually Means (This is Important)
Here’s where people get confused. Comprehensive content does not mean:
- Repeating the same information across multiple pages
- Creating slight variations of the same article
- Targeting every possible keyword variant with a separate page
- Making everything as long as possible
- Covering unrelated topics just to have “more content”
Comprehensive content does mean:
- Covering distinct sub-questions within a topic area
- Addressing related intents that a user might have
- Providing clear, self-contained answers that can be extracted independently
- Building topical authority across a subject
- Creating content at the right level of specificity for each intent
The difference is crucial. The first approach creates genuine cannibalisation and confuses both users and retrieval systems. The second approach builds authority and provides multiple entry points for fan-out queries.
Passage-Level Thinking
This is one of the practical shifts you need to make: think about your content at the passage level, not just the page level.
Many retrieval systems (including the ones powering AI search) work with chunks of content rather than full pages. They might extract a few paragraphs, or a single section, or even just a well-structured list. This means:
- Each section of your content should make sense independently
- Important information shouldn’t require reading the entire page to understand
- Headings should clearly describe what each section covers
- Lists, tables, and structured elements should extract cleanly
- Questions should get direct answers where appropriate
A well-structured page can satisfy multiple sub-queries because different sections can be retrieved for different purposes. A poorly structured page, even with great information, might never get retrieved at all because the extraction systems can’t reliably pull out clean, quotable passages.
Practical structure tips:
- Use descriptive H2/H3 headings that could stand alone
- Put key information early in each section
- Use lists and tables for structured data
- Include clear definitions and direct answers
- Don’t bury important information deep in paragraphs
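A rough sketch of what passage-level extraction looks like from the retrieval side, assuming a simple heading-based chunker (real pipelines are more sophisticated, but the principle holds: each heading-delimited section becomes its own retrievable unit):

```python
import re

def chunk_by_heading(markdown: str) -> list[tuple[str, str]]:
    # Split a markdown page into (heading, passage) pairs at H2/H3 headings,
    # roughly how chunk-level retrieval might see it.
    chunks = []
    current_heading, current_lines = None, []
    for line in markdown.splitlines():
        m = re.match(r"^#{2,3}\s+(.*)", line)
        if m:
            if current_heading is not None:
                chunks.append((current_heading, "\n".join(current_lines).strip()))
            current_heading, current_lines = m.group(1), []
        elif current_heading is not None:
            current_lines.append(line)
    if current_heading is not None:
        chunks.append((current_heading, "\n".join(current_lines).strip()))
    return chunks

page = """## What is crawlability?
Crawlability is how easily search engines can fetch your pages.

## How do I improve site speed?
Compress images, cache aggressively, and minimise JavaScript.
"""

for heading, passage in chunk_by_heading(page):
    print(heading, "->", passage)
```

Notice that each chunk carries its own descriptive heading and a self-contained answer. A section whose meaning depends on three paragraphs elsewhere on the page falls apart when extracted like this.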
Traditional Cannibalisation vs Fan-Out Comprehensiveness
Right, let’s clear up the cannibalisation confusion once and for all.
Traditional SEO Cannibalisation:
- Happens when multiple pages target the same primary intent
- Causes ranking dilution (search engines don’t know which page to rank)
- Creates SERP confusion (different pages appear at different times)
- Is fundamentally a page-level ranking issue
- Problem: multiple pages competing for the same query
Fan-Out Comprehensiveness:
- Encourages coverage across different sub-intents
- Rewards specialisation and depth on distinct topics
- Each page serves a clear, separate purpose
- Is a retrieval and synthesis challenge, not a ranking one
- Solution: multiple pages addressing different aspects of a topic
The key distinction: Cannibalisation means competing for the same intent. Fan-out optimisation means covering different intents within a topic area.
You can have ten pages about SEO without cannibalisation if each addresses a distinct sub-topic:
- What is SEO? (foundational definition)
- Technical SEO guide (crawlability, site speed, structure)
- Content optimisation (on-page factors, keyword research)
- Link building strategies (backlinks, authority)
- Local SEO (Google Business Profile, local citations)
- E-commerce SEO (product optimisation, structure)
- SEO tools comparison (software and platforms)
- SEO for specific industries (context-specific advice)
- SEO mistakes to avoid (common issues)
- How long does SEO take? (timeline expectations)
Each of these has a distinct primary intent. They might share some keywords naturally, but they’re not competing with each other, they’re supporting each other by building topical authority.
Where Comprehensiveness Can Go Wrong
That said, there is risk if you implement this poorly. You create actual cannibalisation problems when you:
- Create multiple pages answering the same core question with slight variations
- Rewrite the same content with different phrasing for “coverage”
- Don’t establish clear hierarchy and relationships between pages
- Create content without a defined, distinct intent
- Target the same commercial keywords from multiple landing pages
This is still bad. It causes the same ranking issues it always did, and it now also confuses retrieval systems trying to figure out which of your similar pages to cite.
The Correct Model: Hub-and-Spoke
The structure that solves both traditional SEO concerns and fan-out optimisation is the hub-and-spoke model (also called topic clusters).
Hub Page:
- Targets the broad, primary intent
- Provides overview and context
- Acts as the canonical ranking page for core terms
- Links out to all supporting pages
- Comprehensive but not exhaustive on any sub-topic
Spoke Pages:
- Each targets a distinct, specific sub-intent
- Goes deep on one aspect of the topic
- Links back to the hub and to related spokes where relevant
- Avoids duplicating the hub’s overview content
- Can rank independently for specific queries
Benefits:
- Prevents keyword cannibalisation (clear hierarchy)
- Builds topical authority (comprehensive coverage)
- Aligns with query fan-out (multiple entry points)
- Works for traditional SEO (clear structure)
- Works for AI retrieval (distinct, extractable content)
Example structure:
Hub: "Complete Guide to Technical SEO" (broad, authoritative, overview)
├── Spoke: "Website Crawlability: A Technical Guide"
├── Spoke: "Core Web Vitals Optimisation"
├── Spoke: "XML Sitemaps: Implementation and Best Practices"
├── Spoke: "Structured Data and Schema Markup"
├── Spoke: "JavaScript SEO: Rendering and Indexing"
└── Spoke: "Site Speed Optimisation Techniques"
Each spoke can be retrieved independently for specific sub-queries. The hub provides context and builds authority for the topic as a whole. There’s no duplication, no competition, just comprehensive coverage.
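One way to keep a cluster honest is to treat the linking rules as checkable data. This is a hypothetical sketch (the page titles and internal-link data are placeholders): the hub must link to every spoke, and every spoke must link back to the hub:

```python
# A topic cluster represented as data, so the hub-and-spoke
# linking rules can be verified rather than just intended.
cluster = {
    "hub": "Complete Guide to Technical SEO",
    "spokes": [
        "Website Crawlability: A Technical Guide",
        "Core Web Vitals Optimisation",
        "XML Sitemaps: Implementation and Best Practices",
    ],
}

# links[page] = pages that page links to (placeholder internal-link data,
# e.g. exported from a site crawler).
links = {
    "Complete Guide to Technical SEO": [
        "Website Crawlability: A Technical Guide",
        "Core Web Vitals Optimisation",
        "XML Sitemaps: Implementation and Best Practices",
    ],
    "Website Crawlability: A Technical Guide": ["Complete Guide to Technical SEO"],
    "Core Web Vitals Optimisation": ["Complete Guide to Technical SEO"],
    "XML Sitemaps: Implementation and Best Practices": ["Complete Guide to Technical SEO"],
}

def cluster_link_issues(cluster: dict, links: dict) -> list[str]:
    # Return a description of every missing hub<->spoke link.
    issues = []
    for spoke in cluster["spokes"]:
        if spoke not in links.get(cluster["hub"], []):
            issues.append(f"hub does not link to spoke: {spoke}")
        if cluster["hub"] not in links.get(spoke, []):
            issues.append(f"spoke does not link back to hub: {spoke}")
    return issues

print(cluster_link_issues(cluster, links))  # an empty list means the cluster is wired correctly
```

Run something like this against a crawl export and missing hub/spoke links surface immediately instead of being discovered months later.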
Intent Segmentation > Keyword Segmentation
This is the mindset shift you need to make: stop thinking about keywords, start thinking about intents.
Two pages can share keywords safely if their primary intent differs. The question isn’t “Do these pages target the same keywords?” The question is “Do these pages answer the same question?”
If the answer is yes, you have cannibalisation. If the answer is no, you have comprehensive topical coverage. It’s that simple.
Example of no cannibalisation (different intents):
- “How to choose SEO tools” (comparative/decisional intent)
- “Best SEO tools for small businesses” (recommendation/transactional)
- “SEO tools comparison: Features and pricing” (detailed comparison)
These might all rank for variations of “SEO tools,” but they serve different purposes and different stages of the user journey.
Example of cannibalisation (same intent):
- “How to improve your SEO”
- “Ways to boost your search rankings”
- “SEO improvement strategies”
These are all answering the same broad question with slight rephrasing. They’ll compete with each other and dilute your authority.
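If you want a crude first-pass screen for “same question, rephrased”, simple word overlap between one-line intent statements can flag pairs for human review. This is a blunt heuristic, not a verdict: it misses rephrasings that share few words, and the 0.5 threshold is an arbitrary assumption:

```python
def jaccard(a: str, b: str) -> float:
    # Word-level Jaccard similarity between two short intent statements.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# One-line intent statements for three planned pages (illustrative).
intents = [
    "how to improve your seo",
    "how to improve seo rankings",
    "how to choose seo tools",
]

# Flag pairs with high overlap as possible cannibalisation for human review.
flagged = [
    (a, b)
    for i, a in enumerate(intents)
    for b in intents[i + 1:]
    if jaccard(a, b) > 0.5
]
print(flagged)
```

Here the first two intents get flagged as near-duplicates while the “choose SEO tools” page does not, which matches the intent-based judgement you’d make by hand. The heuristic is only a prompt to apply that judgement systematically.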
Practical Content Planning Rules
Here’s how to actually implement this:
Do:
- Create one authoritative page per core intent
- Build separate pages/sections for distinct sub-questions
- Establish clear internal linking that shows page relationships
- Write content to be quotable and extractable at the passage level
- Map your content to user intents, not just keywords
- Use hub-and-spoke structure for major topics
- Ensure each page has a clear, distinct purpose
Don’t:
- Create multiple pages answering the same main question
- Rewrite near-duplicate content for “SEO coverage”
- Build pages without clearly defined, distinct intent
- Forget to establish hierarchy between related pages
- Target the same commercial keywords from multiple landing pages
- Create content just to create content
Metrics in the LLM Era
Traditional metrics still matter, don’t abandon them. But you should be thinking about additional indicators:
Traditional (still important):
- Organic traffic
- Rankings for target keywords
- Conversion rates
- Engagement metrics
LLM-era (increasingly relevant):
- AI citations (track when possible)
- Brand mentions in AI-generated answers
- Coverage across conversational queries
- Featured snippet and “People Also Ask” appearances
- Topical authority signals (ranking across topic cluster)
- Visibility without clicks (unfortunately hard to track)
The uncomfortable reality is that we don’t yet have great tools for measuring AI visibility consistently. But that’s the state of the industry right now, we’re all figuring this out together.
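One thing you can measure today is whether AI crawlers are fetching your content at all, by scanning server logs for their user agents. The agent names below (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot) are real crawlers at the time of writing, but the list changes often; the log lines are fabricated samples:

```python
# Rough proxy for AI visibility: count access-log hits from known AI
# crawlers. Treat the agent list as a starting point, not exhaustive.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

# Fabricated sample log lines; in practice, read your real access log.
log_lines = [
    '1.2.3.4 - - [01/Jun/2025] "GET /guide HTTP/1.1" 200 "Mozilla/5.0 GPTBot/1.0"',
    '5.6.7.8 - - [01/Jun/2025] "GET /guide HTTP/1.1" 200 "Mozilla/5.0 Chrome/125"',
    '9.9.9.9 - - [01/Jun/2025] "GET /tools HTTP/1.1" 200 "PerplexityBot/1.0"',
]

def ai_crawler_hits(lines: list[str]) -> dict[str, int]:
    # Count lines mentioning each known AI crawler's user agent.
    counts = {agent: 0 for agent in AI_AGENTS}
    for line in lines:
        for agent in AI_AGENTS:
            if agent in line:
                counts[agent] += 1
    return counts

print(ai_crawler_hits(log_lines))
```

Crawl hits aren’t citations, of course. But zero crawl activity from these bots is a strong hint you won’t be cited, which makes this one of the few cheap, concrete checks available right now.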
Bottom line on content strategy: Query fan-out rewards intent coverage, not content duplication. Cannibalisation only occurs when intent boundaries are unclear. Hub-and-spoke structure solves both traditional SEO concerns and AI retrieval challenges. Think in intents, not keywords.
What Actually Works: The Practical Summary
Right, we’ve covered a lot of ground. Let me bring it together into something actionable.
The Uncomfortable Truth
Most “AI optimisation” is just traditional SEO done well. There are no magic tricks. There are no shortcuts. There are no special technical configurations that suddenly make LLMs prefer your content.
What there is, instead, is a shift in emphasis: from ranking individual pages for individual keywords to building comprehensive topical authority that can be retrieved and cited across multiple sub-queries.
If that sounds less revolutionary than you were hoping for, well, welcome to SEO in 2025. The fundamentals haven’t changed as much as the headlines suggest. We’re just more conscious now of why they matter and how they work in retrieval-based systems.
The Real Priorities (In Order)
If you want to be cited by AI systems, here’s what actually matters:
1. Crawlability and Technical Health (Non-Negotiable)
Without this, nothing else matters. Your content must be crawlable, indexable, fast, and properly structured. This is table stakes.
2. Content Quality and Comprehensive Coverage (Most Important)
Cover your topic thoroughly. Address distinct sub-intents. Structure content for passage-level extraction. Build genuine topical authority. This is where most of your effort should go.
3. Authority and Trust Signals (Critical)
Editorial links, media mentions, brand presence, entity clarity. Build genuine authority through quality signals, not quantity of links.
4. Clear Structure and Hierarchy (Enabling Factor)
Hub-and-spoke content architecture. Logical internal linking. Clear intent segmentation. Make it easy for systems to understand what you’re about and how your content relates.
5. Structured Data and Schema (Supporting)
Implement for clarity and entity recognition. Don’t expect miracles, but do it properly for the right reasons.
6. Entity Building (Long-term Investment)
Consistent brand/author presence across platforms. Build the kind of entity authority that makes systems recognise you as a trusted source.
Notice what’s missing from this list: tricks, hacks, shortcuts, “AI-specific” technical configurations. It’s all fundamentals. Good SEO, applied thoughtfully.
What Not to Do
Save yourself time, money, and frustration:
- Don’t create duplicate or near-duplicate content for “AI coverage”
- Don’t buy into “LLM-specific” technical tricks without evidence
- Don’t neglect traditional SEO fundamentals
- Don’t expect quick wins or guaranteed results
- Don’t implement recommendations you don’t understand
- Don’t forget that ranking still matters
- Don’t chase the algorithm instead of building something worth citing
Realistic Expectations
I need to be honest with you: we don’t have a proven playbook for AI optimisation because the systems are evolving faster than we can study them. What works today might not work in six months. What we think matters might turn out to be irrelevant.
This is frustrating. I know. But it’s honest.
What we can say with reasonable confidence is that the fundamentals (authority, clarity, comprehensiveness, technical health) will continue to matter. These things have mattered for traditional search, they matter for current AI systems, and they’ll likely matter for whatever comes next.
So rather than chasing specific AI optimisation tactics that might or might not work, focus on building a strong foundation: authoritative content, clear structure, genuine expertise, proper technical implementation. These things have lasting value regardless of how AI search evolves.
Conclusion
Look, I get the frustration. You came here hoping for a clear playbook, and what you’ve got instead is “do good SEO and understand query fan-out.” That’s probably disappointing.
But here’s the thing: anyone claiming to have a definitive, proven method for “ranking in LLMs” is either speculating based on limited data or actively misleading you. We’re all figuring this out as we go. The honest answer is that we don’t know exactly what works yet, because the systems are too new and too opaque for definitive answers.
What we do know is:
- Traditional SEO fundamentals still matter enormously
- Query fan-out is changing how content gets retrieved and cited
- Comprehensive topical coverage beats keyword-focused optimisation
- Authority signals (links, mentions, entity presence) remain crucial
- Technical health is non-negotiable
- Schema helps, but not for the reasons most people think
The shift isn’t toward some completely new discipline. It’s toward a slightly different mental model: thinking about comprehensive intent coverage rather than keyword rankings, understanding passage-level retrieval rather than page-level optimisation, building topical authority rather than chasing individual terms. If you want a structured approach to applying these principles, my generative engine optimisation service puts this methodology into practice.
That hub-and-spoke content model we discussed? That’s probably the most practical takeaway from this entire article. Structure your content around clear intent hierarchies, avoid duplication, cover sub-topics thoroughly, and make everything easily extractable. Do that well, and you’ll be in a better position than most sites regardless of how AI search evolves.
Is it exciting? No. Is it revolutionary? Not really. Does it work? Based on available evidence and logical inference, yes.
I know this article has been long (sorry, again). But I’d rather give you an honest, nuanced explanation than a confident-sounding listicle that oversimplifies a genuinely complex topic. The uncomfortable truth is that we’re all learning as we go, and anyone who tells you otherwise is selling something.
Focus on the fundamentals. Build genuine authority. Cover topics comprehensively. Make your content easily retrievable. And accept that we’re in a period of uncertainty where the playbook is still being written.
If you’ve got questions, if you disagree with something I’ve said, or if you’ve got data that contradicts what I’ve written here, please, reach out. We’re all figuring this out together, and honest discourse is more valuable than confident assertions.