If you’ve been anywhere near digital marketing in the past year, you’ve been asked some variation of this question: “How do we get our site cited in ChatGPT?” or “What do we need to do to appear in AI Overviews?” or my personal favourite, “Can you make us rank in Perplexity?”
It’s the question of the moment in SEO. And I get why: we’re watching search behaviour fundamentally shift. People are asking questions to AI systems instead of Google. Businesses are panicking about losing visibility. Agencies are scrambling to offer “AI optimisation” services. Everyone wants to know how to win in this new landscape.
Here’s the uncomfortable truth I need to get out of the way early: we don’t have definitive causal evidence for most AI optimisation tactics.
I know that’s disappointing. You probably clicked on this article hoping for a checklist or a formula. But anyone selling you a guaranteed method for getting cited by LLMs is either lying to you or doesn’t understand the systems they’re claiming to optimise for. Or, far more likely, both. I mean, it isn’t like I can pretend to fully understand the ins and outs of systems put together over years by a load of nerds managed by tech bros. I’m sure there’s plenty in this article that will, in time, prove to be wrong, or will become outdated as LLMs and “AI” overviews change.
What we do have are correlation studies, observational research, some indirect effects we can measure, and a lot of educated guesses based on how these systems appear to work. That’s not nothing, it’s actually quite useful, but it’s not the same as having a proven playbook.
This article is going to be long (sorry). I’m going to walk through what we actually know with evidence, what we suspect based on reasonable inference, and what’s just industry speculation dressed up as strategy. I’ll cover schema markup, the relationship between organic rankings and AI citations, query fan-out (which is probably the most important concept here, and has some interesting overlaps with the now-traditional SEO concept of Topical Authority, which even then doesn’t always stand up to detailed, long-term scrutiny), link building, technical SEO, and content strategy.
I’m also going to be honest about where the evidence is thin, where I’m making semi-educated guesses (just for Charlotte, I’m going to mention that I have an MA in Innovative and Experimental Literature, so my educated guesses come with a certificate and the smell of stale Guinness), and where we’re all just working in the dark. Because that honesty, that admission that we don’t have all the answers yet, is more valuable than confident-sounding nonsense.
Let’s start with the biggest misconception I keep encountering.
What We Actually Know (And What We Don’t)
The Evidence Problem
Here’s something that should concern anyone offering “LLM optimisation” services: there are no definitive, peer-reviewed, causal studies demonstrating that any specific optimisation tactic directly increases LLM citations.
None.
What we have instead are:
- Correlation studies showing what types of pages tend to be cited more often
- Observational research identifying patterns in AI-generated responses
- Industry experiments with mixed and often contradictory results
- A lot of marketing content making claims that aren’t supported by evidence
This matters because you’re potentially making significant decisions about your content strategy, technical infrastructure, and resource allocation based on speculation rather than proof. I’m not saying the speculation is worthless, some of it is quite well-reasoned, but let’s at least be clear about what we’re working with.
The research that does exist is largely observational. Studies examining which pages get cited in AI Overviews, or what characteristics those pages share, or how citation patterns correlate with traditional ranking factors. This can tell us “pages with X tend to be cited more often,” but it can’t tell us “X causes increased citations.”
The distinction is crucial. If schema-rich pages are cited more frequently (and they are, in some studies), is that because the schema itself influences citation, or because sites that implement schema tend to be higher quality overall? Similarly, are the sites with stronger backlink profiles mentioned more because of their strong backlink profiles, or because they tend to be larger, better-known brands? We don’t actually know.
The Indirect Effect Reality
The other thing you need to understand before we get into specifics is that, in most cases, LLMs don’t read your site directly.
When ChatGPT or Claude or Perplexity cites a source, they’re not crawling the web themselves. They’re relying on retrieval layers (search indices, retrieval-augmented generation (RAG) systems, curated corpora) to fetch documents first. The LLM then processes those retrieved documents and decides what to cite.
This has enormous implications for optimisation strategy. It means that traditional SEO factors, the things that determine whether your content gets indexed, retrieved, and surfaced by search systems, still matter enormously. Possibly more than any “AI-specific” optimisation you might do.
Your goal isn’t really to “optimise for LLMs.” Your goal is to make your content retrievable, clearly structured, and authoritative enough to be selected from the retrieval layer and cited in the synthesis step.
Think of it this way: if your page isn’t crawlable, isn’t indexed, or ranks on page 47 for relevant queries, no amount of “AI optimisation” will help. You’ve already lost. The LLM will never even see your content because it won’t make it through the retrieval layer.
Right, with those foundations established, let’s look at specific tactics and what the evidence actually tells us.
Schema and Structured Data: The Overhyped Solution
I’m going to start with schema because it’s probably the most overhyped aspect of “AI optimisation” right now. Every second article I read claims that structured data is the key to LLM citations. Some go further and suggest specific schema types will guarantee visibility in AI-generated answers.
It’s nonsense. Well-meaning nonsense in some cases, cynical marketing in others, but nonsense either way.
What the Evidence Actually Shows
Let’s be really clear about what we can and cannot say with evidence:
What we can say:
- There is no definitive, causal evidence that implementing schema directly increases LLM citations
- Most LLMs are trained primarily on unstructured text, not HTML or schema markup
- LLMs without live retrieval don’t parse or “see” schema at inference time
- This means schema cannot directly influence citations unless the system uses a retrieval layer
What we suspect (with some supporting correlation):
- Systems using retrieval-augmented generation may benefit indirectly from structured data
- Schema can help search indexes and retrievers better understand content
- This could influence what documents are retrieved and later cited
- Observational studies show correlation between schema-rich pages and higher citation frequency
But here’s the critical bit: those observational studies show correlation, not causation. Schema-rich pages tend to be cited more often, yes. But they also tend to be:
- Higher quality overall
- Better maintained
- More authoritative
- Clearer in structure
- Written by people who understand technical SEO
This is called a selection effect. Sites that invest in proper schema implementation also tend to invest in everything else that makes content citable. Separating the schema effect from the “generally higher quality” effect is nearly impossible with observational data.
Industry experiments on this have been mixed at best. I’ve run tests where implementing schema made no measurable difference to AI citations. Other SEOs report marginal improvements. No one has produced a controlled study showing strong, isolated schema effects. (If you have, please send it to me, I’d genuinely love to see it.)
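The selection effect is easy to see in a toy model. The sketch below simulates pages where an underlying “quality” variable drives both schema adoption and citation likelihood, while schema itself has zero causal effect; the numbers and functional forms are invented purely for illustration, not drawn from any real dataset.

```python
import random

random.seed(42)

def simulate_pages(n=10_000):
    """Simulate pages where underlying quality drives BOTH schema
    adoption and citation; schema itself has zero causal effect."""
    pages = []
    for _ in range(n):
        quality = random.random()
        # Higher-quality sites are more likely to invest in schema...
        has_schema = random.random() < quality
        # ...and more likely to be cited. Note schema never appears here.
        cited = random.random() < quality ** 2
        pages.append((has_schema, cited))
    return pages

def citation_rate(pages, schema_flag):
    group = [cited for has_schema, cited in pages if has_schema == schema_flag]
    return sum(group) / len(group)

pages = simulate_pages()
with_schema = citation_rate(pages, True)
without_schema = citation_rate(pages, False)
print(f"cited with schema:    {with_schema:.2f}")
print(f"cited without schema: {without_schema:.2f}")
```

Run it and the schema-rich pages are cited markedly more often, despite schema doing nothing at all in the model. That’s exactly the trap observational citation studies can fall into.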
What Schema Actually Does
So if schema doesn’t guarantee LLM citations, why implement it? Because it does other valuable things:
- Improves machine interpretability for search engines, this is explicitly stated by Google and other search engines
- Reduces ambiguity and extraction errors, when systems do try to extract information, clear structure helps
- Supports entity recognition, helping search engines understand who and what you are
- Future-proofs your content, as AI systems become more sophisticated, structured data may become more important
- Can improve traditional search features, rich results, knowledge panels, etc.
These are all legitimate reasons to implement schema. Just don’t do it expecting it to suddenly get you cited by ChatGPT, because that’s not what the evidence supports.
The Practical Stance on Schema
Here’s my actual recommendation: implement schema for the right reasons.
Focus on:
- Article schema for content pieces (signals content type and date)
- Organization schema for entity clarity (who you are)
- Person schema for authorship (builds author authority)
- FAQ schema if you have genuine FAQs (might help with retrieval, no guarantees)
- Product schema for e-commerce (more for traditional search than AI citations)
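For reference, here’s roughly what a sensible, minimal implementation of the Article/Person/Organization combination looks like as JSON-LD. The names and URLs below are placeholders, and building it in Python is just a convenient way to template it; hand-written JSON-LD in the page head works identically.

```python
import json

# All names and URLs below are placeholders, not a real organisation.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Does Schema Help With AI Citations?",
    "datePublished": "2025-01-15",
    "dateModified": "2025-02-01",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "url": "https://example.com/about/jane",
    },
    "publisher": {
        "@type": "Organization",
        "name": "Example Agency",
        "url": "https://example.com",
    },
}

# Emit as a JSON-LD script block ready to drop into the page head.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(article_schema, indent=2)
    + "\n</script>"
)
print(snippet)
```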
Don’t bother with:
- Implementing every possible schema type because you read it helps with AI
- Complex nested schema that provides minimal clarity
- Schema for the sake of schema
And please, for the love of all that’s holy, don’t pay someone a fortune to implement schema as an “AI optimisation strategy” unless they’re also fixing your fundamental SEO issues, improving your content quality, and building your authority. Because those things will have far more impact.
Bottom line on schema: It’s an enabler, not a driver. Worth doing, but not for the reasons most people think.
Organic Rankings and LLM Citations: The Correlation
Let’s move on to something we have better data on: the relationship between traditional organic rankings and AI citations.
What the Studies Actually Show
Multiple studies from 2024 have examined which pages get cited in AI-generated responses, particularly Google’s AI Overviews. The findings are interesting:
- Strong correlation between high organic rankings and AI citations, pages that rank well organically are more likely to be cited
- 94% of AI Overviews cite at least one source from the top 20 organic results, this is a remarkably high overlap
- Positions 1–3 account for a disproportionate share of citations, the top spots still matter
- Moderate positive correlation between ranking in the top 10 and being cited (correlation coefficient around 0.347 in one study)
This all sounds good, right? If you rank well, you’re more likely to be cited. Traditional SEO still matters. Crisis averted.
But here’s where it gets interesting:
- 46–52% of AI citations come from pages outside the top organic results, that’s nearly half
- Pages ranking outside the top 100 still get cited, not common, but it happens
- Different AI systems show different patterns, what works for AI Overviews may not work for Perplexity or ChatGPT
What This Actually Means in Practice
The relationship between rankings and citations is real, but it’s not deterministic. High rankings increase your likelihood of being cited, but they don’t guarantee it. And more importantly, you can be cited without ranking particularly well for the primary query.
Why? Because of query fan-out, which we’ll get to in a moment. But the short version is: AI systems often answer questions by retrieving information for multiple related sub-queries, not just the original query. Your page might rank poorly for “what is schema markup” but rank well for “does schema help with SEO,” and get cited when someone asks the first question because the AI system expanded the query to include the second one.
This is actually good news if you understand it correctly. It means you don’t need to rank #1 for highly competitive terms to appear in AI-generated answers. You need to be authoritative, retrievable, and clear across a topic area.
But it’s also a trap if you misunderstand it. Some people interpret this as “rankings don’t matter for AI, so we can ignore traditional SEO.” That’s catastrophically wrong. Rankings still matter enormously, they’re just not the only thing that matters, and they’re not deterministic.
Bottom line on rankings: Traditional SEO fundamentals still apply. High rankings help significantly. But comprehensive topical coverage matters more than #1 rankings for single keywords.
Right, let’s talk about query fan-out, because this is probably the single most important concept for understanding how AI search actually works.
Query Fan-Out: The Actual Game-Changer
If you take nothing else from this article, understand query fan-out. It’s the key to understanding why AI search behaves differently from traditional search, and why your content strategy needs to evolve.
What Query Fan-Out Actually Is
Query fan-out is a retrieval technique where a single user query is expanded into multiple related sub-queries. Rather than searching for just what the user asked, the system:
- Decomposes the query into component questions
- Identifies related concepts and clarifications
- Generates multiple sub-queries representing different facets of the intent
- Retrieves information for each sub-query
- Synthesises the results into a comprehensive answer
Here’s a simple example. If someone asks “How do I improve my SEO?”, a traditional search engine might return pages about “improving SEO.” A query fan-out system might decompose this into:
- What is SEO?
- What are the main SEO ranking factors?
- How do I do keyword research?
- What is technical SEO?
- How do I build links?
- What is content optimisation?
- How long does SEO take?
The system then retrieves information for each of these sub-queries and builds an answer that addresses the original question comprehensively. Different parts of the answer might cite different sources, each retrieved for different sub-queries.
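The mechanics above can be sketched in a few lines. In the toy pipeline below, the hardcoded decomposition table stands in for what a real system would generate with an LLM, and “retrieval” is just word overlap over a four-document corpus; every name in it is invented. The point is the shape: one query fans out into several sub-queries, and each facet can end up citing a different page.

```python
# Tiny stand-in corpus: page name -> the words that page covers.
CORPUS = {
    "seo-fundamentals": "what is seo search engine optimisation basics ranking factors",
    "technical-seo-guide": "technical seo crawling indexing site speed core web vitals",
    "link-building-guide": "how to build links backlinks digital pr editorial",
    "keyword-research": "keyword research search volume intent tools",
}

# Stand-in for the LLM-generated decomposition of the user's query.
FAN_OUT = {
    "how do i improve my seo?": [
        "what is seo",
        "technical seo",
        "how to build links",
        "keyword research",
    ],
}

def retrieve(sub_query: str) -> str:
    """Return the corpus doc with the most word overlap with the sub-query."""
    words = set(sub_query.lower().split())
    return max(CORPUS, key=lambda doc: len(words & set(CORPUS[doc].split())))

def answer_with_citations(query: str) -> dict:
    """Retrieve one source per sub-query; each facet may cite a different page."""
    return {sq: retrieve(sq) for sq in FAN_OUT[query.lower()]}

citations = answer_with_citations("How do I improve my SEO?")
for sub_query, source in citations.items():
    print(f"{sub_query!r} -> {source}")
```

Notice that the broad “how do I improve my SEO” query never cites a single broad page: all four citations go to pages that answer one specific sub-intent well.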
How This Changes Everything
Query fan-out fundamentally alters what “comprehensive content” means and how you should think about content strategy.
In traditional SEO, you might create one page targeting “how to improve SEO” and optimise it for that specific query and close variations. You’d try to rank that page as highly as possible for that term.
In a fan-out world, that same query might pull information from:
- Your SEO fundamentals page
- Your technical SEO guide
- Your content strategy article
- Your link building guide
- Your keyword research tutorial
Each is retrieved because it comprehensively answers a specific sub-intent related to the original query. Your “how to improve SEO” page might not even be cited if it’s too broad or if other pages better address the specific sub-questions.
This is why I see sites with mediocre rankings for primary queries getting cited frequently in AI answers. They’re not winning on the primary query, they’re winning on the sub-queries generated through fan-out.
The Important Limitations
Before you get too excited about this, there are some important limitations to understand:
You can’t see the sub-queries. Major AI systems don’t publish what sub-queries they generate. It’s internal to their retrieval pipeline. You can approximate it using LLMs yourself (ask ChatGPT to “decompose this query into sub-questions”), but you can’t know exactly what any particular system is doing.
It only affects retrieval-enabled systems. Static LLMs without live web retrieval (like base ChatGPT without plugins) don’t do this in the same way. They work from their training data, not from live retrieval.
Fan-out patterns vary by system. Google AI Overviews, Perplexity, Bing Chat, and other systems likely use different decomposition strategies. There’s no universal pattern to optimise for.
You can’t game it. Because you can’t see the sub-queries and because they’re dynamically generated based on the original query, you can’t reverse-engineer a formula for appearing in fan-out results.
What You Can Do About It
So if you can’t see the sub-queries and can’t game the system, what’s the point of understanding fan-out?
The point is it changes how you think about content comprehensiveness. Rather than trying to rank one page for one query, you need to think about covering an intent space thoroughly. This means:
- Creating content that addresses distinct sub-questions within a topic
- Ensuring each piece can stand alone and answer a specific question clearly
- Structuring content so individual sections can be retrieved independently
- Building topical authority across a subject area, not just for individual keywords
We’ll get into the practical implementation of this in the content strategy section. For now, just understand that fan-out is why “comprehensive topical coverage” has become the watchword of modern SEO. It’s not just marketing speak, it’s a response to a fundamental change in how retrieval systems work.
Bottom line on fan-out: It rewards intent coverage across a topic area. You need to be authoritative on a subject, not just rank for a keyword.
Link Building for LLM Citations
Right, let’s talk about links. Because despite all the changes in search, link building remains one of the most important (and most misunderstood) aspects of digital strategy.
The Core Truth About Links and LLMs
First, the uncomfortable reality: backlinks do not directly signal LLMs.
LLMs don’t read PageRank. They don’t interpret anchor text. They don’t parse link graphs. When an LLM decides whether to cite your content, it’s not looking at how many backlinks you have or what your Domain Authority is. (I mean, there’s an argument to be made that they’re trained on sources that historically considered those factors, but that’s pretty indirect.)
So do links matter for LLM citations? Yes, but indirectly.
Here’s how: most LLM systems that cite sources rely on retrieval layers. Those retrieval layers are usually powered by search indices. And search indices definitely care about backlinks. Backlinks are one of the primary signals search engines use to determine authority, trustworthiness, and which pages deserve to rank.
Therefore, backlinks influence which pages are eligible to be retrieved and cited. They’re a prerequisite, not a direct ranking factor for LLM citations. They help you get into the retrieval pool. Once you’re in that pool, other factors determine whether you actually get cited.
This distinction matters for strategy. You can’t skip link building and expect AI systems to discover and cite your content. But pouring resources into manipulative link schemes won’t directly increase AI citations either.
Evidence-Based Link Hierarchy
So what types of links actually matter? Based on available evidence and understanding of how retrieval systems work, here’s the hierarchy:
Editorial / Naturally Earned Links (Highest Value)
These are links placed voluntarily by third parties because your content is genuinely valuable or authoritative.
Why they matter:
- Strongest quality signal in SEO research
- Consistently correlate with higher organic rankings
- Increase retrieval likelihood in AI search systems
- Frequently come with contextual brand mentions
- Support entity recognition
For LLM citability:
- Strong indirect boost through authority and retrieval
- High trust signal
- Often appear on the same types of authoritative sites AI systems prefer
Practical approach: Create genuinely useful content, conduct original research, develop unique tools or resources, and let the links come to you. I know that sounds idealistic, but it’s genuinely the most effective long-term strategy.
Digital PR / Media Coverage (Very High Value)
Links and mentions from news outlets, journalists, and media publications.
Why they matter:
- High-authority endorsements from trusted sources
- These domains are overrepresented in AI citations
- Brand mentions matter even when links are nofollow or absent
- Reinforce real-world credibility
For LLM citability:
- Strong citation correlation (news sites are frequently cited)
- Brand/entity reinforcement in AI’s training data and retrieval systems
- Trusted source bias (AI systems prefer citing recognised publications)
Practical approach: Invest in proper digital PR. Respond to journalist requests, pitch newsworthy angles, provide expert commentary. This is one area where spending money typically provides good ROI, assuming you work with competent PR professionals.
Guest Posts (Moderate, Conditional Value)
Contributing articles to other sites in your industry.
Why they matter (conditionally):
- Can provide value when published on relevant, authoritative sites
- Help with topical association
- But: effectiveness depends heavily on editorial standards and quality
Why they often don’t:
- Search engines discount large-scale or low-quality guest posting
- Weaker than earned editorial links
- Generally seen as lower trust signal
For LLM citability:
- Helpful for topical association if done well
- Limited authority signal unless truly high-quality
- Secondary to editorial and PR links
Practical approach: Guest post sparingly and only on genuinely authoritative, relevant sites. Focus on sites you’d be proud to be associated with. If you’re approaching it as a “link building tactic” rather than a genuine opportunity to reach a new audience, you’re probably doing it wrong.
Niche Edits / Link Inserts (Variable Value)
Adding links into existing content on third-party sites.
Why they can help:
- Can work when placed in authoritative, topically aligned content
- Contextual relevance is key
Why they often don’t:
- Quality varies wildly
- Poorly executed inserts are algorithmically devalued
- From an AI perspective, only the surrounding content and source authority matter
For LLM citability:
- Can help retrieval if contextually strong
- Risky if low-quality
- Inferior to natural editorial links
Practical approach: Use sparingly and only with high-quality placements. The link needs to make sense in context and add value to the existing content. If it looks like an awkward insertion (which it usually does), it won’t help much.
Directories, Resource Pages, Listings (Low-Moderate Value)
Business directories, resource page links, industry listings.
Why they matter (slightly):
- Help with discovery and indexing
- Useful in local or niche-specific contexts
For LLM citability:
- Minor indirect benefit
- Rarely cited sources themselves
- More about ensuring you’re findable
Practical approach: Get the basics done (Google Business Profile, relevant industry directories), but don’t spend significant time or money here.
The Brand Mention Shift
Here’s something interesting that’s emerged from AI citation studies: brand mentions correlate more strongly with LLM citations than raw backlink counts.
This makes sense when you understand how these systems work. LLMs rely heavily on entity recognition and co-occurrence patterns in their training data. If your brand is mentioned frequently in trusted contexts, even without links, you’re building the kind of authority signals these systems recognise.
This is a significant shift from traditional link-focused SEO. An unlinked mention in a Guardian article might be more valuable for AI citability than a dozen links from mid-tier blogs. The mention itself builds entity authority and associative relevance.
Practically, this means:
- Focus on getting mentioned, not just linked
- Brand awareness campaigns have SEO value now (shocking, I know)
- Quality and context of mentions matters more than quantity
- Social media and PR mentions contribute to your overall authority profile
What Doesn’t Work (And Isn’t Supported by Evidence)
Let me save you some time and money by listing what we have no evidence for:
- No proof that any backlink type is directly read or preferred by LLMs
- No evidence that anchor text optimisation affects AI citations
- No support for link volume outperforming relevance + authority
- No indication that manipulative link tactics help AI visibility
- No special “AI-focused” link building strategies that work differently
If someone tries to sell you on any of these, they’re either speculating or lying. Possibly both.
Practical Link Building Strategy for 2025
Given everything we know (and don’t know), here’s what a sensible link building approach looks like:
- Prioritise editorial links and PR mentions over scalable tactics
- Focus on being referenced by trusted publications, not just linked
- Use guest posts sparingly and only on authoritative, relevant sites
- Build entity authority through consistent brand presence
- Remember: links are infrastructure for retrieval, not a direct citation lever
This might sound similar to traditional white-hat SEO advice. That’s because it is. The fundamentals haven’t changed as much as people think, we’re just more conscious now of why they matter.
Bottom line on links: Editorial links and brand mentions from authoritative sources. Everything else is secondary. No shortcuts, no tricks, just build genuine authority.
Right, let’s talk about technical SEO, because this is where a lot of misconceptions live.
Technical SEO for LLM Citations
I’m going to start this section with the conclusion, because it might save you reading the rest: There is no separate “technical SEO for LLMs.” There is only good technical SEO, applied in a world where retrieval and citation matter more than blue links.
If that disappoints you, I’m sorry. But it’s important to understand that LLMs aren’t some completely separate ecosystem requiring entirely new technical approaches. They rely on the same retrieval infrastructure that search engines use, which means the same technical fundamentals apply.
The Fundamental Principle
Let me repeat this because it’s crucial: LLMs don’t crawl the web themselves.
They rely on retrieval layers: search indices, RAG systems, curated corpora. Technical SEO determines whether your content makes it into those retrieval layers. If a page isn’t crawlable, isn’t indexed, or can’t be properly parsed, it cannot be cited. Full stop.
This means technical SEO isn’t optional. It’s not something you can skip while focusing on “AI-specific optimisations.” It’s the foundation that makes everything else possible.
Crawlability & Indexability (Non-Negotiable)
Your pages must be crawlable by search engines. This is table stakes.
What matters:
- No robots.txt blocks on important content
- No accidental noindex tags
- Clean internal linking structure
- Proper XML sitemaps
- No orphaned pages
Why it matters for LLM citations: If search engines can’t crawl and index your content, retrieval systems can’t access it. No indexing = no retrieval = no citations. The chain breaks at the first step.
What to do: Run regular technical audits, fix crawl errors, ensure all important content is discoverable through internal links, monitor Google Search Console for indexing issues. The same things you should have been doing for traditional SEO.
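The robots.txt part of that audit is trivial to script with the standard library. This sketch only checks crawl rules (it won’t catch noindex tags or orphaned pages), and the rules and URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pages you expect to be cited must, at minimum, be crawlable.
important_pages = [
    "https://example.com/blog/how-to-improve-seo",
    "https://example.com/private/draft-post",   # accidentally blocked
]

for url in important_pages:
    ok = parser.can_fetch("*", url)
    print(f"{'OK     ' if ok else 'BLOCKED'} {url}")
```

In practice you’d fetch the live robots.txt with `RobotFileParser.set_url()` and `read()`, and feed it your sitemap URLs instead of a hardcoded list.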
Site Speed & Performance
Fast-loading pages are crawled more efficiently and more frequently.
What matters:
- Core Web Vitals (especially LCP, CLS, and INP, which replaced FID in 2024)
- Server response times
- Efficient rendering
- Minimal JavaScript blocking
Why it matters for LLM citations:
- Improves crawl efficiency (more content discovered)
- Reduces retrieval friction
- Slow or unstable pages risk partial rendering or skipped indexing
What to do: Optimise images, implement proper caching, minimise render-blocking resources, use a CDN. Again, standard performance optimisation. Nothing AI-specific here.
Content Parsability & Machine Readability
This is where things get slightly different, not because LLMs need something special, but because passage-level retrieval makes structure more important.
What matters:
- Clean, semantic HTML
- Logical heading hierarchy (H1, H2, H3 used properly)
- Lists, tables, and structured content where appropriate
- Minimal heavy JavaScript rendering
- Clear content segmentation
Why it matters for LLM citations:
- Retrieval systems often work at the passage or chunk level
- Well-structured content is easier to extract accurately
- Clear segmentation allows different sections to be retrieved independently
- Excessive JavaScript can prevent proper content extraction
What to do: Use semantic HTML5 elements, implement proper heading hierarchy, break content into clear sections with descriptive headings, ensure your content is readable even with JavaScript disabled. Think about how a chunk of your content would read in isolation, particularly the opening paragraph below your H1, which does disproportionate work for both readers and retrieval systems.
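A useful way to pressure-test this is to chunk your own pages the way a passage-level retriever roughly might: split on headings and look at each chunk in isolation. The sketch below is a simplification (real systems use their own segmentation logic), but it makes the “would this section stand alone?” question concrete.

```python
from html.parser import HTMLParser

class PassageChunker(HTMLParser):
    """Split a page into (heading, text) chunks, roughly the way a
    passage-level retriever might segment it. A simplification: real
    retrieval systems use their own chunking logic."""

    def __init__(self):
        super().__init__()
        self.chunks = []          # list of [heading, text] pairs
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = True
            self.chunks.append(["", ""])

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if not self.chunks:
            return  # ignore text before the first heading
        if self._in_heading:
            self.chunks[-1][0] += data
        else:
            self.chunks[-1][1] += data

html_doc = """
<h1>Improving SEO</h1>
<p>An overview of the basics.</p>
<h2>Technical SEO</h2>
<p>Crawlability and speed come first.</p>
<h2>Link Building</h2>
<p>Earned editorial links matter most.</p>
"""

chunker = PassageChunker()
chunker.feed(html_doc)
for heading, text in chunker.chunks:
    print(f"[{heading.strip()}] {text.strip()}")
```

If a chunk makes no sense without the paragraphs above it, that’s a sign the section leans too heavily on surrounding context.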
Structured Data & Metadata (Supportive, Not Required)
We covered schema earlier, but it’s worth repeating in the technical context.
What matters:
- Schema for entity clarity (Organization, Person)
- Article schema for content pieces
- Clear, descriptive meta titles and descriptions
- Author attribution
- Publication dates
Why it matters for LLM citations:
- Helps retrieval systems understand content semantics
- Clear metadata improves retrieval confidence
- Supports entity recognition
- But: these are supporting signals, not deterministic factors
What to do: Implement relevant schema types, write clear meta descriptions, ensure proper authorship attribution. Don’t go overboard trying to implement every possible schema type.
Content Accessibility & Format
HTML is king. Other formats are harder to retrieve and extract from.
What matters:
- Text-based HTML content
- Transcripts for video/audio content
- Alt text for images (for context, not just accessibility)
- Clear language and direct answers
- Content not hidden behind logins or paywalls
Why it matters for LLM citations:
- Text is easiest to retrieve and extract
- Clear, direct language improves passage selection
- Accessible content = retrievable content
What to do: Favour HTML pages over PDFs when possible, add transcripts to video content, ensure key information isn’t locked behind interactions or authentication. Make your content as accessible and extractable as possible.
What’s Different from Traditional SEO (Sort Of)
There are a few shifts in emphasis worth noting:
Ranking does not equal citation. A page doesn’t need to rank #1 to be cited. It needs to be retrievable and useful for a specific sub-query. This changes your priority calculus slightly, it’s less about dominating one primary keyword and more about thorough topical coverage.
Passage-level relevance matters more. Because retrieval systems often work at the chunk level, how you structure individual sections matters. Each section should be clear and self-contained enough to be understood independently.
Trust, clarity, and attribution are weighted heavily. AI systems are risk-averse. They favour sources that appear authoritative, well-maintained, and clearly attributed. Ambiguity hurts you more in an AI citation context than it does in traditional search.
But these aren’t really new technical requirements. They’re traditional SEO best practices with slightly different emphasis.
What’s NOT Supported by Evidence
Let me be really clear about what you don’t need to do:
- No special “LLM-only” technical optimisations exist
- No AI-specific crawl directives override search indexing
- No separate site architectures needed for LLMs
- No hidden technical signals unique to AI citation systems
- No magic tricks or shortcuts
If someone tries to sell you “technical AI SEO” that involves anything radically different from good traditional technical SEO, be very sceptical.
Practical Technical SEO Checklist
Here’s what actually matters:
Foundational (Do these first):
- All important pages are crawlable and indexed
- Site passes Core Web Vitals
- Clean, semantic HTML structure
- Proper heading hierarchy
- Mobile-friendly and responsive
- HTTPS throughout
- XML sitemap submitted
Content Structure (Focus here):
- Clear section breaks with descriptive headings
- Important information not buried in JavaScript
- Lists and tables for structured information
- Each section can be understood independently
- Direct answers to questions
- Clear, concise language
Metadata & Attribution (Polish):
- Relevant schema implemented
- Clear author attribution
- Publication/update dates visible
- Descriptive meta information
- Entity information complete
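To make the metadata items concrete, here’s a minimal sketch of a JSON-LD block that covers schema, author attribution, and visible dates in one structure. It uses schema.org’s Article type; the names, dates, and organisation are placeholders, not recommended values:

```python
import json

# Illustrative JSON-LD using schema.org's Article type.
# All values are placeholders; swap in your real page details.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Complete Guide to Technical SEO",
    "author": {"@type": "Person", "name": "Jane Example"},
    "datePublished": "2025-01-15",
    "dateModified": "2025-06-01",
    "publisher": {"@type": "Organization", "name": "Example Agency"},
}

# Embed the printed JSON inside a <script type="application/ld+json">
# tag in your page's <head>.
print(json.dumps(article_schema, indent=2))
```

The point isn’t the Python, it’s that one small, valid block can tick three checklist items at once.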
That’s it. That’s the technical checklist. It’s not exciting. It won’t make a good “10 Technical Tricks for AI Search” listicle. But it’s what the evidence actually supports.
Bottom line on technical SEO: The fundamentals matter more than ever. Make your content crawlable, fast, clearly structured, and easily extractable. Everything else is speculation.
Right, now let’s get to what might be the most important (and most misunderstood) part: content strategy.
Content Strategy: Comprehensiveness vs Cannibalisation
This is where everything comes together, and where I see the most confusion. People understand that “comprehensive content” matters, but they struggle with what that actually means in practice. More importantly, they’re worried about keyword cannibalisation: if I create lots of comprehensive content on related topics, won’t my pages compete with each other?
It’s a fair concern. And the answer is: it depends on whether you understand the difference between traditional cannibalisation and the kind of comprehensiveness that fan-out rewards.
The Core Shift in Content Strategy
Traditional SEO was about ranking pages for keywords. You’d identify a keyword, create a page for it, optimise that page, build links to it, and measure success by where it ranked for that specific term.
LLM-driven systems care about answering questions. They retrieve content for multiple sub-queries, synthesise information from multiple sources, and cite whatever best answers each component of the query.
Success in this environment isn’t just about ranking, it’s about being retrievable, usable, and citable across an intent space.
This is what people mean when they talk about Generative Engine Optimisation (GEO) or Answer Engine Optimisation (AEO). It’s not fundamentally different from good SEO, it’s good SEO with a different mental model. You’re optimising for comprehensive coverage of a topic, not just rankings for individual keywords.
How Query Fan-Out Changes Content Design
Remember query fan-out? It’s why content strategy has to evolve.
When a single user query expands into multiple sub-queries, your content gets evaluated against all of those sub-queries, not just the primary one. A system might decompose “How do I improve my site’s SEO?” into:
- What are the main ranking factors?
- How do I fix technical SEO issues?
- What is content optimisation?
- How important are backlinks?
- How long does SEO take?
- What tools do I need?
If your content only addresses the broad question without covering the sub-intents, you’re less likely to be retrieved for any of the component parts. But if you have distinct, clear answers to each sub-question (whether on one page or multiple pages), you have multiple opportunities to be retrieved and cited.
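To make fan-out less abstract, here’s a toy sketch of the retrieval shape: sub-queries matched against individual page sections rather than whole pages. The scoring is crude word overlap (real systems use embeddings and many more signals), but it shows why a distinct section per sub-intent gives you multiple chances to be retrieved:

```python
import re

def tokens(text: str) -> set[str]:
    # Lowercase word tokens, punctuation stripped.
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query: str, chunk: str) -> float:
    # Fraction of query words that appear in the chunk.
    q = tokens(query)
    return len(q & tokens(chunk)) / len(q)

# Two distinct, self-contained sections from one hypothetical page.
sections = {
    "Main ranking factors": "The main ranking factors are content relevance, backlinks, and technical health.",
    "How long SEO takes": "SEO usually takes three to six months to show meaningful results.",
}

# Two sub-queries a fan-out system might generate from one broad question.
sub_queries = [
    "What are the main ranking factors?",
    "How long does SEO take?",
]

# Each sub-query retrieves whichever section overlaps it most.
for q in sub_queries:
    best = max(sections, key=lambda title: score(q, sections[title]))
    print(q, "->", best)
```

One page, two sections, two separate retrieval wins. A page that only addressed the broad question would score weakly against both sub-queries.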
This is why “comprehensive content” has become such a focus. You’re not trying to rank for more keywords, you’re trying to cover the intent space thoroughly enough that you match various sub-queries generated through fan-out.
What “Comprehensive” Actually Means (This is Important)
Here’s where people get confused. Comprehensive content does not mean:
- Repeating the same information across multiple pages
- Creating slight variations of the same article
- Targeting every possible keyword variant with a separate page
- Making everything as long as possible
- Covering unrelated topics just to have “more content”
Comprehensive content does mean:
- Covering distinct sub-questions within a topic area
- Addressing related intents that a user might have
- Providing clear, self-contained answers that can be extracted independently
- Building topical authority across a subject
- Creating content at the right level of specificity for each intent
The difference is crucial. The first approach creates genuine cannibalisation and confuses both users and retrieval systems. The second approach builds authority and provides multiple entry points for fan-out queries.
Passage-Level Thinking
This is one of the practical shifts you need to make: think about your content at the passage level, not just the page level.
Many retrieval systems (including the ones powering AI search) work with chunks of content rather than full pages. They might extract a few paragraphs, or a single section, or even just a well-structured list. This means:
- Each section of your content should make sense independently
- Important information shouldn’t require reading the entire page to understand
- Headings should clearly describe what each section covers
- Lists, tables, and structured elements should extract cleanly
- Questions should get direct answers where appropriate
A well-structured page can satisfy multiple sub-queries because different sections can be retrieved for different purposes. A poorly structured page, even with great information, might never get retrieved at all because the extraction systems can’t reliably pull out clean, quotable passages.
Practical structure tips:
- Use descriptive H2/H3 headings that could stand alone
- Put key information early in each section
- Use lists and tables for structured data
- Include clear definitions and direct answers
- Don’t bury important information deep in paragraphs
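A rough sketch of what passage-level extraction looks like from the retrieval side, assuming a simple heading-based chunker (real pipelines are more sophisticated, but the principle holds: each heading-delimited section becomes its own retrievable unit):

```python
import re

def chunk_by_heading(markdown: str) -> list[tuple[str, str]]:
    # Split a markdown page into (heading, passage) pairs at H2/H3 headings,
    # roughly how chunk-level retrieval might see it.
    chunks = []
    current_heading, current_lines = None, []
    for line in markdown.splitlines():
        m = re.match(r"^#{2,3}\s+(.*)", line)
        if m:
            if current_heading is not None:
                chunks.append((current_heading, "\n".join(current_lines).strip()))
            current_heading, current_lines = m.group(1), []
        elif current_heading is not None:
            current_lines.append(line)
    if current_heading is not None:
        chunks.append((current_heading, "\n".join(current_lines).strip()))
    return chunks

page = """## What is crawlability?
Crawlability is how easily search engines can fetch your pages.

## How do I improve site speed?
Compress images, cache aggressively, and minimise JavaScript.
"""

for heading, passage in chunk_by_heading(page):
    print(heading, "->", passage)
```

Notice that each chunk carries its own descriptive heading and a self-contained answer. A section whose meaning depends on three paragraphs elsewhere on the page falls apart when extracted like this.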
Traditional Cannibalisation vs Fan-Out Comprehensiveness
Right, let’s clear up the cannibalisation confusion once and for all.
Traditional SEO Cannibalisation:
- Happens when multiple pages target the same primary intent
- Causes ranking dilution (search engines don’t know which page to rank)
- Creates SERP confusion (different pages appear at different times)
- Is fundamentally a page-level ranking issue
- Problem: multiple pages competing for the same query
Fan-Out Comprehensiveness:
- Encourages coverage across different sub-intents
- Rewards specialisation and depth on distinct topics
- Each page serves a clear, separate purpose
- Is a retrieval and synthesis challenge, not a ranking one
- Solution: multiple pages addressing different aspects of a topic
The key distinction: Cannibalisation means competing for the same intent. Fan-out optimisation means covering different intents within a topic area.
You can have ten pages about SEO without cannibalisation if each addresses a distinct sub-topic:
- What is SEO? (foundational definition)
- Technical SEO guide (crawlability, site speed, structure)
- Content optimisation (on-page factors, keyword research)
- Link building strategies (backlinks, authority)
- Local SEO (Google Business Profile, local citations)
- E-commerce SEO (product optimisation, structure)
- SEO tools comparison (software and platforms)
- SEO for specific industries (context-specific advice)
- SEO mistakes to avoid (common issues)
- How long does SEO take? (timeline expectations)
Each of these has a distinct primary intent. They might share some keywords naturally, but they’re not competing with each other, they’re supporting each other by building topical authority.
Where Comprehensiveness Can Go Wrong
That said, there is risk if you implement this poorly. You create actual cannibalisation problems when you:
- Create multiple pages answering the same core question with slight variations
- Rewrite the same content with different phrasing for “coverage”
- Don’t establish clear hierarchy and relationships between pages
- Create content without a defined, distinct intent
- Target the same commercial keywords from multiple landing pages
This is still bad. It causes the same ranking issues it always did, and it now also confuses retrieval systems trying to figure out which of your similar pages to cite.
The Correct Model: Hub-and-Spoke
The structure that solves both traditional SEO concerns and fan-out optimisation is the hub-and-spoke model (also called topic clusters).
Hub Page:
- Targets the broad, primary intent
- Provides overview and context
- Acts as the canonical ranking page for core terms
- Links out to all supporting pages
- Comprehensive but not exhaustive on any sub-topic
Spoke Pages:
- Each targets a distinct, specific sub-intent
- Goes deep on one aspect of the topic
- Links back to the hub and to related spokes where relevant
- Avoids duplicating the hub’s overview content
- Can rank independently for specific queries
Benefits:
- Prevents keyword cannibalisation (clear hierarchy)
- Builds topical authority (comprehensive coverage)
- Aligns with query fan-out (multiple entry points)
- Works for traditional SEO (clear structure)
- Works for AI retrieval (distinct, extractable content)
Example structure:
Hub: "Complete Guide to Technical SEO" (broad, authoritative, overview)
├── Spoke: "Website Crawlability: A Technical Guide"
├── Spoke: "Core Web Vitals Optimisation"
├── Spoke: "XML Sitemaps: Implementation and Best Practices"
├── Spoke: "Structured Data and Schema Markup"
├── Spoke: "JavaScript SEO: Rendering and Indexing"
└── Spoke: "Site Speed Optimisation Techniques"
Each spoke can be retrieved independently for specific sub-queries. The hub provides context and builds authority for the topic as a whole. There’s no duplication, no competition, just comprehensive coverage.
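One way to keep a cluster honest is to treat the linking rules as checkable data. This is a hypothetical sketch (the page titles and internal-link data are placeholders): the hub must link to every spoke, and every spoke must link back to the hub:

```python
# A topic cluster represented as data, so the hub-and-spoke
# linking rules can be verified rather than just intended.
cluster = {
    "hub": "Complete Guide to Technical SEO",
    "spokes": [
        "Website Crawlability: A Technical Guide",
        "Core Web Vitals Optimisation",
        "XML Sitemaps: Implementation and Best Practices",
    ],
}

# links[page] = pages that page links to (placeholder internal-link data,
# e.g. exported from a site crawler).
links = {
    "Complete Guide to Technical SEO": [
        "Website Crawlability: A Technical Guide",
        "Core Web Vitals Optimisation",
        "XML Sitemaps: Implementation and Best Practices",
    ],
    "Website Crawlability: A Technical Guide": ["Complete Guide to Technical SEO"],
    "Core Web Vitals Optimisation": ["Complete Guide to Technical SEO"],
    "XML Sitemaps: Implementation and Best Practices": ["Complete Guide to Technical SEO"],
}

def cluster_link_issues(cluster: dict, links: dict) -> list[str]:
    # Return a description of every missing hub<->spoke link.
    issues = []
    for spoke in cluster["spokes"]:
        if spoke not in links.get(cluster["hub"], []):
            issues.append(f"hub does not link to spoke: {spoke}")
        if cluster["hub"] not in links.get(spoke, []):
            issues.append(f"spoke does not link back to hub: {spoke}")
    return issues

print(cluster_link_issues(cluster, links))  # an empty list means the cluster is wired correctly
```

Run something like this against a crawl export and missing hub/spoke links surface immediately instead of being discovered months later.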
Intent Segmentation > Keyword Segmentation
This is the mindset shift you need to make: stop thinking about keywords, start thinking about intents.
Two pages can share keywords safely if their primary intent differs. The question isn’t “Do these pages target the same keywords?” The question is “Do these pages answer the same question?”
If the answer is yes, you have cannibalisation. If the answer is no, you have comprehensive topical coverage. It’s that simple.
Example of no cannibalisation (different intents):
- “How to choose SEO tools” (comparative/decisional intent)
- “Best SEO tools for small businesses” (recommendation/transactional)
- “SEO tools comparison: Features and pricing” (detailed comparison)
These might all rank for variations of “SEO tools,” but they serve different purposes and different stages of the user journey.
Example of cannibalisation (same intent):
- “How to improve your SEO”
- “Ways to boost your search rankings”
- “SEO improvement strategies”
These are all answering the same broad question with slight rephrasing. They’ll compete with each other and dilute your authority.
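If you want a crude first-pass screen for “same question, rephrased”, simple word overlap between one-line intent statements can flag pairs for human review. This is a blunt heuristic, not a verdict: it misses rephrasings that share few words, and the 0.5 threshold is an arbitrary assumption:

```python
def jaccard(a: str, b: str) -> float:
    # Word-level Jaccard similarity between two short intent statements.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# One-line intent statements for three planned pages (illustrative).
intents = [
    "how to improve your seo",
    "how to improve seo rankings",
    "how to choose seo tools",
]

# Flag pairs with high overlap as possible cannibalisation for human review.
flagged = [
    (a, b)
    for i, a in enumerate(intents)
    for b in intents[i + 1:]
    if jaccard(a, b) > 0.5
]
print(flagged)
```

Here the first two intents get flagged as near-duplicates while the “choose SEO tools” page does not, which matches the intent-based judgement you’d make by hand. The heuristic is only a prompt to apply that judgement systematically.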
Practical Content Planning Rules
Here’s how to actually implement this:
Do:
- Create one authoritative page per core intent
- Build separate pages/sections for distinct sub-questions
- Establish clear internal linking that shows page relationships
- Write content to be quotable and extractable at the passage level
- Map your content to user intents, not just keywords
- Use hub-and-spoke structure for major topics
- Ensure each page has a clear, distinct purpose
Don’t:
- Create multiple pages answering the same main question
- Rewrite near-duplicate content for “SEO coverage”
- Build pages without clearly defined, distinct intent
- Forget to establish hierarchy between related pages
- Target the same commercial keywords from multiple landing pages
- Create content just to create content
Metrics in the LLM Era
Traditional metrics still matter, don’t abandon them. But you should be thinking about additional indicators:
Traditional (still important):
- Organic traffic
- Rankings for target keywords
- Conversion rates
- Engagement metrics
LLM-era (increasingly relevant):
- AI citations (track when possible)
- Brand mentions in AI-generated answers
- Coverage across conversational queries
- Featured snippet and “People Also Ask” appearances
- Topical authority signals (ranking across topic cluster)
- Visibility without clicks (unfortunately hard to track)
The uncomfortable reality is that we don’t yet have great tools for measuring AI visibility consistently. But that’s the state of the industry right now, we’re all figuring this out together.
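One thing you can measure today is whether AI crawlers are fetching your content at all, by scanning server logs for their user agents. The agent names below (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot) are real crawlers at the time of writing, but the list changes often; the log lines are fabricated samples:

```python
# Rough proxy for AI visibility: count access-log hits from known AI
# crawlers. Treat the agent list as a starting point, not exhaustive.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

# Fabricated sample log lines; in practice, read your real access log.
log_lines = [
    '1.2.3.4 - - [01/Jun/2025] "GET /guide HTTP/1.1" 200 "Mozilla/5.0 GPTBot/1.0"',
    '5.6.7.8 - - [01/Jun/2025] "GET /guide HTTP/1.1" 200 "Mozilla/5.0 Chrome/125"',
    '9.9.9.9 - - [01/Jun/2025] "GET /tools HTTP/1.1" 200 "PerplexityBot/1.0"',
]

def ai_crawler_hits(lines: list[str]) -> dict[str, int]:
    # Count lines mentioning each known AI crawler's user agent.
    counts = {agent: 0 for agent in AI_AGENTS}
    for line in lines:
        for agent in AI_AGENTS:
            if agent in line:
                counts[agent] += 1
    return counts

print(ai_crawler_hits(log_lines))
```

Crawl hits aren’t citations, of course. But zero crawl activity from these bots is a strong hint you won’t be cited, which makes this one of the few cheap, concrete checks available right now.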
Bottom line on content strategy: Query fan-out rewards intent coverage, not content duplication. Cannibalisation only occurs when intent boundaries are unclear. Hub-and-spoke structure solves both traditional SEO concerns and AI retrieval challenges. Think in intents, not keywords.
What Actually Works: The Practical Summary
Right, we’ve covered a lot of ground. Let me bring it together into something actionable.
The Uncomfortable Truth
Most “AI optimisation” is just traditional SEO done well. There are no magic tricks. There are no shortcuts. There are no special technical configurations that suddenly make LLMs prefer your content.
What there is, instead, is a shift in emphasis: from ranking individual pages for individual keywords to building comprehensive topical authority that can be retrieved and cited across multiple sub-queries.
If that sounds less revolutionary than you were hoping for, well, welcome to SEO in 2025. The fundamentals haven’t changed as much as the headlines suggest. We’re just more conscious now of why they matter and how they work in retrieval-based systems.
The Real Priorities (In Order)
If you want to be cited by AI systems, here’s what actually matters:
1. Crawlability and Technical Health (Non-Negotiable)
Without this, nothing else matters. Your content must be crawlable, indexable, fast, and properly structured. This is table stakes.
2. Content Quality and Comprehensive Coverage (Most Important)
Cover your topic thoroughly. Address distinct sub-intents. Structure content for passage-level extraction. Build genuine topical authority. This is where most of your effort should go.
3. Authority and Trust Signals (Critical)
Editorial links, media mentions, brand presence, entity clarity. Build genuine authority through quality signals, not quantity of links.
4. Clear Structure and Hierarchy (Enabling Factor)
Hub-and-spoke content architecture. Logical internal linking. Clear intent segmentation. Make it easy for systems to understand what you’re about and how your content relates.
5. Structured Data and Schema (Supporting)
Implement for clarity and entity recognition. Don’t expect miracles, but do it properly for the right reasons.
6. Entity Building (Long-term Investment)
Consistent brand/author presence across platforms. Build the kind of entity authority that makes systems recognise you as a trusted source.
Notice what’s missing from this list: tricks, hacks, shortcuts, “AI-specific” technical configurations. It’s all fundamentals. Good SEO, applied thoughtfully.
What Not to Do
Save yourself time, money, and frustration:
- Don’t create duplicate or near-duplicate content for “AI coverage”
- Don’t buy into “LLM-specific” technical tricks without evidence
- Don’t neglect traditional SEO fundamentals
- Don’t expect quick wins or guaranteed results
- Don’t implement recommendations you don’t understand
- Don’t forget that ranking still matters
- Don’t chase the algorithm instead of building something worth citing
Realistic Expectations
I need to be honest with you: we don’t have a proven playbook for AI optimisation because the systems are evolving faster than we can study them. What works today might not work in six months. What we think matters might turn out to be irrelevant.
This is frustrating. I know. But it’s honest.
What we can say with reasonable confidence is that the fundamentals (authority, clarity, comprehensiveness, technical health) will continue to matter. These things have mattered for traditional search, they matter for current AI systems, and they’ll likely matter for whatever comes next.
So rather than chasing specific AI optimisation tactics that might or might not work, focus on building a strong foundation: authoritative content, clear structure, genuine expertise, proper technical implementation. These things have lasting value regardless of how AI search evolves.
Conclusion
Look, I get the frustration. You came here hoping for a clear playbook, and what you’ve got instead is “do good SEO and understand query fan-out.” That’s probably disappointing.
But here’s the thing: anyone claiming to have a definitive, proven method for “ranking in LLMs” is either speculating based on limited data or actively misleading you. We’re all figuring this out as we go. The honest answer is that we don’t know exactly what works yet, because the systems are too new and too opaque for definitive answers.
What we do know is:
- Traditional SEO fundamentals still matter enormously
- Query fan-out is changing how content gets retrieved and cited
- Comprehensive topical coverage beats keyword-focused optimisation
- Authority signals (links, mentions, entity presence) remain crucial
- Technical health is non-negotiable
- Schema helps, but not for the reasons most people think
The shift isn’t toward some completely new discipline. It’s toward a slightly different mental model: thinking about comprehensive intent coverage rather than keyword rankings, understanding passage-level retrieval rather than page-level optimisation, building topical authority rather than chasing individual terms. If you want a structured approach to applying these principles, my generative engine optimisation service puts this methodology into practice.
That hub-and-spoke content model we discussed? That’s probably the most practical takeaway from this entire article. Structure your content around clear intent hierarchies, avoid duplication, cover sub-topics thoroughly, and make everything easily extractable. Do that well, and you’ll be in a better position than most sites regardless of how AI search evolves.
Is it exciting? No. Is it revolutionary? Not really. Does it work? Based on available evidence and logical inference, yes.
I know this article has been long (sorry, again). But I’d rather give you an honest, nuanced explanation than a confident-sounding listicle that oversimplifies a genuinely complex topic. The uncomfortable truth is that we’re all learning as we go, and anyone who tells you otherwise is selling something.
Focus on the fundamentals. Build genuine authority. Cover topics comprehensively. Make your content easily retrievable. And accept that we’re in a period of uncertainty where the playbook is still being written.
If you’ve got questions, if you disagree with something I’ve said, or if you’ve got data that contradicts what I’ve written here, please, reach out. We’re all figuring this out together, and honest discourse is more valuable than confident assertions.