15/06/2026

How to Get Cited by ChatGPT, Perplexity and Google AI Overviews

Three engines now sit between your prospect and your website. Here's exactly how each picks sources, and the content patterns that earn citations.

Three engines now sit between your prospect and your website: ChatGPT, Perplexity and Google's AI Overviews / AI Mode. They all do roughly the same thing — retrieve sources, then synthesise an answer with citations — but each picks sources differently. This post is the operating manual: how each engine selects what it quotes, what content patterns get cited, and the technical work most sites have not yet done.

How each engine actually picks sources

The retrieval mechanics matter because they determine which optimisation work pays off where.

ChatGPT (and ChatGPT Search)

ChatGPT's web-aware modes use a search backend: historically Bing's index, supplemented by OpenAI's own search crawler, OAI-SearchBot. (GPTBot is a separate OpenAI crawler that collects training data rather than search results.) When a user asks a question that triggers a web search, ChatGPT issues a query to that backend, retrieves the top results, and grounds its response in the returned passages. Citations are surfaced as small inline links and a sources panel.

Practical implication: if Bing cannot find or index your page, ChatGPT will not cite it. Bing Webmaster Tools is no longer optional. OAI-SearchBot should be allowed in robots.txt for your content to be eligible for inclusion.

Perplexity

Perplexity runs a hybrid: its own crawler (PerplexityBot) plus retrieval-augmented generation over multiple sources, with explicit citations tied to each claim in the response. Perplexity is unusually citation-heavy: most answers carry 4-8 numbered sources, each clickable from the sentence it supports.

Practical implication: Perplexity rewards pages with clean, claim-level structure. A sentence on your page that directly answers a sub-question is more likely to be cited than a paragraph that buries the same claim in prose. Allow PerplexityBot in robots.txt.

Google AI Overviews and AI Mode

Both surfaces are powered by Gemini grounded in Google's own search index. They use the query fan-out technique — decomposing a question into sub-queries, retrieving for each, and synthesising. Sources cited are typically drawn from pages ranking in the top 10-20 for the underlying sub-queries.

Practical implication: classical SEO is the entry ticket. If you do not rank for the sub-queries, you are unlikely to be cited. Google-Extended is a separate control from Googlebot: it governs whether your content can be used for Gemini training, not whether Googlebot crawls you for search. Allow it in robots.txt if you also want to be eligible for Gemini training.

The crawler control panel

Before you optimise anything, get the access right. Add to your robots.txt:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: CCBot
Allow: /

If you have a legal or commercial reason to block one of these, do so explicitly, but understand the trade. Blocking a crawler that feeds retrieval is equivalent to opting out of citation in that engine.
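Before trusting a robots.txt edit, it is worth checking how a parser actually reads it. The sketch below uses Python's standard-library robots.txt parser to report, per crawler, whether a given URL is fetchable. The sample file and the example.com URL are hypothetical, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the robots.txt block above.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "PerplexityBot", "Google-Extended",
    "ClaudeBot", "anthropic-ai", "CCBot",
]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return {crawler: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Hypothetical robots.txt: blocks one crawler, allows everything else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    report = crawler_access(SAMPLE, "https://example.com/guide")
    for bot, allowed in report.items():
        print(f"{bot:16} {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site, fetch its robots.txt yourself and pass the text to crawler_access; the function itself makes no network calls.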

The content patterns that get cited

Across all three engines, the same patterns repeat. The Aggarwal et al. GEO paper (Princeton, 2024) tested nine optimisation strategies across 10,000 queries and found the top three were citation-density, quotation-inclusion, and statistic-inclusion — each producing 30-40% visibility lifts. Our own testing across UK clients confirms the same hierarchy.

1. Definition-first paragraphs

The opening sentence under every H2 should be a self-contained definition of the term in the heading. Generative engines retrieve at the passage level; if the first sentence answers the question, that sentence is the citation.

Bad:

Many businesses today are struggling with the challenges of modern search...

Good:

Generative Engine Optimisation (GEO) is the practice of structuring web content so that AI search engines like ChatGPT and Perplexity include and cite it in their synthesised responses.

2. Cited statistics and original data

A claim with a number and a source is dramatically more citable than a claim without. Two patterns work:

  • Quote a credible third party — "BrightEdge found AI Overviews appear on 64% of medical queries" — with a link.
  • Publish your own data — survey results, benchmark studies, anonymised client outcomes. Original data is the single highest-yielding asset for citation share; the engines have a strong preference for primary sources.

3. Comparison tables

When the user's question is comparative ("X vs Y", "best X for Y"), a clean HTML table with two-to-four competing options and three-to-six attributes is the most lifted content pattern in AI Mode. Tables are easy to extract, easy to attribute, and rare on most sites.
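If your comparison data already lives in a spreadsheet or CMS field, a few lines of Python can emit the kind of clean table described here. This is a minimal sketch: the function name, the attribute labels and the "Tool A / Tool B" rows are all hypothetical placeholders, and the markup choices (a header row plus row headers) are one reasonable option rather than a requirement of any engine.

```python
from html import escape


def comparison_table(attributes: list[str], options: dict[str, list[str]]) -> str:
    """Render options (rows) against attributes (columns) as a plain HTML table."""
    head = "".join(
        f'<th scope="col">{escape(label)}</th>' for label in ["Option", *attributes]
    )
    rows = []
    for name, values in options.items():
        if len(values) != len(attributes):
            raise ValueError(f"{name!r} needs {len(attributes)} values")
        cells = "".join(f"<td>{escape(value)}</td>" for value in values)
        rows.append(f'<tr><th scope="row">{escape(name)}</th>{cells}</tr>')
    lines = ["<table>", f"<thead><tr>{head}</tr></thead>", "<tbody>", *rows,
             "</tbody>", "</table>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data only; swap in your own options and attributes.
    print(comparison_table(
        ["Price", "Free tier", "Best for"],
        {
            "Tool A": ["£10/month", "Yes", "Small teams"],
            "Tool B": ["£25/month", "No", "Agencies"],
        },
    ))
```

All cell text is HTML-escaped, so values containing "&" or "<" do not break the table.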

4. Named authors with verifiable credentials

Pages with a real, named author and a bio that links to a professional profile (LinkedIn, company about page, published work) are cited at materially higher rates. Generative engines weight authority signals heavily because hallucination cost is high; a verifiable human is a trust anchor.

5. FAQ blocks with question-formatted H3s

A FAQ section at the end of every article (six to ten questions, each H3, with a concise two-to-four sentence answer) is the highest-yielding pattern for fan-out and for ChatGPT-style passage retrieval. Mark up with FAQPage schema.

6. Schema markup

Use Article with author (linked to a Person entity), datePublished, dateModified, and publisher. Add FAQPage for the FAQ. Add Organization schema sitewide. Schema does not magically force citation, but it disambiguates entities for the retrieval models, which lowers the cost of citing you.
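As a rough illustration of that markup, the sketch below builds Article and FAQPage JSON-LD as Python dictionaries and wraps them in script tags. The schema.org types and property names are real; the helper names, the author, the publisher, the URLs and the dates are hypothetical placeholders to replace with your own.

```python
import json


def article_jsonld(headline, author_name, author_url, publisher_name,
                   date_published, date_modified):
    """Article markup with the author linked to a Person entity."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "publisher": {"@type": "Organization", "name": publisher_name},
        "datePublished": date_published,
        "dateModified": date_modified,
    }


def faq_jsonld(pairs):
    """FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": question,
             "acceptedAnswer": {"@type": "Answer", "text": answer}}
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialise to a JSON-LD script tag ready to paste into the page head."""
    # Escape "<" so a "</script>" inside a value cannot close the tag early.
    payload = json.dumps(data, indent=2, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Placeholder values throughout.
    print(as_script_tag(article_jsonld(
        "How to Get Cited by ChatGPT, Perplexity and Google AI Overviews",
        "Jane Example", "https://example.com/about/jane-example",
        "Example Ltd", "2026-06-15", "2026-06-15",
    )))
    print(as_script_tag(faq_jsonld([
        ("Do I need a separate page strategy for each AI engine?",
         "No. The same well-built page is citable across all three."),
    ])))
```

Validate the output with a structured-data testing tool before shipping; this sketch does not check required or recommended properties.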

7. Direct quotations

Pages that quote experts (with attribution) are cited more often than pages that paraphrase. A short pull-quote with a credentialed source is high-leverage and underused.

Tactical checklist

Run this against any page you want to be citable:

  • Opening sentence is a self-contained definition.
  • Page contains at least three cited statistics with named sources.
  • Page contains at least one comparison table or structured list.
  • Named author with bio and external link to credentials.
  • FAQ block at the end with six-plus question-style H3s.
  • Article + FAQPage schema present.
  • H2/H3 hierarchy follows question/answer logic.
  • Internal links to two-to-five related articles on the same topic.
  • Page is in Bing's index (check via site: operator).
  • GPTBot, OAI-SearchBot, PerplexityBot, Google-Extended all allowed in robots.txt.
  • Page returns a 200 with the full HTML body (not blocked behind a JS-only render with no SSR).

If a page fails three or more of these, fix them before publishing anything new.
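A few of the checklist items above can be checked mechanically. The following is a minimal sketch using only the Python standard library; it covers the table, question-style H3 and schema items and nothing else. The class and function names are hypothetical. It only reads top-level JSON-LD objects with a string @type, so markup nested under @graph would be missed.

```python
import json
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collect the few facts the checks below need from one HTML document."""

    def __init__(self):
        super().__init__()
        self.h3_texts = []          # text content of every <h3>
        self.has_table = False
        self.schema_types = set()   # @type values found in JSON-LD blocks
        self._capture = None        # "h3" or "jsonld" while inside one
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.has_table = True
        elif tag == "h3":
            self._capture, self._buffer = "h3", []
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture, self._buffer = "jsonld", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if tag == "h3" and self._capture == "h3":
            self.h3_texts.append(text)
            self._capture = None
        elif tag == "script" and self._capture == "jsonld":
            self._capture = None
            try:
                data = json.loads(text)
            except json.JSONDecodeError:
                return  # malformed JSON-LD counts as absent
            for item in data if isinstance(data, list) else [data]:
                if isinstance(item, dict) and isinstance(item.get("@type"), str):
                    self.schema_types.add(item["@type"])


def audit(html: str) -> dict[str, bool]:
    """Return pass/fail for the checklist items this sketch can detect."""
    facts = PageFacts()
    facts.feed(html)
    facts.close()
    questions = [text for text in facts.h3_texts if text.endswith("?")]
    return {
        "comparison table present": facts.has_table,
        "six-plus question-style H3s": len(questions) >= 6,
        "Article schema present": "Article" in facts.schema_types,
        "FAQPage schema present": "FAQPage" in facts.schema_types,
    }
```

Feed it the raw HTML your server returns (not the browser-rendered DOM); that way a page that only works after client-side rendering fails the audit in the same way it would fail a crawler.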

How to measure citation share

Build a simple weekly check:

  1. Pick 30-50 priority queries — the questions a buyer in your target segment actually asks.
  2. Every week, run each query through ChatGPT, Perplexity and Google AI Mode.
  3. For each response, log: which domains are cited, in what order, and whether yours appears.
  4. Track the trend over four-week rolling windows.

The metric is citation share: the percentage of your priority queries that cite your domain at least once. A reasonable baseline goal for a domain doing this seriously is 25-40% citation share on its priority query set within six months. Industry leaders we have measured sit at 60-70%.

Several tools now do this automatically — Profound, Otterly, AthenaHQ, SE Ranking's AI Tracker — but a Python script and a spreadsheet are enough to start.
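A starting script might look like the sketch below. It assumes you log one row per week, query and engine, listing the domains cited in that response. The row layout, the function names and the sample data are hypothetical. A query counts as cited if any engine cited the domain at least once within the rows considered, which is one reading of the metric defined above.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical log rows; in practice these come from the weekly spreadsheet.
LOG = [
    {"week": date(2026, 6, 1), "query": "what is geo", "engine": "chatgpt",
     "cited": ["example.com", "example.org"]},
    {"week": date(2026, 6, 1), "query": "what is geo", "engine": "perplexity",
     "cited": ["example.org"]},
    {"week": date(2026, 6, 1), "query": "best geo tools", "engine": "perplexity",
     "cited": ["example.org"]},
    {"week": date(2026, 6, 8), "query": "best geo tools", "engine": "ai_mode",
     "cited": ["example.com"]},
    {"week": date(2026, 6, 8), "query": "geo vs seo", "engine": "chatgpt",
     "cited": []},
    {"week": date(2026, 6, 8), "query": "how to measure citation share",
     "engine": "chatgpt", "cited": ["example.org"]},
]


def citation_share(rows, domain):
    """Fraction of distinct queries whose responses cite `domain` at least once."""
    cited_by_query = defaultdict(bool)
    for row in rows:
        cited_by_query[row["query"]] |= domain in row["cited"]
    if not cited_by_query:
        return 0.0
    return sum(cited_by_query.values()) / len(cited_by_query)


def rolling_share(rows, domain, end_week, weeks=4):
    """Citation share over the `weeks` weekly checks ending at `end_week`."""
    start = end_week - timedelta(weeks=weeks - 1)
    window = [row for row in rows if start <= row["week"] <= end_week]
    return citation_share(window, domain)


if __name__ == "__main__":
    share = rolling_share(LOG, "example.com", end_week=date(2026, 6, 8))
    print(f"Four-week citation share: {share:.0%}")
```

Tracking share per engine is a small change: filter the rows by "engine" before calling citation_share.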

What to stop doing

  • Publishing 2,000-word articles where the answer is in paragraph nine. Front-load.
  • Hiding authors behind a generic "Team" byline. Name your experts.
  • Leaving FAQs out because they look unfashionable. They are the highest-yielding section on the page.
  • Treating Bing as irrelevant. Bing's index has historically backed ChatGPT's web search. Submit your sitemap to Bing Webmaster Tools today.
  • Blocking AI crawlers by default. The cost is invisibility in the systems your buyers now use.

The honest summary

Getting cited by ChatGPT, Perplexity and Google AI Overviews is not a separate discipline from SEO. It is SEO done with the assumption that your reader is a language model assembling an answer. Define terms cleanly, cite numbers credibly, structure content so a single passage can be lifted without context, and let the AI crawlers in. The same pages that get cited tend to convert better when humans click through, because the patterns that win citation — clear definitions, real data, named authors — are also the patterns that build trust.

FAQ

Do I need a separate page strategy for each AI engine?

No. The same well-built page is citable across all three. Tailor at the section level (definitions, cited stats, tables, FAQs), not the page level.

Will blocking AI crawlers protect my content?

It will exclude you from citation. It does not stop the content being re-summarised second-hand from sites that quote you. If you are worried about training, the relevant blocks are GPTBot and Google-Extended; OAI-SearchBot and PerplexityBot are retrieval-only and you almost certainly want them allowed.

How long until I see citations after publishing?

ChatGPT and Perplexity often pick up new content within days once indexed. AI Overviews typically take two-to-six weeks because they depend on the underlying organic rank stabilising.

Is there a single highest-ROI change?

Add named authors with real bios, rewrite each H2's opening sentence as a self-contained definition, and add an FAQ block to every page. That triple lifts citation rates across all three engines more than any other intervention we have tested.

Does citation drive revenue?

Indirectly and increasingly directly. Brand impression at the moment of consideration compounds; users who see your domain cited in three AI answers across a buying journey are materially more likely to type your name into Google later. Track assisted conversions and branded search volume, not just direct AI referrals.


If you want this audited and shipped across your site, our SEO service treats citation share as a first-class metric alongside organic rank.
