What this article is about

When you ask ChatGPT a question, you get a one-paragraph answer, sometimes with a few citations underneath. The sources behind that answer are doing the work. Without them, the AI either makes something up or refuses to answer.

This article explains where those citations actually come from. The mechanism surprised me when I learned it. It will probably surprise you too. By the end you will understand why two sites with similar Google rankings can have very different AI visibility.

How does ChatGPT decide which sites to cite?

ChatGPT does not browse the web the way you do, clicking from page to page. When a user asks a question, ChatGPT and other AI assistants run a process called query fanout.

Here is the simple version. The original question gets broken into 12 to 15 sub-queries. Complex questions can hit 50 sub-queries. Each sub-query becomes its own search. The system then collects sources from all of those searches and ranks them.
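Here is a toy sketch of that pipeline in Python. It is not how any assistant actually implements fanout. The fixed facet list, the `search` stand-in, and ranking sources by how many sub-queries surface them are all assumptions made for illustration. Real systems generate sub-queries with a language model and query a live search index.

```python
from collections import Counter

# Hypothetical facets for splitting a question into sub-queries.
# Real assistants generate sub-queries with a language model instead.
FACETS = ["definition", "pricing", "comparison", "reviews", "alternatives"]


def fan_out(question: str) -> list[str]:
    """Break one question into several sub-queries."""
    return [f"{question} {facet}" for facet in FACETS]


def search(sub_query: str, index: dict[str, list[str]]) -> list[str]:
    """Stand-in for a web search: look the sub-query up in a toy index."""
    return index.get(sub_query, [])


def pool_and_rank(question: str, index: dict[str, list[str]]) -> list[str]:
    """Pool sources from every sub-query, ranked by how many sub-queries surface them."""
    counts: Counter[str] = Counter()
    for sub_query in fan_out(question):
        counts.update(search(sub_query, index))
    return [source for source, _ in counts.most_common()]


# Toy search results (hypothetical), one list of domains per sub-query.
toy_index = {
    "crm for law firms definition": ["wikipedia.org", "acmewidget.example"],
    "crm for law firms pricing": ["acmewidget.example", "g2.com"],
    "crm for law firms comparison": ["reddit.com", "acmewidget.example"],
    "crm for law firms reviews": ["reddit.com"],
}

print(pool_and_rank("crm for law firms", toy_index))
```

In this toy index, acmewidget.example ranks first because three different sub-queries surface it, even though it leads only one of the individual searches.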

Two important consequences.

First, the source pool is much bigger than the source pool for a normal Google search. You are not competing with 10 results. You are competing across dozens of related searches.

Second, AI does not click anything. It picks sources based on snippets, meta descriptions, structured data, and the source's reputation. A page can be on Google page 5 and still get cited if its snippet matches a sub-query better than the page-1 results.
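A crude way to picture snippet matching: score each result's snippet by how many words of a sub-query it contains, and ignore its Google position entirely. The word-overlap score below is a stand-in chosen for illustration; real assistants use far more sophisticated relevance models. The domains and snippets are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def snippet_score(sub_query: str, snippet: str) -> float:
    """Share of the sub-query's words that also appear in the snippet."""
    query_words = tokens(sub_query)
    if not query_words:
        return 0.0
    return len(query_words & tokens(snippet)) / len(query_words)


# (Google position, domain, snippet): hypothetical search results.
results = [
    (1, "bigvendor.example", "Welcome to BigVendor. Innovative solutions for modern teams."),
    (47, "acmewidget.example", "AcmeWidget is a CRM for small law firms with built-in billing."),
]

sub_query = "crm for small law firms billing"
# Pick the best snippet match; the Google position plays no part in the choice.
best = max(results, key=lambda result: snippet_score(sub_query, result[2]))
print(best[1], "at Google position", best[0])
```

The result at position 47, which would sit on page 5, wins because its snippet answers the sub-query directly.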

This is why a page's traditional SEO ranking sometimes matters less than you would expect.

What sources do AI assistants actually cite?

The dominant pattern across ChatGPT, Claude, Perplexity, and Gemini is the same. Wikipedia and Reddit lead. After them come official documentation, news outlets, and a small set of community-driven sites.

Platform | Top citation sources | Citation rate (percent of answers)
Perplexity | Wikipedia, Reddit, GitHub, Stack Overflow | 13.05
Grok | Wikipedia, Reddit, X.com, news | 27.01
Google AI Mode | Reddit, Wikipedia, Quora, LinkedIn | 9.09
Gemini | Wikipedia, Reddit, official docs | 6.38
ChatGPT | Wikipedia, official docs, news | 0.59 to 0.7
Claude | Almost no public citations | Near 0

Notice that ChatGPT shows very few citations publicly. That does not mean it ignores sources. It uses them; it just does not show them in most answers. Either way, the implication for your business is the same: the page has to look citable. Whether the citation is shown to the user is a separate question.

The 5W research also found that the top 15 domains together collect 68 percent of all AI citations. That is more concentrated than the results Google's PageRank produces. If you are not on Wikipedia and not active on Reddit, you are starting from a tough position.
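If you collect a sample of AI answers yourself, you can measure the same kind of concentration. This sketch assumes you already have a list of cited URLs (the sample below is hypothetical) and computes the share of citations that go to the top N domains.

```python
from collections import Counter
from urllib.parse import urlparse


def top_n_share(cited_urls: list[str], n: int = 15) -> float:
    """Fraction of all citations that go to the n most-cited domains."""
    if not cited_urls:
        return 0.0
    domains = Counter(urlparse(url).netloc.removeprefix("www.") for url in cited_urls)
    top = sum(count for _, count in domains.most_common(n))
    return top / len(cited_urls)


# Hypothetical sample: 10 citations collected from AI answers.
sample = (
    ["https://en.wikipedia.org/wiki/Customer_relationship_management"] * 5
    + ["https://www.reddit.com/r/LawFirm/"] * 3
    + ["https://acmewidget.example/pricing", "https://news.example/crm-roundup"]
)

print(f"Top 2 domains: {top_n_share(sample, n=2):.0%} of citations")
```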

Why does the first third of your page matter so much?

Princeton University researchers analyzed 21,143 AI citations and found that 44.2 percent of them come from the first 30 percent of an article. The middle accounts for 31.1 percent and the end for 24.7 percent.
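To run a similar position analysis on your own pages, you need a rule for which part of an article a citation comes from. The cut-offs below are an assumption: the first 30 percent is the start, the last 30 percent is treated as the end, and everything between counts as the middle. The study's exact boundaries may differ, and the offsets are hypothetical.

```python
from collections import Counter


def position_bucket(offset: int, article_length: int) -> str:
    """Label where a cited passage starts: 'start', 'middle', or 'end'.

    Assumed cut-offs: start = first 30 percent, end = last 30 percent,
    middle = everything in between.
    """
    if article_length <= 0:
        raise ValueError("article_length must be positive")
    ratio = offset / article_length
    if ratio < 0.3:
        return "start"
    if ratio < 0.7:
        return "middle"
    return "end"


# Hypothetical character offsets of cited passages in a 1,000-character article.
offsets = [120, 80, 450, 910, 200]
print(Counter(position_bucket(offset, 1000) for offset in offsets))
```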

This is not how humans read. Humans skim middles and skip to ends. AI does not. AI extracts what is closest to the heading or topic, which is usually the opening of the section.

Practical consequence. If your homepage opens with "Welcome to our website. We have been helping businesses since 2008", you are wasting your most valuable real estate. The opening should be a definition. Then a number. Then context.

Compare these two openings.

Welcome to AcmeWidget. We are passionate about delivering innovative solutions for forward-thinking businesses.
AcmeWidget is a CRM for small law firms. We have 12,000 users in 47 countries. The platform replaces seven separate tools with one billing-friendly subscription.

The first is invisible to AI. The second is highly citable.
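You can approximate this check with a quick heuristic. The function `is_citable_opening` below is hypothetical and far cruder than anything an AI assistant does: it only asks whether the first sentence contains an "is a" or "are the" style definition and whether the opening contains at least one number.

```python
import re


def first_sentence(text: str) -> str:
    """Return text up to the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def is_citable_opening(opening: str) -> bool:
    """Rough heuristic: an 'X is a Y' definition first, and at least one number."""
    definition = re.search(r"\b(is|are)\s+(a|an|the)\b", first_sentence(opening))
    number = re.search(r"\d", opening)
    return definition is not None and number is not None


weak = ("Welcome to AcmeWidget. We are passionate about delivering innovative "
        "solutions for forward-thinking businesses.")
strong = ("AcmeWidget is a CRM for small law firms. We have 12,000 users in 47 countries. "
          "The platform replaces seven separate tools with one billing-friendly subscription.")

print(is_citable_opening(weak), is_citable_opening(strong))
```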

What types of content get cited the most?

The Princeton study ranked content features by their influence on citation. The order was definitions, comparisons, evidence, statistics. All of these have one thing in common. They are extractable. AI can lift them out of context and use them to answer a related question without adding much work.

Three rules of thumb that follow from the data.

  1. Tables beat paragraphs for comparisons. Tables get cited 2.5 times more often than the same information in prose.
  2. Code blocks beat free-form examples. The Princeton study found that code blocks increase absorption by 76.88 percent.
  3. Numbered lists beat bullet lists for how-to content. The numbers signal extractable steps.
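Rule 3 is easy to see from a parser's point of view. With a numbered list, one simple pattern finds both where each step starts and what order the steps go in; a bullet list gives boundaries but no explicit order. The `extract_steps` helper below is a hypothetical illustration, not part of any AI system.

```python
import re


def extract_steps(text: str) -> list[str]:
    """Return the bodies of '1. ...', '2. ...' lines, ordered by their numbers."""
    matches = re.findall(r"^\s*(\d+)\.\s+(.+)$", text, flags=re.MULTILINE)
    return [body for _, body in sorted(matches, key=lambda match: int(match[0]))]


numbered = """
1. Add Organization schema to the homepage.
2. Add sameAs links to your profiles.
3. Validate the markup.
"""
bulleted = """
- Add Organization schema to the homepage.
- Add sameAs links to your profiles.
- Validate the markup.
"""

print(extract_steps(numbered))  # three steps, in order
print(extract_steps(bulleted))  # this pattern finds no numbered steps
```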

I see this in audit data every day. The pages with the highest scores in our AI Visibility category have a definition in the first paragraph, a table in the middle, and at least one statistic per 200 words. Nothing fancy. Just consistent structure.

How do you make your site easier to cite?

Six things, in priority order.

First, write definitions. Open every important page with a one-sentence "X is Y" statement. Use specific words.

Second, cite statistics with sources. Aim for one stat per 150 to 200 words, and link every stat to where it came from. According to the Princeton study, AI rewards this with a 41 percent increase in visibility.
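A rough way to check the density target is to count numeric figures against words. This sketch treats any number as a statistic, so years and version numbers count too, and it cannot tell whether a figure links to its source. The function names are hypothetical, and the 200-word default comes from the upper end of the guideline.

```python
import re

# Any run of digits (with optional separators or a percent sign) counts as a figure.
NUMBER = re.compile(r"\d[\d,.]*%?")


def words_per_statistic(text: str) -> float:
    """Average number of words per numeric figure in the text."""
    figures = len(NUMBER.findall(text))
    words = len(text.split())
    return words / figures if figures else float("inf")


def meets_density_target(text: str, max_words: int = 200) -> bool:
    """True if there is at least one figure per max_words words on average."""
    return words_per_statistic(text) <= max_words


draft = "AcmeWidget has 12,000 users in 47 countries. " + "filler " * 150
print(words_per_statistic(draft), meets_density_target(draft))
```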

Third, use tables for comparisons. Three columns, header row, simple. Do not render comparisons as div grids. AI cannot easily extract them.
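Here is why the markup choice matters to a machine. A `<table>` carries its rows and cells in the tags themselves, so a generic HTML parser can recover the structure; a grid of `<div>` elements carries only styling hooks. The sketch uses Python's standard `html.parser` module. The `TableExtractor` class and `extract_rows` function are hypothetical helpers, not how any particular AI crawler parses pages.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text from <tr>/<td>/<th> markup, one list per row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def extract_rows(html: str) -> list[list[str]]:
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


table_html = (
    "<table><tr><th>Plan</th><th>Price</th><th>Seats</th></tr>"
    "<tr><td>Basic</td><td>$10</td><td>1</td></tr></table>"
)
div_html = (
    '<div class="grid"><div>Plan</div><div>Price</div><div>Seats</div>'
    "<div>Basic</div><div>$10</div><div>1</div></div>"
)

print(extract_rows(table_html))  # header row plus one data row
print(extract_rows(div_html))    # empty: no row or cell structure to recover
```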

Fourth, build entity authority. Add Organization schema with sameAs links to LinkedIn, Crunchbase, your social profiles, and Wikipedia if you have a page. This is how AI knows you are a real entity and not a content farm.
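A minimal Organization block might look like the output of this sketch, which builds schema.org JSON-LD with Python's `json` module. The company name and every URL are hypothetical placeholders; swap in your own profiles.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Build a schema.org Organization object serialized as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


markup = organization_jsonld(
    "AcmeWidget",
    "https://acmewidget.example",
    [
        "https://www.linkedin.com/company/acmewidget-example",
        "https://www.crunchbase.com/organization/acmewidget-example",
        "https://x.com/acmewidget_example",
    ],
)

# Embed the JSON-LD in the page's HTML.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```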

Fifth, refresh dates. Update your top pages every 30 days. Add a visible "Last updated" date. Add dateModified to your Article schema. Stale content gets 3.2 times fewer citations.
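The `dateModified` part can be generated alongside a simple staleness check. This is a sketch under assumptions: the headline and dates are hypothetical, and `is_stale` just encodes the 30-day rule rather than anything an AI system is known to compute.

```python
import json
from datetime import date


def article_jsonld(headline: str, published: date, modified: date) -> str:
    """Article schema with both publication and last-modified dates."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "Article",
            "headline": headline,
            "datePublished": published.isoformat(),
            "dateModified": modified.isoformat(),
        },
        indent=2,
    )


def is_stale(modified: date, today: date, max_age_days: int = 30) -> bool:
    """True if the page has gone more than max_age_days without an update."""
    return (today - modified).days > max_age_days


last_update = date(2024, 2, 20)
print(article_jsonld("Where AI citations come from", date(2023, 11, 1), last_update))
print(is_stale(last_update, today=date(2024, 4, 1)))
```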

Sixth, run an audit. The free audit at AIFreeAudit checks 35 signals, including the ones above, on your site in 30 seconds. It is the fastest way to know where you are losing AI citations today.

Summary

AI assistants do not browse like humans. They run query fanouts, pull from a small set of trusted sources, and lean heavily on the first third of any page. To get cited, your content needs definitions, statistics, comparisons, structured data, and recent timestamps. Most sites get one of these right. The pages that get cited get all five right.

If you want to know which signals your site is missing, run the free audit. It costs nothing and takes 30 seconds.