How to fix robots.txt blocking AI crawlers

If your pages never get cited in AI answers, the first thing to check is your robots.txt. There are two kinds of AI bots, and most owners block both by accident: the retrieval bots that fetch pages to cite, and the training bots that collect text for models. Per OpenAI's bot documentation, these are separate user-agents you can control independently. The rule is plain: “if you block the crawler, you block the citation.” Allow the search bots, decide on training separately, and verify.

The symptom: you are nowhere in the answers

You write good pages. A person can find them in Google. But when you ask ChatGPT, Perplexity, or Google's AI Overviews about your topic, your site is never quoted. A competitor with a thinner page gets named instead. You check your content, your schema, your speed, and none of it explains why you are invisible.

Before you rewrite anything, look at one file. Open yoursite.com/robots.txt in a browser and read it. In a large share of the sites we audit, the reason a page is never cited is not the page at all. It is a single line in robots.txt telling the AI search crawlers to stay out. The page is fine. The door is locked.

The cause: blocking "AI" blocks the wrong thing

Two patterns cause almost all of these failures, and both come from good intentions.

The first is the blanket disallow. A User-agent: * line followed by a Disallow: / line tells every well-behaved bot without a more specific group of its own to skip the whole site. People leave it in place from a staging setup, or a plugin writes it, and it quietly excludes the citation crawlers along with everyone else.
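To see the effect of that rule, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name allowed is ours for illustration, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The blanket disallow, written the way it appears in the file: two lines.
BLANKET = """\
User-agent: *
Disallow: /
"""

for bot in ("OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Googlebot"):
    print(f"{bot}: {'allowed' if allowed(BLANKET, bot) else 'blocked'}")
```

Every bot in the loop prints as blocked, because none of them has a group of its own and all of them fall through to the wildcard.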

The second is the deliberate AI block. An owner reads that AI companies scrape sites to train models, decides to "keep the AI out," and blocks every AI user-agent they can find. The problem is that the same companies run two different bots for two different jobs, and blocking all of them throws away the citations to stop the training.

The key fact is that these are independent decisions. You can welcome the bots that cite you and refuse the bots that train on you, all in the same file. Blocking training is a real choice some owners want to make. Blocking retrieval is almost never what anyone means to do, and it is the line that removes you from the answers.
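A rough sketch of that independence, again with Python's urllib.robotparser. The bot names are the ones used in this article. The function name blocked_bots and the two sample files are illustrative assumptions, not anyone's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Bot names taken from this article, grouped by the job each one does.
RETRIEVAL_BOTS = ("OAI-SearchBot", "PerplexityBot", "Claude-SearchBot")
TRAINING_BOTS = ("GPTBot", "ClaudeBot", "Google-Extended")


def blocked_bots(robots_txt, bots):
    """Return the bots from `bots` that may not fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]


# "Keep the AI out": one group naming every AI user-agent.
BLOCK_EVERYTHING = "\n".join(
    [f"User-agent: {bot}" for bot in RETRIEVAL_BOTS + TRAINING_BOTS] + ["Disallow: /"]
)

# The split: refuse training only and leave retrieval alone.
BLOCK_TRAINING_ONLY = "\n".join(
    [f"User-agent: {bot}" for bot in TRAINING_BOTS] + ["Disallow: /"]
)

print("citations lost:", blocked_bots(BLOCK_EVERYTHING, RETRIEVAL_BOTS))
print("citations lost:", blocked_bots(BLOCK_TRAINING_ONLY, RETRIEVAL_BOTS))
print("training refused:", blocked_bots(BLOCK_TRAINING_ONLY, TRAINING_BOTS))
```

The first file locks out all three retrieval bots. The second locks out none of them while still refusing all three training bots.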

The fix: allow the search bots, choose on training

Open your robots.txt and make the split explicit. Allow the retrieval and search crawlers so they can read and quote you. Allow Googlebot so you stay eligible for AI Overviews. Then, only if you specifically want to stay out of model training, block the training bots. Here is a copy-paste starting point:

# Let the AI search crawlers fetch and cite you
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

# Keep normal Google access (AI Overviews ride the index)
User-agent: Googlebot
Allow: /

# Optional: stay out of model training only
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Make sure no blanket rule locks everyone out
User-agent: *
Allow: /

The non-negotiable part is allowing the search bots. If you want to be in the answers, those four Allow: / lines are what put you there. The three training blocks under the "Optional" comment are exactly that. Delete them if you are happy to be in training data, keep them if you are not. Either way, your citation eligibility is unchanged, because retrieval and training are separate user-agents.
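The claim that the training blocks do not affect citation eligibility can be checked with a short sketch. It rebuilds the starting-point file from its parts and compares the search-bot verdicts with and without the training blocks. The helper names group and verdicts are assumptions for illustration, and Python's urllib.robotparser stands in for a real crawler's parser.

```python
from urllib.robotparser import RobotFileParser

SEARCH_BOTS = ("OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Googlebot")
TRAINING_BOTS = ("GPTBot", "ClaudeBot", "Google-Extended")


def group(bot, rule):
    """Build one robots.txt group, e.g. group('GPTBot', 'Disallow')."""
    return f"User-agent: {bot}\n{rule}: /\n"


def verdicts(robots_txt, bots):
    """Map each bot to True (may fetch /) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, "/") for bot in bots}


allow_search = "\n".join(group(bot, "Allow") for bot in SEARCH_BOTS)
block_training = "\n".join(group(bot, "Disallow") for bot in TRAINING_BOTS)
wildcard = group("*", "Allow")

with_training_blocks = "\n".join([allow_search, block_training, wildcard])
without_training_blocks = "\n".join([allow_search, wildcard])

# Search-bot access is the same in both versions of the file.
same = verdicts(with_training_blocks, SEARCH_BOTS) == verdicts(
    without_training_blocks, SEARCH_BOTS
)
print("search access unchanged:", same)
print("training bots:", verdicts(with_training_blocks, TRAINING_BOTS))
```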

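One caution while you edit. The last block above sets User-agent: * with Allow: / so the default for unnamed bots is stated outright. If an older wildcard block with Disallow: / is still in the file, delete it rather than trusting the new one to win, because not every parser resolves duplicate wildcard groups the same way. If a specific bot needs a rule, give it its own block rather than tightening the wildcard, because a single Disallow: / under User-agent: * is the exact mistake that started this.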
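A small sketch for spotting a leftover wildcard lockout. Under the Robots Exclusion Protocol (RFC 9309), a crawler follows the group that names it and falls back to the * group only when no group does. So the test here asks about a bot that has no group of its own. The name example-unlisted-crawler is hypothetical, chosen only because no file will list it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical name standing in for "any bot without its own group".
UNLISTED_BOT = "example-unlisted-crawler"


def wildcard_locks_out(robots_txt: str) -> bool:
    """Return True if a bot with no named group is refused the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(UNLISTED_BOT, "/")


# A leftover staging rule sitting next to one named group.
LEFTOVER = """\
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

print("unnamed bots locked out:", wildcard_locks_out(LEFTOVER))
```

In this sample the named OAI-SearchBot group still applies to that bot, but every crawler you did not name is shut out, which is the quiet failure the caution above describes.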

Verify: read it the way a bot does

A change you cannot confirm is not a fix. Once you have saved the file, check it the same way a crawler would, by reading the live file on your site rather than the copy in your editor.
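One way to run that check yourself is the sketch below, which fetches the live file and tests each named bot against it with Python's standard library. The function names, the User-Agent string sent with the request, and the example.com address are all placeholders we made up. A real crawler's parser may differ in edge cases from urllib.robotparser.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

# Bot names from the starting-point file in this article.
BOTS = (
    "OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Googlebot",
    "GPTBot", "ClaudeBot", "Google-Extended",
)


def robots_url(site_url: str) -> str:
    """robots.txt always lives at the root of the host."""
    return urljoin(site_url, "/robots.txt")


def fetch_robots(site_url: str, timeout: float = 10.0) -> str:
    """Download the live robots.txt as text (makes a network request)."""
    request = Request(robots_url(site_url), headers={"User-Agent": "robots-check-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def verdicts(robots_txt: str, bots=BOTS, path: str = "/") -> dict:
    """Map each bot to True (may fetch path) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


# Usage, with a placeholder address:
#   for bot, ok in verdicts(fetch_robots("https://example.com")).items():
#       print(f"{bot}: {'allowed' if ok else 'blocked'}")
```

Keeping the download and the rule check in separate functions means you can also paste a draft file straight into verdicts before you publish it.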

One thing to keep in perspective: robots.txt is a public, voluntary standard, not access control. It tells well-behaved bots what you would prefer, and the major search and AI crawlers honor it, which is exactly why a wrong line here is so costly. It is not a security boundary, so never use it to hide private pages. Use it to invite the bots that cite you and to make a clear, deliberate choice about the bots that train on you.

Check your own file

You can read all of this by hand, or you can paste your link into Brimm and see in about 30 seconds which AI crawlers your robots.txt lets in and which it locks out, named bot by named bot. We read the live file the way the engines do and print the failures in fix order. Start from the fix library for related issues, learn the bigger picture in what AEO actually is, or run the focused GEO audit when you want the crawler-access view on its own. When you are ready for the full read, open the audit.
