What is GPTBot?

GPTBot is OpenAI's training crawler. OpenAI's bot documentation is plain about the job: “GPTBot is used to make our generative AI foundation models more useful and safe.” It collects web content that may be used to train future GPT models. It is one of three OpenAI bots you will meet in your server logs, and the only one whose job is training. OAI-SearchBot determines whether you show up in ChatGPT search, and ChatGPT-User fetches a page when a person asks for it. Blocking GPTBot does not remove you from ChatGPT search. Confusing the three is how sites vanish from answers by accident.

What GPTBot actually does

GPTBot crawls the public web and gathers text that OpenAI may feed into the training of its foundation models. The documentation says it directly: it is used to crawl content that may be used in training OpenAI's generative AI foundation models. That is a long-horizon job. Nothing GPTBot fetches today shows up in an answer tomorrow. It shapes what a future model knows, not what the current one cites.

This is the crawler most robots.txt "block the AI" recipes target, and blocking it is a legitimate choice. OpenAI honors it: “Disallowing GPTBot indicates a site's content should not be used in training generative AI foundation models.” The problem we see is not the block itself. It is the copy-pasted rule that blocks GPTBot's siblings along with it, because the three bots do very different work.

GPTBot vs OAI-SearchBot vs ChatGPT-User

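OpenAI documents three distinct bots, each with its own user-agent token and its own consequences: GPTBot collects content that may be used for training, OAI-SearchBot determines whether a site appears in ChatGPT search, and ChatGPT-User fetches a page when a person asks for it.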

Each bot announces itself by name, so your server logs and your robots.txt can treat them separately. That separation is the entire point. OpenAI built three bots precisely so you could say no to training and yes to citations, or the reverse.
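Because the token appears in each request's User-Agent header, a few lines of Python are enough to separate the three in a log. This is a minimal sketch, not a log parser: it assumes you have already extracted the User-Agent strings, the function names are our own, and the sample strings are simplified illustrations rather than the exact headers OpenAI sends.

```python
from collections import Counter

# Tokens named in this article, mapped to the job each bot does.
OPENAI_BOTS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user fetch",
}


def classify_user_agent(user_agent):
    """Return the OpenAI bot token found in a User-Agent string, or None."""
    lowered = user_agent.lower()
    for token in OPENAI_BOTS:
        if token.lower() in lowered:
            return token
    return None


def count_openai_hits(user_agents):
    """Count hits per OpenAI bot across an iterable of User-Agent strings."""
    counts = Counter()
    for ua in user_agents:
        token = classify_user_agent(ua)
        if token is not None:
            counts[token] += 1
    return counts


# Illustrative strings only (assumption): real headers are longer.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
]

for token, hits in count_openai_hits(sample).items():
    print(f"{token} ({OPENAI_BOTS[token]}): {hits}")
```

A User-Agent header can be spoofed, so treat these counts as a first look at who claims to be visiting, not as proof.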

What blocking each one costs you

Blocking GPTBot costs you presence in future model training. Some owners want exactly that, and it is a defensible position. Your words stop feeding the next GPT, and you give up the chance that the model natively knows your brand.

Blocking OAI-SearchBot costs you visibility now. OpenAI states that sites opted out of OAI-SearchBot will not be shown in ChatGPT search answers, though they can still appear as navigational links. If ChatGPT users are worth anything to you, this is the block that hurts.

Blocking ChatGPT-User attempts to stop user-initiated fetches, but OpenAI's documentation notes that robots.txt rules may not apply to those, since a human asked for the page.

The expensive mistake is the blanket rule. A robots.txt that disallows every OpenAI user agent in one block, written to keep training bots out, also throws away ChatGPT citations. Every week we audit sites that made this trade without knowing it. Our walkthrough "GPTBot and robots.txt: allow or block the AI crawlers" works through the decision line by line, and the repair steps live in "How to fix robots.txt blocking AI crawlers".

How to control GPTBot in robots.txt

The token is GPTBot. To opt out of training while staying eligible for ChatGPT search citations, name each bot and give each its own rule:

# No training, yes citations
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

To allow everything, either say so explicitly or simply leave both unblocked. Whatever you choose, choose it per bot. Then confirm the file says what you think it says. A stray wildcard group (User-agent: *) still governs any bot the file does not name, and a duplicate group for the same bot can add rules you did not intend. The only way to know is to test the file against each user agent.
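One way to run that test locally is Python's standard-library urllib.robotparser. The sketch below is a rough check under stated assumptions: the helper name check_access and the example.com URL are ours, and Python's parser is only one implementation of robots.txt matching, so it may not agree with OpenAI's crawlers on every edge case.

```python
from urllib.robotparser import RobotFileParser

AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]

# The per-bot file from this article.
PER_BOT = """\
# No training, yes citations
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

# A blanket file of the kind that blocks more than intended.
BLANKET = """\
User-agent: *
Disallow: /
"""


def check_access(robots_txt, agents, url="https://example.com/"):
    """Map each user-agent token to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


for label, text in (("per-bot", PER_BOT), ("blanket", BLANKET)):
    print(label)
    for agent, allowed in check_access(text, AGENTS).items():
        print(f"  {agent}: {'allowed' if allowed else 'blocked'}")
```

Under Python's parser, the per-bot file blocks only GPTBot, while the blanket file blocks all three. To check a live site, swap the inline text for the contents of your own /robots.txt.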

See also

GPTBot is OpenAI's entry in a pattern that is becoming common among AI companies: one bot for training, one for search, one for user fetches. Anthropic splits the same way, covered in "What is Claude-SearchBot?" Google handles training opt-out differently, with a token instead of a crawler, covered in "What is Google-Extended?"
