What is GPTBot?
GPTBot is OpenAI's training crawler. OpenAI's bot documentation is plain about the job: “GPTBot is used to make our generative AI foundation models more useful and safe.” It collects web content that may be used to train future GPT models. It is one of three OpenAI bots you will meet in your server logs, and the only one whose job is training. OAI-SearchBot decides whether you show up in ChatGPT search, and ChatGPT-User fetches a page when a person asks. Blocking GPTBot does not remove you from ChatGPT search. Confusing the three is how sites vanish from answers by accident.
What GPTBot actually does
GPTBot crawls the public web and gathers text that OpenAI may feed into the training of its foundation models. The documentation says it directly: it is used to crawl content that may be used in training OpenAI's generative AI foundation models. That is a long-horizon job. Nothing GPTBot fetches today shows up in an answer tomorrow. It shapes what a future model knows, not what the current one cites.
This is the crawler most robots.txt "block the AI" recipes target, and blocking it is a legitimate choice. OpenAI honors it: “Disallowing GPTBot indicates a site's content should not be used in training generative AI foundation models.” The problem we see is not the block itself. It is the copy-pasted rule that blocks GPTBot's siblings along with it, because the three bots do very different work.
GPTBot vs OAI-SearchBot vs ChatGPT-User
OpenAI documents three distinct bots, each with its own user-agent token and its own consequences:
- GPTBot is the training crawler. It collects content that may train future models. Blocking it opts your content out of training. It has nothing to do with search citations.
- OAI-SearchBot is the search crawler. Per the docs, it is used to surface websites in search results in ChatGPT's search features. This is the one that earns you a citation. We cover it in full in What is OAI-SearchBot?
- ChatGPT-User fetches a page when a person's action in ChatGPT or a Custom GPT calls for it. The docs note that “Because these actions are initiated by a user, robots.txt rules may not apply.” It is also, per the docs, not used to determine whether content may appear in search.
Each bot announces itself by name, so your server logs and your robots.txt can treat them separately. That separation is the entire point. OpenAI built three bots precisely so you could say no to training and yes to citations, or the reverse.
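Because each bot names itself, telling them apart in logs comes down to matching the three tokens. A minimal sketch in Python, assuming you have already pulled the user-agent field out of each log line; the sample strings below are illustrative, not copied from OpenAI's documentation, and the function names are our own:

```python
from collections import Counter

# The three tokens named above, mapped to the job each bot does.
OPENAI_BOTS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user-initiated fetch",
}


def classify(user_agent):
    """Return the OpenAI bot token found in a user-agent string, or None."""
    ua = user_agent.lower()
    for token in OPENAI_BOTS:
        if token.lower() in ua:
            return token
    return None


def count_bots(user_agents):
    """Count hits per OpenAI bot, ignoring every other visitor."""
    hits = (classify(ua) for ua in user_agents)
    return Counter(token for token in hits if token is not None)


# Illustrative user-agent strings (assumed shapes, for demonstration only).
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
]

for token, hits in count_bots(SAMPLE_USER_AGENTS).items():
    print(f"{token} ({OPENAI_BOTS[token]}): {hits} requests")
```

A user-agent string is easy to spoof, so treat these counts as a first look at who is crawling you, not as proof of identity.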
What blocking each one costs you
Blocking GPTBot costs you presence in future model training. Some owners want exactly that, and it is a defensible position. Your words stop feeding the next GPT, and you give up the chance that the model natively knows your brand.
Blocking OAI-SearchBot costs you visibility now. OpenAI states that sites opted out of OAI-SearchBot will not be shown in ChatGPT search answers, though they can still appear as navigational links. If ChatGPT users are worth anything to you, this is the block that hurts.
Blocking ChatGPT-User attempts to stop user-initiated fetches, but as quoted above, robots.txt rules may not apply to those, since a human asked for the page.
The expensive mistake is the blanket rule. A robots.txt that disallows every OpenAI agent in one sweep, written to keep training bots out, also throws away ChatGPT citations. We audit sites weekly that made this trade without knowing it. Our walkthrough on GPTBot and robots.txt: allow or block the AI crawlers works through the decision line by line, and the repair steps live in how to fix robots.txt blocking AI crawlers.
How to control GPTBot in robots.txt
The token is GPTBot. To opt out of training while staying eligible for ChatGPT search citations, name each bot and give each its own rule:
# No training, yes citations
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
To allow everything, either say so explicitly or simply leave both unblocked. Whatever you choose, choose it per bot. Then confirm the file says what you think it says. A wildcard group (User-agent: *) applies to every bot you have not named, so a stray wildcard Disallow can block a crawler you meant to leave alone, and the only way to know is to test the file against each user agent.
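One way to run that test is with the robots.txt parser in Python's standard library. This is a sketch under assumptions: the parser is Python's, not OpenAI's, so it approximates how a standards-following crawler would read the file, and the URL and helper name are placeholders of ours.

```python
from urllib.robotparser import RobotFileParser

AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]

# The "no training, yes citations" file from this section.
ROBOTS_TXT = """\
# No training, yes citations
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

# A file with a wildcard block and no group naming OAI-SearchBot.
WILDCARD_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Disallow: /
"""


def check(robots_txt, agents=AGENTS, url="https://example.com/some-page"):
    """Map each user agent to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


for label, text in [("per-bot rules", ROBOTS_TXT),
                    ("wildcard block", WILDCARD_ROBOTS_TXT)]:
    print(label)
    for agent, allowed in check(text).items():
        print(f"  {agent}: {'allowed' if allowed else 'blocked'}")
```

With the per-bot file, this parser reports GPTBot as blocked and the other two as allowed. With the wildcard file, all three come back blocked, because OAI-SearchBot and ChatGPT-User have no group of their own and fall through to the wildcard. To check a live site, swap the string for the contents of your own robots.txt.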
See also
GPTBot is OpenAI's entry in a pattern every AI company now follows: one bot for training, one for search, one for user fetches. Anthropic splits the same way, covered in What is Claude-SearchBot?. Google handles training opt-out differently, with a token instead of a crawler, covered in What is Google-Extended?.