Cloudflare split its AI crawler permissions into three categories on July 1, 2026: Search, AI Agent, and AI Training. Search covers traditional indexing. AI Agent covers real-time fetches a user’s own assistant makes on request. AI Training covers crawls used to train models. The split gives site owners a lever robots.txt never offered: permission to rank in search while refusing training access, or the reverse.

The bigger news is the default Cloudflare is changing. Starting September 15, 2026, AI training crawlers will be blocked by default across Cloudflare’s network unless a site owner opts in. According to Cloudflare, the new default also applies to Google’s own AI training crawling, not only to newer entrants such as OpenAI or Anthropic. A publisher that wants full Google Search visibility but does not want Google training on the same content will need to set that distinction explicitly once the new default applies.

Cloudflare sits in front of a large share of global web traffic. A default change at that scale reaches further than any single company’s opt-in toggle. Robots.txt has never distinguished between crawling to rank a page and crawling to train a model on it. Cloudflare’s three-way split forces that distinction at the infrastructure layer, ahead of any standard the search engines themselves have agreed to. Cloudflare has offered crawler-blocking controls before, but leaving them opt-in meant most sites never changed the setting. Flipping the default is what turns a rarely touched toggle into a web-wide event that reaches sites whose owners take no action at all.

For SEO and GEO teams, the immediate question is not whether to block AI training. It is whether current settings already block it by accident, or will once the September default takes effect. A site that depends on AI Overviews or AI Mode citations needs its Search and AI Agent categories open even if it closes AI Training. A team that treats the three categories as one risks losing citation traffic while believing it restricted only model training access.
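
The check described above can be sketched in a few lines of Python. This is a minimal illustration, not Cloudflare’s API: the `Category` enum, the `allowed` dictionary, and the `citation_risks` function are hypothetical names, and a real audit would read the settings from the Cloudflare dashboard or whatever export a team has available.

```python
from enum import Enum


class Category(Enum):
    """The three categories named in Cloudflare's announcement."""
    SEARCH = "search"
    AI_AGENT = "ai_agent"
    AI_TRAINING = "ai_training"


def citation_risks(allowed: dict[Category, bool]) -> list[str]:
    """Return warnings for categories that citation traffic depends on.

    `allowed` is a hypothetical snapshot of a site's settings:
    True means crawlers in that category are let through.
    A category missing from the snapshot is treated as blocked.
    """
    warnings = []
    # AI Training is deliberately not checked: closing it is a valid choice.
    for category in (Category.SEARCH, Category.AI_AGENT):
        if not allowed.get(category, False):
            warnings.append(
                f"{category.value} is blocked: citation traffic may drop"
            )
    return warnings


# Example: training closed on purpose, but AI Agent closed by accident.
settings = {
    Category.SEARCH: True,
    Category.AI_AGENT: False,
    Category.AI_TRAINING: False,
}
for warning in citation_risks(settings):
    print(warning)
```

The function ignores the AI Training setting on purpose, which mirrors the point of the split: blocking training is a policy choice, while blocking Search or AI Agent is the mistake worth flagging.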

Cloudflare’s announcement does not include independent measurement of how many training crawlers are active today on sites that the new default would start blocking. It also does not detail how the system verifies that a crawler claiming to make an AI Agent fetch is not actually running a training pass under that label. Those enforcement details decide whether the three-way split works as described or becomes another self-reported system that search teams cannot audit.

Search teams running behind Cloudflare should audit their bot-management settings before September 15 and confirm which category each major AI crawler falls under, rather than assume the new defaults preserve today’s visibility.
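
One way to start that audit is to tally the crawler user agents already appearing in server logs against the three categories. The Python sketch below assumes a mapping from user-agent tokens to categories based on how the crawler operators publicly describe their own bots. It is not Cloudflare’s classification, and `ASSUMED_CATEGORIES`, `classify`, and `tally` are illustrative names, so each entry should be verified against Cloudflare’s settings before anyone relies on it.

```python
from collections import Counter

# ASSUMPTION: this mapping reflects how operators describe their bots.
# It is not Cloudflare's classification and must be verified.
ASSUMED_CATEGORIES = {
    "Googlebot": "search",
    "bingbot": "search",
    "ChatGPT-User": "ai_agent",
    "Claude-User": "ai_agent",
    "GPTBot": "ai_training",
    "ClaudeBot": "ai_training",
    "CCBot": "ai_training",
}


def classify(user_agent: str) -> str:
    """Map a raw User-Agent header to an assumed category."""
    ua = user_agent.lower()
    for token, category in ASSUMED_CATEGORIES.items():
        if token.lower() in ua:
            return category
    return "unclassified"


def tally(user_agents: list[str]) -> Counter:
    """Count requests per assumed category."""
    return Counter(classify(ua) for ua in user_agents)


# Example with User-Agent values as they might appear in an access log.
sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "curl/8.0",
]
print(tally(sample))
```

User-agent strings are self-reported, so a tally like this shows only what crawlers claim to be. It is a starting inventory for the audit, not proof of what a given crawler does with the content.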

Cloudflare published the crawler-permission change on its company blog on July 1, 2026.