Cloudflare, a publicly traded provider of cloud services, has launched a free tool aimed at preventing AI bots from scraping websites hosted on its platform for model-training data.
Some AI vendors, including Google, OpenAI, and Apple, let website owners block their data-scraping and model-training bots by modifying the site's robots.txt file. But not all AI scrapers honor these rules, as Cloudflare notes in a blog post introducing its anti-bot tool.
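For context, a robots.txt opt-out of this kind typically looks like the sketch below. GPTBot, Google-Extended, and Applebot-Extended are the crawler tokens those three vendors publish for training opt-outs, but the directives are purely advisory: they only stop crawlers that choose to respect them.

```
# Disallow known AI training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```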
According to Cloudflare, customers are concerned about AI bots visiting their websites, especially those doing so without authorization. The company aims to tackle the problem with automated bot-detection models that analyze AI bot and crawler traffic. These models weigh various signals, including whether traffic attempts to mimic human browsing patterns in order to evade detection.
“When malicious actors attempt large-scale website crawling, they typically use identifiable tools and frameworks,” Cloudflare explains. “Based on these indicators, our models can effectively flag traffic from evasive AI bots as bot activity.”
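Cloudflare does not disclose its detection logic, but the idea of flagging traffic from "identifiable tools and frameworks" can be illustrated with a deliberately naive sketch: a first-pass filter that matches known AI-crawler tokens in the User-Agent header. The token list here is a small sample of publicly documented crawlers; a real system would combine a maintained list with behavioral and fingerprint signals, not a static regex.

```python
import re

# Sample tokens from publicly documented AI crawlers (GPTBot,
# Common Crawl's CCBot, Google-Extended, ByteDance's Bytespider).
# Illustrative only -- not Cloudflare's actual detection model.
AI_CRAWLER_PATTERN = re.compile(
    r"(GPTBot|CCBot|Google-Extended|Bytespider)",
    re.IGNORECASE,
)

def looks_like_ai_crawler(user_agent: str) -> bool:
    """Naive first-pass check: flag a request whose User-Agent
    contains a known AI-crawler token."""
    return bool(AI_CRAWLER_PATTERN.search(user_agent))
```

Evasive bots spoof ordinary browser User-Agents precisely to slip past checks like this, which is why Cloudflare's models lean on behavioral signals rather than header strings alone.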
Cloudflare has also added a mechanism for hosts to report suspected AI bots and crawlers, and it plans to keep its blacklist of these entities updated over time.
The proliferation of generative AI has intensified the demand for training data, prompting many websites, concerned about unauthorized model training, to block AI scrapers and crawlers. But blocking measures are not foolproof: some vendors have allegedly circumvented standard bot-exclusion protocols to gain a competitive advantage.
Cloudflare’s tool represents a step towards addressing these challenges, although its effectiveness in accurately identifying covert AI bots remains to be seen. Moreover, it does not resolve the broader issue of publishers potentially forfeiting traffic from AI-driven tools like Google’s AI Overviews, which may exclude sites blocking specific AI crawlers.