A recent analysis by data journalist Ben Welsh revealed that a significant number of news websites are actively blocking AI web crawlers such as Applebot-Extended, Google-Extended, and OpenAI's GPTBot. Of the 1,167 primarily English-language, US-based publications surveyed, Welsh found that 26 percent were blocking Applebot-Extended, while 53 percent were blocking OpenAI's bot. Google-Extended, introduced last September, was blocked by nearly 43 percent of the surveyed sites. These figures suggest that while Applebot-Extended may still be flying under the radar for some publishers, awareness and adoption of bot-blocking measures are growing.
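The blocking Welsh measured is expressed in each site's robots.txt file, where publishers list crawlers by user-agent name and disallow them. The stanza below is purely illustrative, not any particular publisher's actual file, but it shows the kind of per-bot rules a survey like this would count:

```
User-agent: Applebot-Extended
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```

Note that a site can block AI training crawlers this way while still leaving ordinary search crawlers (the `*` entry) free to index its pages.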
Welsh’s analysis also highlighted a divide among news publishers: some allow these bots access to their sites, while others block them, and the reasons behind those decisions are varied and not always clear. Licensing deals, in which publishers are paid in exchange for allowing bot access, may influence some of these choices. Major players in the industry, such as The New York Times and Condé Nast, have reportedly explored AI partnerships with companies like Apple, OpenAI, and Perplexity, indicating a strategic approach to data sharing and content access.
One of the major challenges publishers face in implementing bot-blocking measures is the dynamic nature of the AI landscape. With new AI agents constantly debuting and evolving, manually updating block lists can be a daunting task. Companies like Dark Visitors offer services that automate updates to robots.txt files to block unwanted bots, taking some of that burden off publishers. Gavin King, founder of Dark Visitors, emphasizes the importance of staying informed about AI scraping tools and their potential impact on publishers' copyright concerns.
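Mechanically, checking whether a given crawler is blocked by a robots.txt file is straightforward, which is what makes both Welsh's survey and services like Dark Visitors feasible at scale. The sketch below uses Python's standard-library `urllib.robotparser`; the robots.txt content and URL here are hypothetical, not taken from any real publisher:

```python
from urllib import robotparser

# Hypothetical robots.txt content: blocks two AI crawlers, allows everything else.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser whether each user agent may fetch a sample page.
for agent in ("Applebot-Extended", "GPTBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A real survey would fetch each site's live file with `RobotFileParser.set_url(...)` and `read()` instead of parsing an inline string, then repeat this check across every crawler name of interest.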
Given the increasing significance of robots.txt files in the digital publishing landscape, decisions about which bots to block are no longer left to webmasters alone. Media executives, including CEOs of major media companies, are directly involved in setting bot-access policies for their organizations. Some publishers explicitly state that they block AI scraping tools until a commercial agreement is reached with the bots' owners. For example, Vox Media's senior vice president of communications, Lauren Starke, said that Applebot-Extended is blocked across all Vox Media properties in the absence of a commercial agreement.
The strategic approaches taken by news publishers in blocking AI web crawlers reflect the evolving nature of digital content distribution and data privacy concerns. As AI technology continues to advance and permeate various industries, publishers must navigate the complex landscape of bot blocking to protect their content and intellectual property. Collaborative efforts between publishers, AI developers, and regulatory bodies may be necessary to establish clear guidelines and best practices for AI web crawling in the future.