Skip to main content

Slashdot: Are AI Web Crawlers 'Destroying Websites' In Their Hunt for Training Data?

Are AI Web Crawlers 'Destroying Websites' In Their Hunt for Training Data?
Published on August 31, 2025 at 11:57PM
"AI web crawlers are strip-mining the web in their perpetual hunt for ever more content to feed into their Large Language Model mills," argues Steven J. Vaughan-Nichols at the Register. And "when AI searchbots, with Meta (52% of AI searchbot traffic), Google (23%), and OpenAI (20%) leading the way, clobber websites with as much as 30 Terabits in a single surge, they're damaging even the largest companies' site performance..." How much traffic do they account for? According to Cloudflare, a major content delivery network (CDN) force, 30% of global web traffic now comes from bots. Leading the way and growing fast? AI bots... Anyone who runs a website, though, knows there's a huge, honking difference between the old-style crawlers and today's AI crawlers. The new ones are site killers. Fastly warns that they're causing "performance degradation, service disruption, and increased operational costs." Why? Because they're hammering websites with traffic spikes that can reach up to ten or even twenty times normal levels within minutes. Moreover, AI crawlers are much more aggressive than standard crawlers. As the InMotionhosting web hosting company notes, they also tend to disregard crawl delays or bandwidth-saving guidelines and extract full page text, and sometimes attempt to follow dynamic links or scripts. The result? If you're using a shared server for your website, as many small businesses do, even if your site isn't being shaken down for content, other sites on the same hardware with the same Internet pipe may be getting hit. This means your site's performance drops through the floor even if an AI crawler isn't raiding your website... AI crawlers don't direct users back to the original sources. They kick our sites around, return nothing, and we're left trying to decide how we're to make a living in the AI-driven web world. Yes, of course, we can try to fend them off with logins, paywalls, CAPTCHA challenges, and sophisticated anti-bot technologies. You know one thing AI is good at? It's getting around those walls. As for robots.txt files, the old-school way of blocking crawlers? Many — most? — AI crawlers simply ignore them... There are efforts afoot to supplement robots.txt with llms.txt files. This is a proposed standard to provide LLM-friendly content that LLMs can access without compromising the site's performance. Not everyone is thrilled with this approach, though, and it may yet come to nothing. In the meantime, to combat excessive crawling, some infrastructure providers, such as Cloudflare, now offer default bot-blocking services to block AI crawlers and provide mechanisms to deter AI companies from accessing their data.

Read more of this story at Slashdot.

Comments

Popular posts from this blog

Slashdot: Spain-Backed Fund Joins FOSSA's Sovereign Satellite Communications Push

Spain-Backed Fund Joins FOSSA's Sovereign Satellite Communications Push Published on 2026-06-28T22:05:00Z Spanish startup FOSSA Systems "has raised about $10.5 million to expand its connectivity constellation," reports Space News, noting some funding is backed by Spain's government: The support from the Spanish Society for Technological Transformation (SETT) comes a year after the fund injected 14 million euros into Spain's Sateliot , which is also developing a satellite connectivity network with security and defense applications. Spanish private investment firm Kibo Ventures led FOSSA's funding round, the six-year-old venture announced June 24, bringing its total raised to date to nearly 20 million euros. The proceeds will help fuel FOSSA's push beyond the tiny picosatellites it once used to connect low-power monitoring devices toward larger cubesats in low Earth orbit, enabling additional sovereign communications and space-based intelligence capab...

Slashdot: AT&T Outlines $250 Billion US Investment Plan To Boost Infrastructure In AI Age

AT&T Outlines $250 Billion US Investment Plan To Boost Infrastructure In AI Age Published on 2026-03-10T20:00:00Z AT&T plans to invest more than $250 billion over the next five years to expand U.S. telecom infrastructure for the AI age. The company says it will also hire thousands of technicians while partnering with AST SpaceMobile to extend coverage to remote areas. Reuters reports: Rapid adoption of artificial intelligence, cloud computing and connected devices has prompted telecom operators to invest heavily in fiber and 5G networks as they also seek to fend off intensifying competition from cable broadband providers. AT&T, which has about 110,000 employees in the U.S., said the new hires will help build and maintain its infrastructure. The outlay includes capital expenditure and other spending, the company said. The spending will focus on expanding its fiber and wireless networks, including accelerating deployment of fiber broadband, 5G home internet and satellite co...