Skip to main content

Slashdot: Are AI Web Crawlers 'Destroying Websites' In Their Hunt for Training Data?

Are AI Web Crawlers 'Destroying Websites' In Their Hunt for Training Data?
Published on August 31, 2025 at 11:57PM
"AI web crawlers are strip-mining the web in their perpetual hunt for ever more content to feed into their Large Language Model mills," argues Steven J. Vaughan-Nichols at the Register. And "when AI searchbots, with Meta (52% of AI searchbot traffic), Google (23%), and OpenAI (20%) leading the way, clobber websites with as much as 30 Terabits in a single surge, they're damaging even the largest companies' site performance..." How much traffic do they account for? According to Cloudflare, a major content delivery network (CDN) force, 30% of global web traffic now comes from bots. Leading the way and growing fast? AI bots... Anyone who runs a website, though, knows there's a huge, honking difference between the old-style crawlers and today's AI crawlers. The new ones are site killers. Fastly warns that they're causing "performance degradation, service disruption, and increased operational costs." Why? Because they're hammering websites with traffic spikes that can reach up to ten or even twenty times normal levels within minutes. Moreover, AI crawlers are much more aggressive than standard crawlers. As the InMotionhosting web hosting company notes, they also tend to disregard crawl delays or bandwidth-saving guidelines and extract full page text, and sometimes attempt to follow dynamic links or scripts. The result? If you're using a shared server for your website, as many small businesses do, even if your site isn't being shaken down for content, other sites on the same hardware with the same Internet pipe may be getting hit. This means your site's performance drops through the floor even if an AI crawler isn't raiding your website... AI crawlers don't direct users back to the original sources. They kick our sites around, return nothing, and we're left trying to decide how we're to make a living in the AI-driven web world. Yes, of course, we can try to fend them off with logins, paywalls, CAPTCHA challenges, and sophisticated anti-bot technologies. You know one thing AI is good at? It's getting around those walls. As for robots.txt files, the old-school way of blocking crawlers? Many — most? — AI crawlers simply ignore them... There are efforts afoot to supplement robots.txt with llms.txt files. This is a proposed standard to provide LLM-friendly content that LLMs can access without compromising the site's performance. Not everyone is thrilled with this approach, though, and it may yet come to nothing. In the meantime, to combat excessive crawling, some infrastructure providers, such as Cloudflare, now offer default bot-blocking services to block AI crawlers and provide mechanisms to deter AI companies from accessing their data.

Read more of this story at Slashdot.

Comments

Popular posts from this blog

Slashdot: US Army Soldier Pleads Guilty To AT&T and Verizon Hacks

US Army Soldier Pleads Guilty To AT&T and Verizon Hacks Published on February 20, 2025 at 01:31AM Cameron John Wagenius pleaded guilty to hacking AT&T and Verizon and stealing a massive trove of phone records from the companies, according to court records filed on Wednesday. From a report: Wagenius, who was a U.S. Army soldier, pleaded guilty to two counts of "unlawful transfer of confidential phone records information" on an online forum and via an online communications platform. According to a document filed by Wagenius' lawyer, he faces a maximum fine of $250,000 and prison time of up to 10 years for each of the two counts. Wagenius was arrested and indicted last year. In January, U.S. prosecutors confirmed that the charges brought against Wagenius were linked to the indictment of Connor Moucka and John Binns, two alleged hackers whom the U.S. government accused of several data breaches against cloud computing services company Snowflake, which were among the ...

Slashdot: AT&T Now Lets Customers Lock Down Account To Prevent SIM Swapping Attacks

AT&T Now Lets Customers Lock Down Account To Prevent SIM Swapping Attacks Published on July 02, 2025 at 01:30AM AT&T has launched a new Account Lock feature designed to protect customers from SIM swapping attacks. The security tool, available through the myAT&T app, prevents unauthorized changes to customer accounts including phone number transfers, SIM card changes, billing information updates, device upgrades, and modifications to authorized users. SIM swapping attacks occur when criminals obtain a victim's phone number through social engineering techniques, then intercept messages and calls to access two-factor authentication codes for sensitive accounts. The attacks have become increasingly common in recent years. AT&T began gradually rolling out Account Lock earlier this year, joining T-Mobile, Verizon, and Google Fi, which already offer similar fraud prevention features. Read more of this story at Slashdot.

Slashdot: Protecting 'Funko' Brand, AI-Powered 'BrandShield' Knocks Itch.io Offline After Questionable Registrar Communications

Protecting 'Funko' Brand, AI-Powered 'BrandShield' Knocks Itch.io Offline After Questionable Registrar Communications Published on December 16, 2024 at 01:04AM Launched in 2013, itch.io lets users host and sell indie video games online — now offering more than 200,000 — as well as other digital content like music and comics. But then someone uploaded a page based on a major videogame title, according to Game Rant. And somehow this provoked a series of overreactions and missteps that eventually knocked all of itch.io offline for several hours... The page was about the first release from game developer 10:10 — their game Funko Fusion, which features characters in the style of Funko's long-running pop-culture bobbleheads. As a major brand, Funko monitors the web with a "brand protection" partner (named BrandShield). Interestingly, BrandShield's SaaS product "leverages AI-driven online brand protection," according to their site, to "detect...