Skip to main content

Slashdot: Websites are Blocking the Wrong AI Scrapers

Websites are Blocking the Wrong AI Scrapers
Published on July 29, 2024 at 11:53PM
An anonymous reader shares a report: Hundreds of websites trying to block the AI company Anthropic from scraping their content are blocking the wrong bots, seemingly because they are copy/pasting outdated instructions to their robots.txt files, and because companies are constantly launching new AI crawler bots with different names that will only be blocked if website owners update their robots.txt. In particular, these sites are blocking two bots no longer used by the company, while unknowingly leaving Anthropic's real (and new) scraper bot unblocked. This is an example of "how much of a mess the robots.txt landscape is right now," the anonymous operator of Dark Visitors told 404 Media. Dark Visitors is a website that tracks the constantly-shifting landscape of web crawlers and scrapers -- many of them operated by AI companies -- and which helps website owners regularly update their robots.txt files to prevent specific types of scraping. The site has seen a huge increase in popularity as more people try to block AI from scraping their work. "The ecosystem of agents is changing quickly, so it's basically impossible for website owners to manually keep up. For example, Apple (Applebot-Extended) and Meta (Meta-ExternalAgent) just added new ones last month and last week, respectively," they added.

Read more of this story at Slashdot.

Comments

Popular posts from this blog

Slashdot: AT&T Says Leaked Data of 70 Million People Is Not From Its Systems

AT&T Says Leaked Data of 70 Million People Is Not From Its Systems Published on March 20, 2024 at 02:15AM An anonymous reader quotes a report from BleepingComputer: AT&T says a massive trove of data impacting 71 million people did not originate from its systems after a hacker leaked it on a cybercrime forum and claimed it was stolen in a 2021 breach of the company. While BleepingComputer has not been able to confirm the legitimacy of all the data in the database, we have confirmed some of the entries are accurate, including those whose data is not publicly accessible for scraping. The data is from an alleged 2021 AT&T data breach that a threat actor known as ShinyHunters attempted to sell on the RaidForums data theft forum for a starting price of $200,000 and incremental offers of $30,000. The hacker stated they would sell it immediately for $1 million. AT&T told BleepingComputer then that the data did not originate from them and that its systems were not breached. &q

Slashdot: Texas A&M University Tops Nation in Engineering Research Expenditures

Texas A&M University Tops Nation in Engineering Research Expenditures Published on June 19, 2024 at 12:50AM An anonymous reader shares a report: Texas A&M University held the largest engineering research portfolio of any academic institution in the country last year, nearing half a billion dollars and surpassing Massachusetts Institute of Technology for the top spot, according to U.S. News & World Report. The state flagship's College of Engineering recorded $444.7 million in research expenditures in the 2023 fiscal year, university officials said. A mix of federal, state and private grants funds those efforts, so more expenditures means more partnerships and a larger engineering footprint than ever, Texas A&M University System Chancellor John Sharp said. "An awful lot of people in Washington, a lot of people in Austin, a lot of people in the private sector now rely on Texas A&M to do their engineering research," Sharp said. "Of all the places in

Slashdot: AT&T Can't Hang Up On Landline Phone Customers, California Agency Rules

AT&T Can't Hang Up On Landline Phone Customers, California Agency Rules Published on June 22, 2024 at 01:50AM An anonymous reader quotes a report from Ars Technica: The California Public Utilities Commission (CPUC) yesterday rejected AT&T's request to end its landline phone obligations. The state agency also urged AT&T to upgrade copper facilities to fiber instead of trying to shut down the outdated portions of its network. AT&T asked the state to eliminate its Carrier of Last Resort (COLR) obligation, which requires it to provide landline telephone service to any potential customer in its service territory. A CPUC administrative law judge recommended rejection of the application last month, and the commission voted to dismiss AT&T's application with prejudice on Thursday. "Our vote to dismiss AT&T's application made clear that we will protect customer access to basic telephone service... Our rules were designed to provide that assurance,