Slashdot: OpenAI Announces Benchmarks for AI Life Sciences Research. Its Best Model Failed 63.9% of the Test

Published on 2026-06-20T21:34:00Z
This week OpenAI announced a 750-task test to measure "whether AI systems can support realistic life science research tasks, not just answer biology questions." But while OpenAI's top-performing GPT-Rosalind model led the rankings, Slashdot reader BrianFagioli notes that "it achieved a pass rate of just 36.1 percent, failing nearly two-thirds of benchmark tasks." Nerds.xyz points out that means "the best-performing model failed nearly two-thirds of the benchmark's tasks."

The benchmark also revealed a familiar weakness. AI systems generally perform better when everything is presented as text. Once they are forced to work with supporting documents, figures, or complex datasets, performance drops noticeably. GPT-Rosalind's pass rate fell from 45.1 percent on text-only tasks to 28.1 percent on tasks involving artifacts or URLs.

To be fair, the benchmark is not intended to suggest AI is useless in research. Quite the opposite. OpenAI found that models are becoming increasingly capable of scientific communication, evidence synthesis, and translating research findings into practical explanations. Those are valuable skills, particularly for researchers drowning in information.

But LifeSciBench serves as a useful reminder that today's AI systems are still far from autonomous scientists. They can help. They can assist. They can sometimes provide surprisingly useful insights. What they cannot reliably do, however, is replace the expertise, judgment, and skepticism that real scientific research requires.

Read more of this story at Slashdot.
