Slashdot: OpenAI Announces Benchmarks for AI Life Sciences Research. Its Best Model Failed 63.9% of the Test

OpenAI Announces Benchmarks for AI Life Sciences Research. Its Best Model Failed 63.9% of the Test
Published on 2026-06-20T21:34:00Z
This week OpenAI announced a 750-task test to to measure "whether AI systems can support realistic life science research tasks, not just answer biology questions." But while OpenAI's top-performing GPT-Rosalind model led the rankings, Slashdot reader BrianFagioli notes that "it achieved a pass rate of just 36.1 percent, failing nearly two-thirds of benchmark tasks." Nerds.xyz points out that means "the best-performing model failed nearly two-thirds of the benchmark's tasks." The benchmark also revealed a familiar weakness. AI systems generally perform better when everything is presented as text. Once they are forced to work with supporting documents, figures, or complex datasets, performance drops noticeably. GPT-Rosalind's pass rate fell from 45.1 percent on text-only tasks to 28.1 percent on tasks involving artifacts or URLs. To be fair, the benchmark is not intended to suggest AI is useless in research. Quite the opposite. OpenAI found that models are becoming increasingly capable of scientific communication, evidence synthesis, and translating research findings into practical explanations. Those are valuable skills, particularly for researchers drowning in information. But LifeSciBench serves as a useful reminder that today's AI systems are still far from autonomous scientists. They can help. They can assist. They can sometimes provide surprisingly useful insights. What they cannot reliably do, however, is replace the expertise, judgment, and skepticism that real scientific research requires.

Shameless Plug

Search This Blog

Slashdot: OpenAI Announces Benchmarks for AI Life Sciences Research. Its Best Model Failed 63.9% of the Test

Labels

Comments

Post a Comment

Popular posts from this blog

Slashdot: AT&T Now Lets Customers Lock Down Account To Prevent SIM Swapping Attacks

Slashdot: AT&T Outlines $250 Billion US Investment Plan To Boost Infrastructure In AI Age

Testing List