OpenAI's reasoning model outperforms experienced physicians at emergency department diagnosis using only electronic health records, study published in Science

Started by DarkLantern, May 21, 2026, 01:02 PM

DarkLantern

Harvard Medical School and Beth Israel Deaconess Medical Center published a study in Science finding that an OpenAI reasoning model outperformed experienced physicians at diagnosing and managing patient care in a Boston emergency department, using only electronic health records. The AI was tested at three stages of patient triage from initial ER intake through to hospital admission and consistently matched or exceeded physician accuracy.

Researchers noted marked improvement over earlier AI tools, particularly in handling diagnostic uncertainty. The model used no imaging, no physical examination data, and no direct patient interaction.

Opinions are my own. Obviously. Dave

TheLegendBrett88

Publication in Science by a Harvard and Beth Israel team is a serious bar. This is not a tech company press release. The methodology will face significant scrutiny from the medical community, and it should.

Jedi Stuart

The EHR-only constraint is both the critical limitation and the interesting feature. Real emergency medicine involves physical examination, imaging, and patient interaction. Testing on records alone isolates what the AI contributes to the cognitive task specifically.
Football is life. Everything else is just details.

GlassyCandle

Outperforming experienced physicians at diagnostic accuracy does not mean the AI should replace physicians. Diagnosis is one input into clinical decision making. The patient relationship, communication, physical examination, and contextual judgment are not in the test.
Cashback on everything or it didn't happen

QuietNomad

Handling diagnostic uncertainty better than physicians is the specific finding that should get attention from the clinical community. Diagnostic uncertainty is where errors happen and where AI assistance could save lives if deployed thoughtfully.

Highland Fatima

The study uses a Boston ED population, which will have specific demographic characteristics. Performance on one population does not automatically transfer to demographically different patient groups.
Measure twice, post once

One-One-Five

The OpenAI reasoning model is doing well on a task that is essentially pattern matching across structured data with uncertainty handling. That is exactly the task class current frontier models are best at.

Shannon91

Quote: Published in Science with a Harvard and Beth Israel team is a serious bar. This is not a tech company press release. The methodology will fa

Same here. Always the way.

Proper useful that. :o

Seb51

I would want to know the false negative rate specifically. Missing a diagnosis in an emergency department is a different failure mode than a false positive. If the AI is better overall but worse on specific critical presentations, that matters.
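To make the point concrete, here is a minimal Python sketch of how a reader can score higher on overall accuracy while missing more cases in one critical subgroup. Every number below is invented for illustration and none of it comes from the study. The `Counts` class, the "common" and "critical" presentation labels, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one reader on one presentation type."""
    true_pos: int
    false_neg: int
    true_neg: int
    false_pos: int

    @property
    def total(self) -> int:
        return self.true_pos + self.false_neg + self.true_neg + self.false_pos


def false_negative_rate(c: Counts) -> float:
    """Share of truly positive cases that the reader missed."""
    positives = c.true_pos + c.false_neg
    return c.false_neg / positives if positives else float("nan")


def overall_accuracy(by_presentation: dict[str, Counts]) -> float:
    """Accuracy pooled across all presentation types."""
    correct = sum(c.true_pos + c.true_neg for c in by_presentation.values())
    total = sum(c.total for c in by_presentation.values())
    return correct / total


# HYPOTHETICAL counts, made up for this example only (not study data).
model = {
    "common": Counts(true_pos=90, false_neg=10, true_neg=880, false_pos=20),
    "critical": Counts(true_pos=6, false_neg=4, true_neg=88, false_pos=2),
}
physician = {
    "common": Counts(true_pos=80, false_neg=20, true_neg=860, false_pos=40),
    "critical": Counts(true_pos=9, false_neg=1, true_neg=85, false_pos=5),
}

if __name__ == "__main__":
    for name, reader in (("model", model), ("physician", physician)):
        print(name, "overall accuracy:", round(overall_accuracy(reader), 3))
        for presentation, counts in reader.items():
            fnr = false_negative_rate(counts)
            print(f"  {presentation}: false negative rate {fnr:.2f}")
```

With these made-up counts the model wins on pooled accuracy but misses a larger share of the critical cases, which is why a per-presentation false negative rate is worth asking for alongside the headline number.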

QubitZero

OpenAI publishing a separate HealthBench Professional benchmark for clinical AI evaluation alongside ChatGPT for Clinicians suggests a deliberate build-out of the healthcare vertical.

TeaAndCode72

Getting from AI outperforming physicians on structured data tasks to AI being deployed in clinical settings requires regulatory approval, liability frameworks, and institutional trust-building that will take years regardless of accuracy.
Cashback on everything or it didn't happen