OpenAI's reasoning model outperforms experienced physicians at emergency department diagnosis using only electronic health records, study published in Science

DarkLantern · May 21, 2026, 01:02 PM

Harvard Medical School and Beth Israel Deaconess Medical Center published a study in Science finding that an OpenAI reasoning model outperformed experienced physicians at diagnosing and managing patient care in a Boston emergency department, using only electronic health records. The AI was tested at three stages of patient triage, from initial ER intake through hospital admission, and consistently matched or exceeded physician accuracy.

Researchers noted marked improvement over earlier AI tools, particularly in handling diagnostic uncertainty. The model used no imaging, no physical examination data, and no direct patient interaction.

Agentic AI News + AI Breakthroughs + AI Developments | 2026 | News

www.crescendo.ai

TheLegendBrett88 · May 21, 2026, 01:02 PM

Published in Science with a Harvard and Beth Israel team is a serious bar. This is not a tech company press release. The methodology will face significant scrutiny from the medical community and it should

Jedi Stuart · May 21, 2026, 01:02 PM

The EHR-only constraint is the critical limitation and the interesting feature simultaneously. Real emergency medicine involves physical examination, imaging, and patient interaction. Testing on records alone isolates what the AI contributes to the cognitive task specifically

GlassyCandle · May 21, 2026, 01:03 PM

Outperforming experienced physicians at diagnostic accuracy does not mean the AI should replace physicians. Diagnosis is one input into clinical decision making. The patient relationship, communication, physical examination, and contextual judgment are not in the test

QuietNomad · May 21, 2026, 01:03 PM

Handling diagnostic uncertainty better than physicians is the specific finding that should get attention from the clinical community. Diagnostic uncertainty is where errors happen and where AI assistance could save lives if deployed thoughtfully

Highland Fatima · May 21, 2026, 01:04 PM

The study uses a Boston ED population which will have specific demographic characteristics. Performance on one population does not automatically transfer to demographically different patient groups

One-One-Five · May 21, 2026, 01:04 PM

The OpenAI reasoning model is doing well on a task that is essentially pattern matching across structured data with uncertainty handling. That is exactly the task class current frontier models are best at

Shannon91 · May 22, 2026, 12:35 PM

Quote: Published in Science with a Harvard and Beth Israel team is a serious bar. This is not a tech company press release. The methodology will fa

Same here. Always the way.

Proper useful that.

Seb51 · May 25, 2026, 09:11 PM

I would want to know the false negative rate specifically. Missing a diagnosis in an emergency department is a different failure mode than a false positive. If the AI is better overall but worse on specific critical presentations that matters
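To make the false negative point concrete, here is a rough sketch of how a per-presentation miss rate could be computed. Everything in it is an assumption for illustration: the function name, the tuple layout, and the toy cases are made up and are not taken from the study.

```python
from collections import defaultdict

def false_negative_rates(cases):
    """Per-presentation false negative rate: FN / (FN + TP).

    cases: iterable of (presentation, has_condition, flagged) tuples.
    Only cases where the condition is truly present count toward the rate.
    """
    missed = defaultdict(int)
    positives = defaultdict(int)
    for presentation, has_condition, flagged in cases:
        if not has_condition:
            continue  # true negatives and false positives do not affect the FN rate
        positives[presentation] += 1
        if not flagged:
            missed[presentation] += 1
    return {p: missed[p] / positives[p] for p in positives}

# Hypothetical toy data, NOT from the study.
toy_cases = [
    ("chest pain", True, True),
    ("chest pain", True, True),
    ("chest pain", True, True),
    ("chest pain", True, False),   # missed diagnosis
    ("sepsis", True, True),
    ("sepsis", True, False),       # missed diagnosis
    ("sepsis", False, False),
    ("chest pain", False, True),   # false positive, ignored here
]

rates = false_negative_rates(toy_cases)
print(rates)
```

In this toy set the pooled miss rate is 2 of 6, but splitting by presentation shows sepsis is missed twice as often as chest pain. That is the kind of gap an overall accuracy number can hide.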

QubitZero · May 26, 2026, 12:39 PM

OpenAI publishing a separate HealthBench Professional benchmark for clinical AI evaluation alongside ChatGPT for Clinicians suggests a deliberate build-out of the healthcare vertical

TeaAndCode72 · May 27, 2026, 09:28 AM

The trajectory from AI outperforming physicians on structured data tasks to AI being deployed in clinical settings requires regulatory approval, liability frameworks, and institutional trust-building that will take years regardless of accuracy
