AI Scored 89% on ER Treatment Plans. Two Expert Doctors Scored 34%. That Gap Is Too Large to Dismiss.

An AI scored 89% on emergency room treatment plans. Two expert doctors scored 34%.

That gap is too large to explain away as noise.

A research team at Harvard Medical School and Beth Israel Deaconess Medical Center published a study in Science on May 3. They tested OpenAI's o1 on 76 emergency room cases at three stages: initial triage, first physician contact, and admission to the ward or ICU.

At initial triage, with the least information and the highest urgency, the AI identified the correct diagnosis in 67% of cases. The two attending physicians: 50–55%. With more information available at later stages, the AI reached 82% accuracy versus 70–79% for the doctors. On long-term treatment plans, the AI scored a median of 89%. Two expert physicians scored 34%.

The researchers' own conclusion: prospective trials needed before clinical deployment. Text-only input, 76 patients, one Boston hospital. Not a deployment brief.

But for Malaysian healthcare, the directional signal matters more than the caveat.

Think about a general practitioner at a private clinic in Puchong seeing 50 patients a day at RM100–180 per consultation. Or a Medical Officer in a district hospital in Bintulu covering the inpatient ward overnight. The constraint isn't diagnostic competence; it's time and cognitive load. An AI that narrows the differential at intake, flags high-risk presentations, and drafts a management plan for the attending doctor to review doesn't replace the doctor. It changes how much the doctor carries per patient.

Who this really matters to:

→ Private hospital groups in Malaysia (KPJ, and IHH with its Gleneagles hospitals): AI-assisted triage is a unit economics question. If it reduces unnecessary specialist referrals and shortens the time from presentation to diagnosis, the cost per episode changes materially.

→ Malaysian telemedicine platforms (DoctorOnCall and similar): text-based consultations are exactly where AI diagnostic support is most deployable today, and the gap between what's technically possible and what's actually built is narrow.

→ Healthtech startups building clinical decision tools: the Harvard study is the kind of peer-reviewed evidence that opens hospital procurement conversations. “AI in clinical decision support” is no longer speculative.

→ Community clinics and rural health facilities: Malaysia's doctor-to-population ratio in parts of Sabah and Sarawak runs well below WHO minimums. AI-assisted support doesn't solve the shortage, but it changes what one doctor can safely assess per shift.

MULTIPLE PERSPECTIVES

The caveats deserve naming. Seventy-six patients at one Boston hospital is a small dataset. Text-only input misses the physical examination that changes many diagnoses. The Harvard researchers specifically said this is not a case for clinical deployment. It is a call for prospective trials, with larger samples across different clinical settings. Those are reasonable conditions before this enters any real ward.

The treatment plan score is harder to dismiss on those grounds. A 55-point gap between AI and expert physicians on management decisions isn't measurement noise; it points to a structural difference in how the system approaches clinical reasoning. The AI evaluates options systematically against the evidence base, without the fatigue, cognitive load, or recency bias a doctor accumulates by the 40th patient of a 12-hour shift. That's not a criticism of doctors. It's a description of human cognition under sustained pressure.

The second-order effect for Malaysian healthcare is in the workforce numbers. Malaysia's doctor shortage in rural and semi-urban areas is structural: training pipelines take a decade to adjust. AI-assisted clinical decision tools won't close that gap, but they change the coverage arithmetic. One doctor seeing 60 patients with AI-assisted triage is not the same as one doctor seeing 60 without it, particularly at 2am in a facility where the next specialist is 80 kilometres away.

If an AI diagnostic support tool, one that drafts management plans for the attending doctor to review, became available in your clinic or hospital today, would your current workflows actually integrate it? Or would it sit unused next to the electronic health record system nobody fills out completely?

If your clinical workflows already run on structured digital records and you have the IT infrastructure to integrate a decision support tool at the point of care, the technology is closer than most Malaysian healthcare organisations think.

If your clinic still runs on paper notes, WhatsApp referrals, and a hospital information system (HIS) from 2015, AI diagnostic support is premature. Get structured digital records right first. They are the foundation without which the best diagnostic AI in the world delivers nothing at the bedside.

AI in healthcare won't replace your doctor. It will change what doctors are allowed to not know.