AI Models Fail Critical Medical Ethics Tests, Mount Sinai Study Reveals

A groundbreaking study from Mount Sinai and Rabin Medical Center shows that even advanced AI models like ChatGPT make alarming errors when navigating medical ethics scenarios. Researchers discovered that AI systems often default to familiar but incorrect responses when presented with slightly modified ethical dilemmas, sometimes completely ignoring updated information. These findings raise serious concerns about AI reliability in high-stakes healthcare decisions where ethical nuance is critical.

Researchers at the Icahn School of Medicine at Mount Sinai have uncovered a dangerous flaw in how artificial intelligence handles medical ethics decisions, revealing limitations that could have serious implications for patient care.

The study, published July 22, 2025, in npj Digital Medicine, tested several commercially available large language models (LLMs), including ChatGPT, on modified versions of well-known ethical dilemmas. The research team, led by Dr. Eyal Klang, Chief of Generative AI at Mount Sinai, and Dr. Girish Nadkarni, Chair of the Windreich Department of Artificial Intelligence and Human Health, found that AI systems frequently made basic errors when confronted with slightly altered scenarios.

In one revealing example, researchers modified the classic "Surgeon's Dilemma" riddle by explicitly stating that the boy's father was the surgeon. In the original riddle, the surgeon turns out to be the boy's mother, a twist designed to expose assumptions about gender; the modified version removes that twist entirely. Despite this clear information, several AI models still insisted the surgeon must be the boy's mother, demonstrating how AI can cling to familiar patterns even when contradicted by new information.
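The paper's exact prompts, models, and scoring methods are not reproduced in this article. For readers curious about what this kind of single-prompt probe looks like in practice, the following minimal sketch uses the OpenAI Python client with a placeholder model name and an illustrative rewording of the modified dilemma; it is an assumption-laden example, not the study's protocol.

```python
# Illustrative sketch only: not the study's actual prompts, models, or evaluation.
# Assumes the OpenAI Python client (openai >= 1.0) and an API key in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Modified "Surgeon's Dilemma": the father is explicitly identified as the surgeon.
modified_dilemma = (
    "A boy is injured in a car accident and rushed to the hospital. "
    "The surgeon, who is the boy's father, says: 'I can't operate on this boy.' "
    "Why can't the surgeon operate on the boy?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the study tested several commercial LLMs
    messages=[{"role": "user", "content": modified_dilemma}],
)
answer = response.choices[0].message.content

# Flag answers that fall back on the classic riddle's pattern ("the surgeon is
# the mother") even though the prompt already states the father is the surgeon.
print(answer)
if "mother" in answer.lower():
    print("Possible pattern-matching error: the model may have ignored the modification.")
```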

Another test involved a scenario in which religious parents initially refuse a blood transfusion. When researchers altered the scenario to state that the parents had already consented to the procedure, many AI models still recommended overriding a refusal that no longer existed.

"AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details," explained Dr. Klang. "In healthcare, where decisions often carry serious ethical and clinical implications, missing those nuances can have real consequences for patients."

The research was inspired by Daniel Kahneman's book "Thinking, Fast and Slow," which contrasts fast, intuitive reactions with slower, analytical reasoning. The findings suggest that AI models, like humans, can struggle to shift between these two modes of thinking.

While the researchers emphasize that AI still has valuable applications in medicine, they stress the need for thoughtful human oversight, especially in situations requiring ethical sensitivity or nuanced judgment. "These tools can be incredibly helpful, but they're not infallible," noted Dr. Nadkarni. "AI is best used as a complement to enhance clinical expertise, not a substitute for it, particularly when navigating complex or high-stakes decisions."
