Researchers at the Icahn School of Medicine at Mount Sinai and Rabin Medical Center in Israel have discovered a troubling flaw in how artificial intelligence handles medical ethics decisions, one that could jeopardize patient care if left unchecked.
The study, published July 24 in npj Digital Medicine, tested several commercial large language models (LLMs), including ChatGPT, on slightly modified versions of well-known ethical dilemmas. The models consistently defaulted to intuitive but incorrect responses, even when presented with clearly contradictory information.
"AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details," explained co-senior author Dr. Eyal Klang, Chief of Generative AI in Mount Sinai's Windreich Department of Artificial Intelligence and Human Health. "In healthcare, where decisions carry serious ethical and clinical implications, missing those nuances can have real consequences for patients."
In one revealing test, researchers modified the classic "Surgeon's Dilemma" puzzle by explicitly stating that a boy's father was the surgeon, removing any ambiguity. Despite this clarity, several AI models still incorrectly insisted the surgeon must be the boy's mother, demonstrating how AI can cling to familiar patterns even when contradicted by new information.
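The study does not publish its querying code, but the test amounts to sending a lightly edited version of a familiar riddle to a commercial model and checking whether the answer tracks the edit. Below is a minimal sketch of that idea using the OpenAI Python SDK; the prompt wording, model name, and keyword check are illustrative assumptions, not the study's actual protocol.

```python
# Illustrative sketch only: probing an LLM with a modified "Surgeon's Dilemma".
# The prompt text, model name, and keyword check are assumptions for illustration,
# not the prompts or evaluation code used in the Mount Sinai study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The classic riddle, edited so the surgeon's identity is stated outright:
# the boy's father is explicitly the surgeon, removing the usual ambiguity.
modified_dilemma = (
    "A boy is injured in an accident and brought to the emergency room. "
    "His father, a surgeon, is called in to operate. The surgeon says, "
    "'I can't operate on this boy, he's my son.' Who is the surgeon?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # any commercial chat model could be substituted here
    messages=[{"role": "user", "content": modified_dilemma}],
    temperature=0,
)

answer = response.choices[0].message.content
print(answer)

# Crude check for the failure mode described in the article: the model
# falling back on the familiar "the surgeon is the boy's mother" answer
# even though the prompt states the surgeon is the father.
if "mother" in answer.lower():
    print("Model reverted to the familiar pattern despite the modified wording.")
```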
Similarly, when presented with a scenario about religious parents and a blood transfusion, the AI models recommended overriding parental refusal even when the scenario clearly stated the parents had already consented to the procedure.
"Simple tweaks to familiar cases exposed blind spots that clinicians can't afford," noted lead author Dr. Shelly Soffer from Rabin Medical Center's Institute of Hematology. "It underscores why human oversight must stay central when we deploy AI in patient care."
The research team, inspired by Daniel Kahneman's book "Thinking, Fast and Slow," found that AI exhibits the same tendency toward fast, intuitive thinking that humans do, but often lacks the ability to shift to more deliberate analytical reasoning when needed.
Moving forward, the Mount Sinai team plans to establish an "AI assurance lab" to systematically evaluate how different models handle real-world medical complexity. The researchers emphasize that AI should complement clinical expertise rather than replace it, particularly in ethically sensitive or high-stakes decisions.