
Medical AI Systems Fail to Understand Negation in Image Analysis

MIT researchers have discovered that vision-language models used in medical imaging cannot comprehend negation words like 'no' and 'not', potentially leading to dangerous misdiagnoses. When tested on negation tasks, these AI systems performed no better than random guessing, raising serious concerns about their deployment in healthcare settings. Researchers have developed a new benchmark called NegBench and proposed solutions that could improve negation understanding by up to 28%.

A critical flaw in artificial intelligence systems used to analyze medical images could put patients at risk, according to new research from MIT published this week.

The study, led by graduate student Kumail Alhamoud and Associate Professor Marzyeh Ghassemi, reveals that vision-language models (VLMs) – AI systems widely deployed in healthcare settings – fundamentally fail to understand negation words like 'no' and 'not' when analyzing medical images.

"Those negation words can have a very significant impact, and if we are just using these models blindly, we may run into catastrophic consequences," warns Alhamoud, the study's lead author.

The researchers demonstrated this problem through a clinical example: if a radiologist examines a chest X-ray showing tissue swelling but no enlarged heart and uses an AI model to retrieve reports of similar patients, a system that misses the negation might incorrectly return cases showing both conditions, potentially steering the clinician toward an entirely different diagnosis. When formally tested, these AI models performed no better than random guessing on negation tasks.
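
To see how such a failure can arise, consider a minimal sketch (not code from the study) using an off-the-shelf CLIP-style vision-language model: the text encoder often maps a caption and its negated counterpart to nearly identical vectors, so similarity-based retrieval treats them almost interchangeably. The checkpoint name and captions below are illustrative assumptions, not details taken from the paper.

    # Minimal sketch: compare the text embedding of a caption with its negated
    # version under a CLIP-style model. Assumes the Hugging Face checkpoint
    # "openai/clip-vit-base-patch32"; any similar model illustrates the point.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    captions = [
        "a chest X-ray with tissue swelling and an enlarged heart",
        "a chest X-ray with tissue swelling but no enlarged heart",
    ]
    inputs = processor(text=captions, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(**inputs)

    # Cosine similarity between the affirmative and negated captions.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    similarity = (text_emb[0] @ text_emb[1]).item()
    print(f"similarity between caption and its negation: {similarity:.3f}")
    # A value close to 1.0 means the model treats "no enlarged heart" much
    # like "enlarged heart", so retrieval may surface scans with both findings.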

To address this critical limitation, the team has developed NegBench, a comprehensive evaluation framework spanning 18 task variations and 79,000 examples across image, video, and medical datasets. Their proposed solution involves retraining VLMs with specially created datasets containing millions of negated captions, which has shown promising results – improving recall on negated queries by 10% and boosting accuracy on multiple-choice questions with negated captions by 28%.
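
The article describes the retraining data only at a high level; purely as an illustration, the sketch below shows one way negated captions could be synthesized from per-image object labels using simple templates. The template wording, label lists, and function name are assumptions for this sketch, not the authors' actual pipeline.

    # Illustrative only: synthesize negated captions by pairing an object that
    # appears in an image with one that does not. This is an assumed,
    # simplified stand-in for the kind of negated training data described above.
    import random

    def make_negated_captions(present, absent, n=3):
        """Pair an object present in the image with one that is absent."""
        templates = [
            "a photo of a {pos} but no {neg}",
            "a photo that contains a {pos} and does not contain a {neg}",
            "an image showing a {pos} with no {neg} present",
        ]
        captions = []
        for _ in range(n):
            pos = random.choice(present)
            neg = random.choice(absent)
            captions.append(random.choice(templates).format(pos=pos, neg=neg))
        return captions

    # Example: COCO-style labels for a single image.
    present_objects = ["dog", "frisbee"]
    absent_objects = ["cat", "bicycle", "person"]
    for caption in make_negated_captions(present_objects, absent_objects):
        print(caption)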

"If something as fundamental as negation is broken, we shouldn't be using large vision/language models in many of the ways we are using them now – without intensive evaluation," cautions Ghassemi, highlighting the need for careful assessment before deploying these systems in high-stakes medical environments.

The research, which includes collaborators from OpenAI and Oxford University, will be presented at the upcoming Conference on Computer Vision and Pattern Recognition. The team has made their benchmark and code publicly available to help address this critical AI safety issue.
