AI Vision Models Fail Critical Negation Test in Medical Imaging

MIT researchers have discovered that vision-language models, widely used in medical image analysis, cannot understand negation words like 'no' and 'not'. This critical limitation could lead to serious diagnostic errors when these AI systems are asked to retrieve medical images with specific criteria. The study, published on May 14, 2025, introduces NegBench, a new benchmark to evaluate and improve negation understanding in AI vision systems.

A new study from MIT researchers has revealed a fundamental flaw in vision-language models (VLMs) that could have serious implications for medical diagnostics and other critical applications.

The research team, led by Kumail Alhamoud and senior author Marzyeh Ghassemi from MIT's Department of Electrical Engineering and Computer Science, found that these AI systems—which are increasingly used to analyze medical images—fail to understand negation words like 'no' and 'not' in queries.

This limitation becomes particularly problematic in medical contexts. For example, a radiologist examining a chest X-ray that shows tissue swelling but no enlarged heart might use an AI system to find similar cases. If the model ignores the word 'no' and returns cases that do include an enlarged heart, the comparison could point toward an incorrect diagnosis.

"Those negation words can have a very significant impact, and if we are just using these models blindly, we may run into catastrophic consequences," warns lead author Alhamoud. When the models were tested on identifying negation in image captions, they performed no better than random guessing.

To address this problem, the researchers developed NegBench, a comprehensive benchmark with 79,000 examples across 18 task variations spanning image, video, and medical datasets. The benchmark evaluates two core capabilities: retrieving images based on negated queries and answering multiple-choice questions with negated captions.
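To make the two task types concrete, here is a minimal sketch of how retrieval and multiple-choice results of this kind are commonly scored. It is an illustration only, not NegBench's actual evaluation code: the function names, the toy image IDs, and the four-option question format are all assumptions made for the example.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mcq_accuracy(predicted: Sequence[int], correct: Sequence[int]) -> float:
    """Share of multiple-choice questions where the chosen option is the correct one."""
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        return 0.0
    return sum(p == c for p, c in zip(predicted, correct)) / len(correct)


# Hypothetical negated query: "tissue swelling but no enlarged heart".
# Only img_02 and img_05 truly match (toy labels, invented for this sketch).
relevant = {"img_02", "img_05"}
ranked = ["img_01", "img_02", "img_03", "img_04", "img_05"]  # model's ranking
retrieval_score = recall_at_k(ranked, relevant, k=3)

# Hypothetical four-option questions: index of the option each model picked
# versus the index of the caption that correctly states what is absent.
predicted = [0, 2, 1, 3]
correct = [0, 1, 1, 2]
mcq_score = mcq_accuracy(predicted, correct)
chance_level = 1 / 4  # expected accuracy of random guessing with four options

print(f"recall@3 = {retrieval_score:.2f}")
print(f"MCQ accuracy = {mcq_score:.2f} (chance = {chance_level:.2f})")
```

Comparing the multiple-choice score against the chance level is what gives a statement like "no better than random guessing" a concrete meaning: with four options, a model that ignores negation entirely would be expected to land near 0.25.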

The team also created datasets with negation-specific examples to retrain these models, achieving a 10% improvement in recall on negated queries and a 28% boost in accuracy on multiple-choice questions with negated captions. However, they caution that more work is needed to address the root causes of this problem.

"If something as fundamental as negation is broken, we shouldn't be using large vision/language models in many of the ways we are using them now—without intensive evaluation," emphasizes Ghassemi.

The research will be presented at the upcoming Conference on Computer Vision and Pattern Recognition, highlighting the urgent need for more robust AI systems in critical applications like healthcare.
