AI enhances diagnostic accuracy of mammograms




A Singapore study shows that incorporating artificial intelligence (AI) into mammography can improve breast cancer detection rates.
“This is the first study in Asia to evaluate AI-assisted mammography interpretation by radiologists with varying experience,” the researchers said. “AI significantly improved diagnostic performance and efficiency among residents, helping to narrow the experience-performance gap without compromising specificity.”
Half of the 500 participants had malignancies (median age 60.2 years), while the other half did not (median age 53 years). Overall, about 61 percent had BI-RADS (Breast Imaging Reporting and Data System) density category C. Eighty-four percent of the malignancies were invasive cancers, while the rest were ductal carcinoma in situ. About 54 percent of the malignant cases presented as masses and 10.8 percent as calcifications. In the non-malignant group, 69.2 percent had normal mammograms, 17.6 percent had benign lesions, 3.6 percent had possibly benign lesions, and 9.6 percent had suspicious lesions.
Seventeen radiologists participated: four consultants, four senior residents (SRs), and nine junior residents (JRs). None had previously read AI-assisted mammograms in a trial or clinical setting at the time of the study. Examinations were retrospectively processed using FxMammo, an AI assistant approved by the Health Sciences Authority. [JMIR Form Res 2025;doi:10.2196/66931]
Reader experience
With AI assistance, sensitivity rose in all three groups: from 56.9 to 61.6 percent among JRs (p<0.001), from 55.4 to 64.1 percent among SRs (p<0.001), and from 68.5 to 70.5 percent among consultants (p=0.35). Accuracy followed the same pattern, rising from 75.8 to 78.9 percent (p=0.005), from 76.1 to 80.4 percent (p=0.002), and from 82.3 to 83.9 percent (p=0.24), respectively. Specificity increased among JRs (from 94.6 to 96.3 percent; p=0.02) and consultants (from 96 to 97 percent; p=0.22). None of the changes among consultants reached statistical significance at the conventional 0.05 level.
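For readers who want to see how the three measures relate, the sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the study. Because the cohort was evenly split between malignant and non-malignant cases, accuracy should equal the average of sensitivity and specificity, assuming every reader interpreted all 500 examinations. The reported JR figures without AI are consistent with that.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of malignant cases flagged
    specificity = tn / (tn + fp)                 # share of non-malignant cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases read correctly
    return sensitivity, specificity, accuracy


def accuracy_if_balanced(sensitivity_pct, specificity_pct):
    """With equal numbers of malignant and non-malignant cases,
    accuracy is the plain average of sensitivity and specificity."""
    return (sensitivity_pct + specificity_pct) / 2


# Hypothetical counts for one reader on a 250/250 cohort (illustrative only).
sens, spec, acc = diagnostic_metrics(tp=142, fn=108, tn=237, fp=13)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")

# Reported JR figures without AI: sensitivity 56.9 percent, specificity 94.6 percent.
# The average is about 75.75, in line with the reported accuracy of 75.8 percent.
print(accuracy_if_balanced(56.9, 94.6))
```

The same check applied to the AI-assisted JR figures (61.6 and 96.3 percent) gives roughly 78.95, again close to the reported 78.9 percent.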
According to the researchers, the improved sensitivity and accuracy suggest that AI can help alleviate the effects of lower diagnostic experience.
The less pronounced gains among consultants imply that the effect of AI assistance is more significant for those earlier in their careers, they noted. “This phenomenon might be explained by the ‘regression to the mean’ effect, where extreme performances (JRs with less experience) tend to improve with intervention, while those with higher baseline performance (experienced consultants) see more modest improvements.”
AUROC, agreement rate, reading time
Standalone AI achieved an area under the receiver operating characteristic curve (AUROC) of 0.93, which was higher than that of AI-assisted consultants (0.91; p=0.21), SRs (0.88; p=0.013), and JRs (0.86; p<0.001). “[The comparable AUROCs between consultants and SRs underscore AI’s potential] to support less experienced readers and help narrow the performance gap between intermediate and expert radiologists,” the researchers explained.
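An AUROC can be read as the probability that a randomly chosen malignant case receives a higher suspicion score than a randomly chosen non-malignant case, with ties counted as half. The sketch below illustrates that interpretation with a simple pairwise comparison. The function name and the toy scores are hypothetical, and the article does not describe how the study itself computed AUROC.

```python
def auroc_pairwise(malignant_scores, non_malignant_scores):
    """AUROC as the fraction of (malignant, non-malignant) pairs in which
    the malignant case scores higher; tied pairs count as half."""
    wins = 0.0
    for m in malignant_scores:
        for n in non_malignant_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(malignant_scores) * len(non_malignant_scores))


# Toy suspicion scores (illustrative only, not study data).
malignant = [0.9, 0.8, 0.4]
non_malignant = [0.7, 0.3, 0.2]

# 8 of the 9 pairs are ranked correctly, so the result is 8/9 (about 0.89).
print(auroc_pairwise(malignant, non_malignant))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported range of 0.86 to 0.93.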
Agreement rates pre- and post-AI assistance were κ=0.54 and 0.60 (p=0.08) in JRs, κ=0.59 and 0.62 (p=0.48) in SRs, and κ=0.66 and 0.71 (p=0.07) in consultants. According to the researchers, the improved inter-reader agreement highlights AI’s potential to reduce inconsistencies in diagnostic decisions among radiologists with different levels of experience.
Significant time savings were achieved with AI assistance, and they were larger in non-malignant than in malignant cases (18 vs 11.1 seconds per reading; p<0.001). “This suggests that AI could offer substantial cost savings by enhancing efficiency,” they said.
Scalable benefits
Some of the challenges of interpreting mammograms are rising imaging volumes, a global shortage of breast radiologists, and variability in reader experience, the researchers noted. “AI has been proposed as a potential adjunct to address these issues, particularly in settings with high breast density, such as Asian populations.”
The results underscore the potential of AI to improve diagnostic consistency and workflow and support training. “Integration into clinical and educational settings may offer scalable benefits, though careful attention to threshold calibration, feedback loops, and real-world validation remains essential,” they said.