
A suite of deep learning (DL) models that automates tuberculosis (TB) risk assessment from lung ultrasound (LUS) images fulfils the WHO criteria for a sputum-free TB triage test and outperforms human experts in the TrUST study.
“TrUST is the first study to demonstrate the diagnostic performance of expert and artificial intelligence (AI)-interpreted LUS for TB triage,” said lead study author Dr Véronique Suttels from Lausanne University Hospital in Switzerland, at ESCMID Global 2025.
The AI suite includes three DL models: (1) ULTR-AI, which predicts TB directly from LUS images; (2) ULTR-AI[signs], which detects LUS signs as interpreted by human experts; and (3) ULTR-AI[max], which uses the higher of the two models' risk scores to optimize accuracy. The suite's performance was compared with that of human experts using a similar machine learning model.
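The combination rule described for ULTR-AI[max] can be illustrated with a minimal Python sketch. It assumes each model outputs a per-patient risk score between 0 and 1. The function names and the 0.5 decision threshold are hypothetical and are not taken from the study.

```python
def max_risk_score(direct_score: float, signs_score: float) -> float:
    """Combine two per-patient TB risk scores by keeping the higher one.

    direct_score: hypothetical output of the image-based model (ULTR-AI).
    signs_score:  hypothetical output of the sign-based model (ULTR-AI[signs]).
    """
    for score in (direct_score, signs_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"risk score out of range [0, 1]: {score}")
    return max(direct_score, signs_score)


def triage_positive(score: float, threshold: float = 0.5) -> bool:
    """Flag a patient for confirmatory testing.

    The 0.5 threshold is an illustrative assumption, not the study's cut-off.
    """
    return score >= threshold


# Example: the sign-based model is more suspicious, so its score is kept.
combined = max_risk_score(0.32, 0.71)
flagged = triage_positive(combined)
```

Taking the maximum means a patient is flagged if either model is suspicious, which tends to raise sensitivity and lower specificity.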
ULTR-AI achieved a sensitivity of 0.91, a specificity of 0.85, and a mean area under the receiver operating characteristic curve (AUROC) of 0.93. ULTR-AI[max] achieved a sensitivity of 0.93, a specificity of 0.81, and a mean AUROC of 0.93. According to Suttels, these results exceed the WHO target thresholds (0.9 sensitivity and 0.7 specificity). [ESCMID Global 2025, abstract O0573]
The mean AUROC generated from human experts was 0.84. Hence, ULTR-AI[max] outperforms human experts by 0.09 in mean AUROC, or 9 percentage points.
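As a quick check of the reported figures against the WHO targets quoted above, the comparison can be written as a short Python sketch. The constant and function names are illustrative assumptions. The numbers are the ones reported in this article.

```python
# WHO target thresholds for a TB triage test, as quoted in the article.
WHO_MIN_SENSITIVITY = 0.90
WHO_MIN_SPECIFICITY = 0.70


def meets_who_triage_targets(sensitivity: float, specificity: float) -> bool:
    """Return True if both metrics reach the WHO triage targets."""
    return (sensitivity >= WHO_MIN_SENSITIVITY
            and specificity >= WHO_MIN_SPECIFICITY)


# (sensitivity, specificity) as reported for each model.
reported = {
    "ULTR-AI": (0.91, 0.85),
    "ULTR-AI[max]": (0.93, 0.81),
}

results = {name: meets_who_triage_targets(sens, spec)
           for name, (sens, spec) in reported.items()}
```

Both reported models satisfy the check, since each has a sensitivity of at least 0.90 and a specificity of at least 0.70.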
“Our model clearly detects human-recognizable LUS findings – like large consolidations and interstitial changes – but an end-to-end DL approach captures even subtler features beyond the human eye,” Suttels noted. “[We hope] that this will help identify early pathological signs such as small subcentimetre pleural lesions common in TB.”
From human eye to AI
TB incidence went up by nearly 5 percent between 2020 and 2023 despite previous declines, and early screening programmes have seen high patient dropout at the diagnostic stage due to the high cost of X-ray equipment and a lack of trained radiologists. [https://www.who.int/teams/global-programme-on-tuberculosis-and-lung-health/tb-reports/global-tuberculosis-report-2024, accessed 21 May 2025; PLOS ONE 2021;16:e0251236]
Also, a third of incident TB cases remain undiagnosed because diagnostic access is a major problem, underlined Suttels. “These challenges underscore the urgent need for more accessible diagnostic tools … which is why one of the WHO research priorities is simple and easy diagnostics – non-sputum, point-of-care (POC), low cost, and low expertise.”
The ULTR-AI suite leverages DL algorithms to interpret LUS in real time, making it more accessible for TB triage. This would be especially helpful in low- and middle-income countries and for rural healthcare workers (HCWs) with minimal training. Enabling POC TB triage with minimal infrastructure and expertise may lead to timely detection and treatment. “By reducing operator dependency and standardizing the test, this technology can help diagnose patients faster and more efficiently,” Suttels said.
The researchers tested the suite’s diagnostic ability on 504 individuals (61 percent men, median age 40 years) with cough or dyspnoea in a tertiary urban centre in Benin, West Africa. Of these, 38 percent (n=192) had bacteriologically confirmed pulmonary TB, 13 percent had previous TB, 52 percent were underweight, and 16 percent were HIV-positive.
Among the 312 participants (62 percent) who had a negative single sputum test, 32 (6 percent of the full cohort) were clinically diagnosed with TB by an expert panel.
Using an ultrasound-on-a-chip device compatible with a smartphone or tablet, LUS images were acquired via a standardized 14-point LUS sliding scan protocol. Human expert readers interpreted the images based on typical LUS findings. A single sputum molecular test served as the reference standard.
“A key advantage of our AI models is the immediate turnaround time once integrated into an app,” said Suttels. “This allows LUS to function as a true POC test with good diagnostic performance at triage, providing instant results while the patient is still with the HCW. Faster diagnosis could improve linkage to care and reduce the risk of being lost to follow-up.”
Suttels added that this new triage-level tool operates on a low technical platform. She shared that in the post-doctoral household survey she conducted, children as young as 4 years old were able to perform LUS and acquire high-quality images.
Future trials
Further validation in diverse populations is warranted to ascertain the broader clinical utility of the AI suite. As the participants presented with respiratory symptoms, Suttels shared that future research should look into cohorts of individuals with subclinical TB.
Moreover, considering that half of the participants were underweight, it would be imperative to evaluate the performance of the AI suite in individuals with obesity, she added. “Obesity is a very important factor in LUS. We do need to work on external validation in populations with different rates of obesity.”