A computationally frugal, open-source chest CT foundation model for thoracic disease detection in lung cancer screening programmes

Plain Language Summary

Lung cancer screening uses low-dose chest scans to find cancer early, but these scans can also reveal other diseases before symptoms appear. Interpreting large numbers of scans is challenging because there are too few radiologists worldwide. We introduce TANGERINE, an easy-to-use, open-source AI model that analyses three-dimensional chest scans efficiently and accurately. The model learns patterns from thousands of scans and can then be adapted to detect lung diseases using only small amounts of new data. TANGERINE performs as well as, or better than, more complex systems while using less computing power. Its open-source design provides a foundation for future tools that could make advanced scan analysis more accessible, supporting earlier diagnosis and improved lung health worldwide.

Abstract

Background: The use of low-dose computed tomography (LDCT) in lung cancer screening (LCS) programmes is increasing worldwide. LCS programmes herald a generational opportunity to detect cancer and non-cancer-related early-stage lung disease simultaneously, yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis.

Methods: Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. The model is pretrained using self-supervised learning on more than 98,000 thoracic LDCT scans, including the United Kingdom’s largest LCS initiative to date and 27 public datasets. By extending a masked autoencoder framework to three-dimensional imaging, TANGERINE provides a scalable solution for LDCT analysis, combining architectural simplicity, public availability, and modest computational requirements.

Results: TANGERINE demonstrates superior computational and data efficiency in a retrospective multi-dataset analysis: it converges rapidly during fine-tuning, requiring significantly fewer graphics processing unit hours than models trained from scratch, and achieves comparable or superior performance using only a fraction of the fine-tuning data. The model achieves strong performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, and generalises robustly across diverse clinical centres.

Conclusions: TANGERINE’s accessible, open-source, lightweight design lays the foundation for rapid integration into next-generation medical imaging tools, enabling lung cancer screening programmes to pivot from a singular focus on lung cancer detection toward comprehensive respiratory disease management in high-risk populations.
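
For readers unfamiliar with masked autoencoders, the general idea of extending them to three-dimensional imaging can be sketched as follows. This is a minimal illustration only, not TANGERINE's implementation: the function names, the 16-voxel patch size, the 75% mask ratio, and the synthetic volume are all assumptions chosen for the example. A volume is cut into non-overlapping cubic patches, most patches are hidden at random, and only the visible ones would be passed to an encoder, with a decoder trained to reconstruct the hidden ones.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into non-overlapping cubic patches.

    Returns an array of shape (num_patches, patch**3), one flattened
    patch per row. `patch` is a hypothetical parameter for this sketch.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by patch size")
    # Separate each axis into (block index, offset within block).
    x = volume.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    # Bring the three block indices to the front, offsets to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, patch ** 3)


def random_mask(num_patches, mask_ratio, rng):
    """Randomly choose which patches stay visible and which are masked."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(num_patches)
    keep = np.sort(perm[:num_keep])
    masked = np.sort(perm[num_keep:])
    return keep, masked


# Synthetic stand-in for a preprocessed CT volume (assumed shape).
rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 64, 64)).astype(np.float32)

patches = patchify_3d(volume, patch=16)            # 2 * 4 * 4 = 32 patches
keep, masked = random_mask(len(patches), 0.75, rng)

visible = patches[keep]    # what an encoder would see
targets = patches[masked]  # what a decoder would learn to reconstruct
```

Because the encoder only processes the visible subset of patches, a high mask ratio reduces the compute spent per scan during pretraining, which is one reason this family of methods suits volumetric data.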

Publication
Communications Medicine
Niccolò McConnell
PhD Student

Niccolò is a PhD student on the AI-enabled Healthcare CDT at University College London. His research focuses on developing a deep learning foundation model for lung cancer screening.

Pardeep Vasudev
PhD Student

Centre for Medical Image Computing

Daisuke Yamada
Research Fellow

University College London

Daryl Cheng
PhD Student

Centre for Medical Image Computing

Mehran Azimbagirad
Senior Research Fellow

Hawkes Institute

John McCabe
PhD Student

University College London

Shahab Aslani
Senior Research Fellow

Centre for Medical Image Computing

Ahmed H. Shahin
Former PhD Student

University College London

Joseph Jacob
Principal Investigator

Wellcome Trust Fellow