DOI: 10.48550/arXiv.2310.06873
Published 9 October 2023 on arXiv.org

A review of uncertainty quantification in medical image analysis: probabilistic and non-probabilistic methods

Ling Huang, Su Ruan, Mengling Feng + 1 author

Abstract

The integration of machine learning healthcare models into clinical practice remains limited, notwithstanding the proliferation of high-performing solutions reported in the literature. A predominant factor hindering widespread adoption is the insufficiency of evidence affirming the reliability of these models. Recently, uncertainty quantification methods have been proposed as a potential solution to quantify the reliability of machine learning models and thus increase the interpretability and acceptability of their results. In this review, we offer a comprehensive overview of prevailing methods proposed to quantify the uncertainty inherent in machine learning models developed for various medical image tasks. Contrary to earlier reviews that focused exclusively on probabilistic methods, this review also explores non-probabilistic approaches, thereby furnishing a more holistic survey of research pertaining to uncertainty quantification for machine learning analysis of medical images. A summary and discussion of medical applications and the corresponding uncertainty evaluation protocols are presented, focusing on the specific challenges of uncertainty in medical image analysis. We also highlight some potential directions for future research at the end. Overall, this review aims to allow researchers from both clinical and technical backgrounds to gain a quick yet in-depth understanding of research on uncertainty quantification for medical image analysis machine learning models.
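
As a concrete illustration of one of the probabilistic methods covered by this kind of review, the sketch below applies Monte Carlo dropout: dropout is kept active at inference time and the spread of repeated stochastic predictions is summarised as predictive entropy, a simple per-case reliability score. This is not code from the paper; the toy model, its name, and all settings are hypothetical.

```python
# A minimal sketch of Monte Carlo dropout (hypothetical model and settings).
import torch
import torch.nn as nn

class TinyLesionClassifier(nn.Module):          # hypothetical toy classifier
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(8, n_classes))

    def forward(self, x):
        return self.head(self.features(x))

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    model.train()                               # keep dropout stochastic at inference
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
    )                                           # (n_samples, batch, classes)
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy                  # prediction plus its uncertainty

x = torch.randn(4, 1, 64, 64)                   # fake single-channel scans
mean_probs, uncertainty = mc_dropout_predict(TinyLesionClassifier(), x)
print(uncertainty)                              # higher entropy = more uncertain prediction
```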

Related Scientific Articles

Trustworthy clinical AI solutions: a unified review of uncertainty quantification in deep learning models for medical image analysis

Senan Doyle, Harmonie Dehaene, A. Tucholka + 3 others

5 October 2022

The full acceptance of Deep Learning (DL) models in the clinical field is rather low with respect to the quantity of high-performing solutions reported in the literature. End users are particularly reluctant to rely on the opaque predictions of DL models. Uncertainty quantification methods have been proposed in the literature as a potential solution, to reduce the black-box effect of DL models and increase the interpretability and the acceptability of the result by the final user. In this review, we propose an overview of the existing methods to quantify uncertainty associated with DL predictions. We focus on applications to medical image analysis, which present specific challenges due to the high dimensionality of images and their variable quality, as well as constraints associated with real-world clinical routine. Moreover, we discuss the concept of structural uncertainty, a corpus of methods to facilitate the alignment of segmentation uncertainty estimates with clinical attention. We then discuss the evaluation protocols to validate the relevance of uncertainty estimates. Finally, we highlight the open challenges for uncertainty quantification in the medical field.
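
One widely used evaluation protocol of the kind this review discusses is the Expected Calibration Error (ECE): predicted confidences are binned and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is illustrative only and not taken from the review; the toy inputs are made up.

```python
# A minimal sketch of Expected Calibration Error (illustrative, toy data).
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins: int = 10):
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap          # weight by fraction of samples in the bin
    return ece

# toy usage: a well-calibrated model has ECE close to 0
conf = np.array([0.9, 0.8, 0.95, 0.6, 0.55])
pred = np.array([1, 0, 1, 1, 0])
true = np.array([1, 0, 1, 0, 0])
print(expected_calibration_error(conf, pred, true))
```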

Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis

Changjian Shui, Raghav Mehta, T. Arbel

6 March 2023

Although deep learning (DL) models have shown great success in many medical image analysis tasks, deployment of the resulting models into real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have indeed shown significant biases in DL models across demographic subgroups (e.g., race, sex, age) in the context of medical image analysis, indicating a lack of fairness in the models. Although several methods have been proposed in the ML literature to mitigate a lack of fairness in DL models, they focus entirely on the absolute performance between groups without considering their effect on uncertainty estimation. In this work, we present the first exploration of the effect of popular fairness models on overcoming biases across subgroups in medical image analysis in terms of bottom-line performance, and their effects on uncertainty quantification. We perform extensive experiments on three different clinically relevant tasks: (i) skin lesion classification, (ii) brain tumour segmentation, and (iii) Alzheimer's disease clinical score regression. Our results indicate that popular ML methods, such as data-balancing and distributionally robust optimization, succeed in mitigating fairness issues in terms of the model performances for some of the tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions. This tradeoff must be mitigated if fairness models are to be adopted in medical image analysis.
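
A minimal, hypothetical sketch of the kind of subgroup-level check this line of work motivates: comparing the mean predictive entropy and Brier score of a model's probabilistic outputs across demographic subgroups. The function and toy data below are illustrative assumptions, not the paper's protocol.

```python
# A minimal sketch of a per-subgroup uncertainty report (hypothetical, toy data).
import numpy as np

def subgroup_uncertainty_report(probs, labels, groups):
    probs = np.asarray(probs, dtype=float)           # (n, n_classes) predicted probabilities
    labels = np.asarray(labels)
    groups = np.asarray(groups)
    onehot = np.eye(probs.shape[1])[labels]
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    brier = ((probs - onehot) ** 2).sum(axis=1)
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        report[g] = {"mean_entropy": float(entropy[mask].mean()),
                     "mean_brier": float(brier[mask].mean()),
                     "n": int(mask.sum())}
    return report                                    # large gaps between groups hint at unequal uncertainty quality

probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
print(subgroup_uncertainty_report(probs, labels=[0, 1, 0, 1], groups=["A", "A", "B", "B"]))
```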

A survey of uncertainty in deep neural networks

M. Shahzad, Matthias Humt, R. Bamler + 11 others

7 July 2021

Over the last decade, neural networks have reached almost every field of science and become a crucial part of various real world applications. Due to the increasing spread, confidence in neural network predictions has become more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over- or under-confidence, i.e. are badly calibrated. To overcome this, many researchers have been working on understanding and quantifying uncertainty in a neural network’s prediction. As a result, different types and sources of uncertainty have been identified and various approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. For that, a comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and irreducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks (BNNs), ensemble of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty, approaches for calibrating neural networks, and give an overview of existing baselines and available implementations. Different examples from the wide spectrum of challenges in the fields of medical image analysis, robotics, and earth observation give an idea of the needs and challenges regarding uncertainties in the practical applications of neural networks. Additionally, the practical limitations of uncertainty quantification methods in neural networks for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.
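
The separation into reducible model (epistemic) and irreducible data (aleatoric) uncertainty described above is commonly computed from an ensemble: the total predictive entropy decomposes into the expected entropy of the individual members (aleatoric) plus their disagreement (epistemic, a mutual-information term). The sketch below is illustrative, with toy numbers, and is not taken from the survey.

```python
# A minimal sketch of the ensemble-based uncertainty decomposition (toy numbers).
import numpy as np

def entropy(p, axis=-1):
    return -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=axis)

def decompose_uncertainty(member_probs):
    # member_probs: (n_members, n_samples, n_classes) softmax outputs of an ensemble
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)                     # predictive entropy
    aleatoric = entropy(member_probs).mean(axis=0)  # expected entropy of each member
    epistemic = total - aleatoric                   # mutual information (member disagreement)
    return total, aleatoric, epistemic

# toy ensemble of 3 members on 2 cases: the members disagree on the second case,
# so its epistemic (model) uncertainty is large while the first case's is small
member_probs = [[[0.90, 0.10], [0.90, 0.10]],
                [[0.85, 0.15], [0.10, 0.90]],
                [[0.90, 0.10], [0.50, 0.50]]]
print(decompose_uncertainty(member_probs))
```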

Machine learning for medical imaging: methodological failures and recommendations for the future

V. Cheplygina, G. Varoquaux

12 April 2022

Research in computer analysis of medical images bears many promises to improve patients’ health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that at every step, potential biases can creep in. On a positive note, we also discuss on-going efforts to counteract these problems. Finally we provide recommendations on how to further address these problems in the future.

Translation of predictive modeling and AI into clinics: a question of trust

J. Caspers

25 April 2021

During the last decade, data science technologies such as artificial intelligence (AI) and radiomics have emerged strongly in radiologic research. Radiomics refers to the (automated) extraction of a large number of quantitative features from medical images [1]. A typical radiomics workflow involves image acquisition and segmentation as well as feature extraction and prioritization/reduction as preparation for its ultimate goal, which is predictive modeling [2]. This final step is where radiomics and AI typically intertwine to build a gainful symbiosis. In recent years, the field of medical imaging has seen a rising number of publications on radiomics and AI applications with increasingly refined methodologies [3, 4]. The formulation of best-practice white papers and quality criteria for publications on predictive modeling, such as the TRIPOD [5] or CLAIM [6] criteria, has substantially promoted this qualitative gain. Consequently, relevant methodological approaches advancing the generalizability of predictive models are increasingly being observed in recent publications, e.g., the accurate composition of representative and unbiased datasets, avoidance of data leakage, the incorporation of (nested) cross-validation approaches for model development, particularly on small datasets, or the use of independent, external test samples. In this regard, the work of Song et al [7] on a clinical-radiomics nomogram for prediction of functional outcome in intracranial hemorrhage, published in the current issue of European Radiology, is just one example of the general trend.

However, in contrast to the rising utilization and importance of predictive modeling in medical imaging research, these technologies have not been widely adopted in clinical routine. Besides regulatory, medicolegal, or ethical issues, one of the major hurdles for a broad usage of AI and predictive models is the lack of trust in these technologies by medical practitioners, healthcare stakeholders, and patients. After more than a decade of scientific progress on AI and predictive modeling in medical imaging, we should now take the opportunity to focus our research on the trustworthiness of AI and predictive modeling in order to trailblaze their translation into clinical practice.

Several prospects could enhance the trustworthiness of predictive models for clinical use. One of the main factors will be transparency on their reliability in real-world applications. Large multicentric prospective trials will be paramount to assess and validate the performance and especially the generalizability of predictive models in a robust and minimally biased fashion. Additionally, benchmarking of AI tools by independent institutions on external heterogeneous real-world data would provide transparency on model performances and enhance trust. In general, trust in new technologies is severely influenced by the comprehensibility of these techniques for their users. In the field of predictive modeling, this topic is often described with the term "explainable AI," which is being increasingly considered in current research [8]. Explainable AI seeks to unravel the "black-box" nature of many predictive models, including artificial neural networks, by making decision processes comprehensible, e.g., by revealing the features that drive their decisions. Trust in predictive models will therefore substantially increase when models are developed transparently and AI systems are made comprehensible.
Another issue of current AI tools is that they mainly incorporate narrow AI, i.e., they address only one very specific task. We are currently miles, if not light-years, away from building real strong AI, that is, artificial intelligence having the capacity to learn any intellectual task that a human being can. However, building more comprehensive AI systems solving multiple predictive tasks might enhance their trustworthiness for users. For example, a user might be inclined to follow thoughts along the line of "I have good experience in this system predicting the …"

This comment refers to the article available at https://doi.org/10.1007/s00330-021-07828-7.
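
As a small aside on the nested cross-validation this editorial highlights for small datasets, the hypothetical sketch below tunes hyper-parameters in an inner loop while an outer loop estimates generalisation without leaking the tuning into the test folds. The dataset and model are stand-ins, not from the article.

```python
# A minimal sketch of nested cross-validation (stand-in dataset and model).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)          # stand-in for radiomics features

inner = GridSearchCV(                                # inner loop: hyper-parameter tuning
    LogisticRegression(max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
outer_scores = cross_val_score(inner, X, y, cv=5)    # outer loop: unbiased performance estimate
print(outer_scores.mean(), outer_scores.std())
```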
