Mathematical Research at the University of Cambridge

 

AI systems undergo thorough evaluation before deployment, validating their predictions against a ground truth that is typically assumed to be fixed and certain. In many domains, however, such as medical applications, the ground truth is often curated in the form of differential diagnoses provided by multiple experts. While a single differential diagnosis reflects the uncertainty in one expert's assessment, multiple experts introduce another layer of uncertainty through potential disagreement.
 
In this talk, I will argue that ignoring this uncertainty leads to overly optimistic estimates of model performance, and hence to underestimated risk for particular diagnostic decisions and to unanticipated failure modes. We propose a statistical aggregation approach in which we infer a distribution over the probabilities of the underlying candidate medical conditions themselves, based on the observed annotations. This formulation naturally accounts both for potential disagreement between experts and for the uncertainty within individual differential diagnoses, thereby capturing the full ground truth uncertainty. We conclude that, while assuming a crisp ground truth may be acceptable for many AI applications, a more nuanced evaluation protocol should be used in medical diagnosis. If time permits, I will also cover some work based on conformal methods, which can provide statistical guarantees.
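To give a flavour of the kind of aggregation the abstract describes, here is a minimal sketch, assuming a simple Dirichlet-categorical model in which each expert's differential diagnosis contributes soft pseudo-counts over the candidate conditions; the formulation presented in the talk may differ, and all numbers below are hypothetical.

    import numpy as np

    # Hypothetical setting: 3 candidate conditions, 4 experts.
    # Each row is one expert's differential diagnosis, expressed
    # as a probability vector over the candidates (rows sum to 1).
    annotations = np.array([
        [0.7, 0.3, 0.0],   # expert 1
        [0.6, 0.4, 0.0],   # expert 2
        [0.1, 0.8, 0.1],   # expert 3 disagrees with 1 and 2
        [0.5, 0.4, 0.1],   # expert 4
    ])

    # Treat the differentials as pseudo-counts added to a uniform
    # Dirichlet prior, giving a posterior over the condition
    # probabilities themselves (a distribution on the simplex).
    alpha = np.ones(3) + annotations.sum(axis=0)

    # Sampling plausible ground truths exposes both inter-expert
    # disagreement and within-differential uncertainty, instead of
    # collapsing everything to a single crisp label.
    rng = np.random.default_rng(0)
    plausible_truths = rng.dirichlet(alpha, size=1000)
    print("posterior mean:", alpha / alpha.sum())
    print("posterior std: ", plausible_truths.std(axis=0))

Evaluating a model against such samples of plausible ground truths, rather than against one fixed label, yields the less optimistic performance estimates the abstract argues for. For the closing remark on conformal methods, a standard split conformal construction illustrates the type of statistical guarantee involved; this is a generic sketch on simulated data, not the specific method from the talk.

    import numpy as np

    rng = np.random.default_rng(1)
    n_cal, n_classes, miscoverage = 500, 3, 0.1

    # Simulated calibration data: model probabilities and labels.
    probs_cal = rng.dirichlet(np.ones(n_classes), size=n_cal)
    labels_cal = np.array([rng.choice(n_classes, p=p) for p in probs_cal])

    # Nonconformity score: 1 minus the probability of the true class.
    scores = 1.0 - probs_cal[np.arange(n_cal), labels_cal]

    # Finite-sample-corrected quantile of the calibration scores.
    level = np.ceil((n_cal + 1) * (1 - miscoverage)) / n_cal
    q = np.quantile(scores, level, method="higher")

    # Prediction set for a new case: every condition whose score falls
    # below the threshold; by the split conformal guarantee it contains
    # the true condition with probability at least 1 - miscoverage.
    probs_test = rng.dirichlet(np.ones(n_classes))
    prediction_set = np.where(1.0 - probs_test <= q)[0]
    print("prediction set:", prediction_set)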
 
Based on joint work with David Stutz, Melih Barsbey, Alan Karthikesalingam, Arnaud Doucet, and many others.

Further information

Time:

13 August 2025, 11:50 to 12:25

Venue:

Seminar Room 1, Newton Institute

Speaker:

Taylan Cemgil (Google DeepMind Technologies Limited)

Series:

Isaac Newton Institute Seminar Series