On 13 January at 10:00 Viacheslav Komisarenko will defend his doctoral thesis “Aligning Training Loss to Evaluation Metrics in Deep Learning”.
Supervisor:
Prof. Meelis Kull, University of Tartu
Opponents:
Prof. Jesús Cid-Sueiro, Charles III University of Madrid (Spain) and
Assoc. Prof. Maurizio Filippone, King Abdullah University of Science and Technology (Kingdom of Saudi Arabia)
Summary
Recent advances in machine learning (ML) have driven broad adoption of ML systems, often surpassing algorithmic baselines and human performance. This progress is enabled by large, high-quality datasets, expressive architectures, and robust optimisation.
The loss function is a key design choice: it shapes the optimisation landscape and guides how the model is fitted. Ultimately, however, performance is judged by evaluation metrics, many of which are discontinuous or non-differentiable and therefore require surrogate losses for training. Classical metrics such as accuracy alone no longer meet the rising demands of performance evaluation. Practitioners therefore also employ cost-sensitive metrics, in which errors have asymmetric importance (e.g., a missed diagnosis is worse than a false alarm), and calibration metrics, which assess the quality of predicted probabilities.
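To make the cost-sensitive idea concrete, here is a minimal sketch of a cost-weighted cross-entropy for binary classification. The cost values `c_fn` and `c_fp` are illustrative, not taken from the thesis; the thesis goes further by treating such costs as probability distributions rather than fixed scalars.

```python
import numpy as np

def cost_weighted_log_loss(p, y, c_fn=5.0, c_fp=1.0):
    """Cost-sensitive cross-entropy: missed positives (false negatives)
    are weighted by c_fn, false alarms (false positives) by c_fp."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(c_fn * y * np.log(p) + c_fp * (1 - y) * np.log(1 - p))
```

With `c_fn=5.0` and `c_fp=1.0`, a confident miss (predicting 0.1 for a true positive) incurs five times the loss of an equally confident false alarm, steering the model towards fewer missed diagnoses.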
This thesis comprises three interconnected studies aiming to align training loss choices with evaluation metrics.
The first study addresses cost-sensitive classification in the setting where error costs are uncertain rather than fixed scalars during training. We model the costs as probability distributions (obtained, for example, from expert estimates) and derive losses suited to this setting.
The second study focuses on calibration. We analyse why focal loss yields well-calibrated models and reveal that its expression embeds a calibration map. This explains its performance and motivates new calibration methods, which we extend to a broader family of separable losses.
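For reference, the focal loss analysed in the second study has the standard binary form of Lin et al.: cross-entropy scaled by a modulating factor (1 − p_t)^γ. A minimal sketch (the γ value is illustrative; the calibration-map interpretation is the thesis's contribution and is not shown here):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: cross-entropy multiplied by (1 - p_t)**gamma,
    which down-weights examples the model already classifies confidently."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    p_t = np.where(y == 1, p, 1 - p)  # probability assigned to the true class
    return -((1 - p_t) ** gamma) * np.log(p_t)
```

Setting `gamma=0` recovers ordinary cross-entropy, while larger γ shrinks the loss on confident predictions, which is what makes its interaction with calibration worth analysing.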
The third contribution tackles the gap between upstream evaluation metrics, which give a high-level performance overview, and downstream utilities, which are domain-specific and costly to compute. We propose learning a transformation to align their values, study when properness is preserved, and demonstrate feasibility on proof-of-concept tasks.
Collectively, these studies advance understanding of how to select training losses for practical evaluation metrics, offering guidelines for practitioners.
The defence will also be held over Zoom (meeting ID: 923 4644 5582, passcode: ati).