This work, joint with Yuri Verges, is motivated by the complementary methods developed by Hernández-Orallo et al. (2011, 2013), Dimitriadis et al. (2021, 2024) and Shao et al. (2023).
Dimitriadis et al. (2021) introduced the CORP approach to the calibration of probabilistic forecasts, which is based on nonparametric isotonic regression computed with the classical pool-adjacent-violators (PAV) algorithm. The CORP approach generates reliability curves, each being the graph of the PAV-(re)calibrated forecast probability plotted against the original forecast value.
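The PAV step behind such a reliability curve can be illustrated with a minimal pure-Python sketch. It is not the CORP reference implementation: the function name pav_recalibrate and the toy data are hypothetical, outcomes are assumed to be coded 0/1, and tied forecast values are assumed to be pooled before the monotonicity violations are resolved.

```python
from itertools import groupby


def pav_recalibrate(forecasts, outcomes):
    """Return (forecast, recalibrated probability) pairs sorted by forecast.

    Hypothetical helper: pools adjacent violators so that the fitted
    event frequencies are non-decreasing in the forecast value.
    """
    pairs = sorted(zip(forecasts, outcomes), key=lambda p: p[0])
    blocks = []  # each block: [sum of outcomes, number of cases, forecast values]
    for x, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]  # tied forecasts start in one block
        blocks.append([float(sum(ys)), len(ys), [x] * len(ys)])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n, xs = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2].extend(xs)
    curve = []
    for s, n, xs in blocks:
        curve.extend((x, s / n) for x in xs)
    return curve


if __name__ == "__main__":
    # Toy data: the outcomes at 0.2 and 0.3 violate monotonicity and get pooled.
    for x, p in pav_recalibrate([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1]):
        print(x, p)
```

Plotting the returned pairs gives a step-like reliability curve; a forecast is calibrated on the sample when the curve lies on the diagonal.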
Dimitriadis et al. (2024) proposed a triplet of diagnostic tools, each with different capabilities, for the performance evaluation of binary classifiers: reliability curves produced by CORP to diagnose calibration, receiver operating characteristic (ROC) curves to evaluate discrimination ability, and Murphy curves for an overall assessment of predictive performance. In their Theorem 3, the authors show that if X and Z are calibrated probabilistic forecasts for the binary outcome Y, then the following statements are equivalent: (i) X is sharper than Z; (ii) X dominates Z in the ROC sense; and (iii) X dominates Z in the Murphy sense.
The area under the ROC curve (abbreviated AUC) is a popular measure of the accuracy of quantitative diagnostic tests. However, traditional machine learning models trained with AUC are not well studied for cost-sensitive decision problems. A notable exception is the work of Hernández-Orallo et al. (2013), who demonstrate that ROC curves can be transformed into cost space. This transformation is equivalent to computing the area under the convex hull of the ROC curve. Thus, the AUC can be seen as the performance of the model under a uniform cost distribution, which is an unreasonable assumption in practice.
Extending the idea of Hernández-Orallo et al. (2013), Shao et al. (2023) introduced the notion of a weighted ROC curve in cost space, which combines the robustness of the model to the class distribution with its robustness to the cost distribution; in other words, it extends AUC to non-uniform cost-sensitive learning. The authors construct a new setting in which the costs are treated like a dataset in order to deal with an arbitrary unknown cost distribution, and introduce a weighted version of AUC (abbreviated WAUC) in which the cost distribution enters the calculation via the decision threshold.
Shao et al. (2023) then develop the following two-level algorithm to bridge WAUC and cost: the inner-level problem approximates the optimal threshold from sampled costs, and the outer-level problem minimizes the WAUC loss over the optimal threshold distribution. Such an approach is better suited to real-world cost-sensitive scenarios.
Taking into account the equivalent statements of Theorem 3 in Dimitriadis et al. (2024), our goal is to apply the methodology suggested by Shao et al. (2023) in two directions:
1. It is well known that the Gini concentration index (denoted by G) is related to AUC as follows: G = 2AUC - 1. Our proposal is to use the Leimkuhler curve which, in economics and reliability contexts, plots the cumulative proportion of total productivity against the cumulative proportion of sources arranged in decreasing order. The area under the Leimkuhler curve, denoted by AUL, satisfies the same relation, i.e., G = 2AUL - 1; see Burrell (1991) and Balakrishnan et al. (2010). Moreover, the Leimkuhler curve has a shape similar to that of the ROC curve: it begins at (0,0) and terminates at (1,1), and it is non-decreasing and concave.
Hence, the Leimkuhler curve is a cumulative distribution function of a random variable having the Bradford distribution. We will apply a procedure analogous to the one suggested by Shao et al. (2023) to the Leimkuhler curve in order to obtain the corresponding cost space, and compare the results with those obtained for the WAUC procedure.
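The relation G = 2AUL - 1 can be checked numerically on a small sample. The sketch below is only an illustration under stated assumptions: the function names and the toy productivities are hypothetical, the empirical curve is taken to be piecewise linear, and the Gini index is computed from the mean absolute difference.

```python
def leimkuhler_curve(productivities):
    """Points of the empirical Leimkuhler curve (sources in decreasing order)."""
    xs = sorted(productivities, reverse=True)
    total = float(sum(xs))
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini_mean_difference(productivities):
    """Gini index as the mean absolute difference divided by twice the mean."""
    n = len(productivities)
    mean = sum(productivities) / n
    diff = sum(abs(a - b) for a in productivities for b in productivities)
    return diff / (2.0 * n * n * mean)


if __name__ == "__main__":
    data = [5, 1, 3, 1]  # hypothetical productivities of four sources
    aul = area_under(leimkuhler_curve(data))
    print(aul, 2.0 * aul - 1.0, gini_mean_difference(data))
```

Because the sources are sorted in decreasing order, the increments of the curve are non-increasing, which is the discrete counterpart of the concavity noted above.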
2. It is well known that the area below the Murphy curve is equal to half of the mean Brier score, and that the area below the Brier curve (see Hernández-Orallo et al. (2011)) is equal to the mean Brier score. Using again the methodology of Shao et al. (2023), we will extend the results of Hernández-Orallo et al. (2011) to the case where the cost distribution differs from the uniform one. Finally, we will show the corresponding weighted versions of Murphy curves in cost space.
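The half-Brier identity, and what changes under a non-uniform cost distribution, can be illustrated with a rough numerical sketch. It rests on several assumptions that are not taken from the cited papers: outcomes are coded 0/1, a case is classified as positive when its forecast exceeds the cost proportion c, a false positive costs c and a false negative costs 1 - c, and the Beta(2,2) cost density is merely a hypothetical example. All function names are illustrative.

```python
def cost_loss(forecasts, outcomes, c):
    """Mean cost-weighted loss at cost proportion c, using threshold t = c."""
    total = 0.0
    for p, y in zip(forecasts, outcomes):
        if p > c and y == 0:
            total += c          # false positive, assumed cost c
        elif p <= c and y == 1:
            total += 1.0 - c    # false negative, assumed cost 1 - c
    return total / len(forecasts)


def weighted_area(forecasts, outcomes, density, grid=20000):
    """Midpoint-rule approximation of the integral of density(c) * cost_loss(c)."""
    h = 1.0 / grid
    return h * sum(
        density((k + 0.5) * h) * cost_loss(forecasts, outcomes, (k + 0.5) * h)
        for k in range(grid)
    )


def mean_brier(forecasts, outcomes):
    """Mean Brier score of probability forecasts for 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    f = [0.1, 0.35, 0.6, 0.9]   # toy forecasts (hypothetical)
    y = [0, 1, 0, 1]
    uniform = weighted_area(f, y, lambda c: 1.0)
    beta22 = weighted_area(f, y, lambda c: 6.0 * c * (1.0 - c))  # assumed cost density
    print(uniform, mean_brier(f, y) / 2.0, beta22)
```

With the uniform density, the integral approximates half of the mean Brier score; replacing the density changes the weight given to each cost proportion, which is the kind of weighted area considered in this item.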
References:
Balakrishnan, N., Sarabia, J. M. and Kolev, N. (2010). A simple relation between the Leimkuhler curve and the mean residual life. Journal of Informetrics 4, 602-607.
Burrell, Q. (1991). The Bradford distribution and the Gini index. Scientometrics 21, 181-194.
Dimitriadis, T., Gneiting, T. and Jordan, A. I. (2021). Stable reliability diagrams for probabilistic classifiers. Proceedings of the National Academy of Sciences 118, Article e2016191118.
Dimitriadis, T., Gneiting, T., Jordan, A. I. and Vogel, P. (2024). Evaluating probabilistic classifiers: The triptych. International Journal of Forecasting 40, 1101-1122.
Hernández-Orallo, J., Flach, P. and Ferri, C. (2011). Brier curves: A new cost-based visualization of classifier performance. In: Proceedings of the 28th International Conference on Machine Learning.
Hernández-Orallo, J., Flach, P. and Ferri, C. (2013). ROC curves in cost space. Machine Learning 93, 71-91.
Shao, H., Xu, Q., Yang, Z., Wen, P., Gao, P. and Huang, Q. (2023). Weighted ROC curve in cost space: Extending AUC to cost-sensitive learning. In: Advances in Neural Information Processing Systems (NeurIPS 2023) 36, 17257-17368.