In binary classification tasks on imbalanced datasets, we often report the area under the curve (AUC) of the receiver operating characteristic (ROC) as a measure of the classifier's ability to distinguish the two classes.
If a classifier makes $k$ errors, its accuracy is the same no matter how those $k$ errors are split between misclassified positive samples and misclassified negative samples. AUC-ROC, in contrast, treats these two kinds of misclassification asymmetrically, making it a more appropriate statistic for classification tasks on imbalanced datasets.
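As a quick illustration (hypothetical toy numbers, not from the paper), two classifiers that each make one thresholded error have identical accuracy, yet their AUC-ROC differs because the metric evaluates the full ranking rather than a single threshold:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy imbalanced test set: 2 positives, 8 negatives.
y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

# Classifier A misses one positive: 0.3 falls below the 0.5 threshold
# and is also out-ranked by the negative scored 0.4.
scores_a = np.array([0.9, 0.3, 0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
# Classifier B flags one negative: 0.7 rises above the 0.5 threshold,
# but every positive still out-ranks every negative.
scores_b = np.array([0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

for name, s in [("A", scores_a), ("B", scores_b)]:
    preds = (s >= 0.5).astype(int)
    print(name,
          "accuracy:", accuracy_score(y_true, preds),  # 0.9 for both
          "AUC:", roc_auc_score(y_true, s))            # 0.9375 vs 1.0
```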
However, until this paper, AUC-ROC was hard to optimize directly: it is not differentiable, so gradient descent cannot be run against it.
This paper exploits the equivalence of AUC-ROC with the Wilcoxon-Mann-Whitney (WMW) statistic, which counts the "number of wins" over all pairwise comparisons between positive and negative samples:
$$
U = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} I(x_i, x_j)}{mn},
$$
where $m$ is the number of positive samples, $n$ is the number of negative samples, and $I(x_i, x_j)$ equals $1$ if the positive sample $x_i$ is ranked above the negative sample $x_j$, and $0$ otherwise.
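A direct translation of $U$ into code (a minimal sketch; the function name and random test scores are mine) makes the pairwise counting explicit, and for untied scores it coincides with sklearn's roc_auc_score:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def wmw_statistic(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive sample
    receives the higher score, i.e. U from the formula above."""
    pos = np.asarray(pos_scores)[:, None]  # shape (m, 1)
    neg = np.asarray(neg_scores)[None, :]  # shape (1, n)
    wins = (pos > neg).sum()               # sum of I(x_i, x_j) over m*n pairs
    return wins / (pos.size * neg.size)

# Continuous scores, so ties occur with probability zero.
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=50)    # m = 50 positive-sample scores
neg = rng.normal(0.0, 1.0, size=500)   # n = 500 negative-sample scores

u = wmw_statistic(pos, neg)
auc = roc_auc_score(np.r_[np.ones(50), np.zeros(500)], np.r_[pos, neg])
print(u, auc)  # the two values agree: U equals AUC-ROC
```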
Figure 1 in the paper shows how this statistic behaves as the imbalance in the dataset increases, justifying its close correspondence with AUC-ROC.
Further, to make the metric smooth and differentiable, the step function $I$ in the pairwise comparison is replaced by a sigmoid or hinge function.
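A minimal sketch of the sigmoid variant for a linear scorer (the sharpness $\beta$, learning rate, and synthetic data are my choices, not the paper's): replacing $I$ with $\sigma(\beta(s_i - s_j))$ yields a loss whose gradient can drive plain gradient descent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def auc_surrogate_loss_grad(w, X_pos, X_neg, beta=2.0):
    """Smooth AUC surrogate for a linear scorer s(x) = x @ w.

    The step I(s_i > s_j) becomes sigmoid(beta * (s_i - s_j)), and we
    minimize the mean of 1 - sigmoid(...) over all (pos, neg) pairs.
    Returns (loss, gradient w.r.t. w)."""
    diff = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]  # (m, n) score gaps
    p = sigmoid(beta * diff)
    loss = np.mean(1.0 - p)
    # d(1 - sigmoid(beta*d))/dd = -beta * p * (1 - p), averaged over pairs
    g = -beta * p * (1.0 - p) / p.size
    return loss, g.sum(axis=1) @ X_pos - g.sum(axis=0) @ X_neg

# Hypothetical Gaussian data: positives shifted up along every feature.
rng = np.random.default_rng(1)
X_pos = rng.normal(+0.5, 1.0, size=(30, 5))
X_neg = rng.normal(-0.5, 1.0, size=(300, 5))

w = np.zeros(5)
for _ in range(200):  # plain gradient descent on the smooth surrogate
    loss, grad = auc_surrogate_loss_grad(w, X_pos, X_neg)
    w -= 1.0 * grad
print("surrogate loss after training:", loss)
```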
Further extensions apply the method to multi-class classification tasks and to focusing on the top-$K$ predictions, i.e., optimizing the area under the lower-left part of the ROC curve.
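One way to read that top-$K$ restriction (my interpretation, a hypothetical sketch rather than the paper's exact formulation) is to limit the pairwise sum to the $K$ highest-scoring negatives, which corresponds to the low false-positive-rate, lower-left region of the ROC curve:

```python
import numpy as np

def partial_wmw(pos_scores, neg_scores, k):
    """WMW statistic restricted to the k top-scoring (hardest) negatives,
    i.e. the low-FPR, lower-left region of the ROC curve."""
    hardest_neg = np.sort(np.asarray(neg_scores))[-k:]
    return (np.asarray(pos_scores)[:, None] > hardest_neg[None, :]).mean()
```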