Consider a two-class prediction problem (binary classification), in which the outcomes are labeled either as positive or negative. There are four possible outcomes from a binary classifier:
True Positive (TP): the prediction is positive and the actual value is positive.
True Negative (TN): the prediction is negative and the actual value is negative.
False Positive (FP): the prediction is positive and the actual value is negative.
False Negative (FN): the prediction is negative and the actual value is positive.
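As a minimal sketch in Python (the 1 = positive / 0 = negative label encoding and the function name are assumptions for illustration), the four counts can be tallied directly:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Example: 2 true positives, 1 true negative, 1 false positive, 1 false negative
print(confusion_counts([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # (2, 1, 1, 1)
```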
Precision measures the accuracy of positive predictions. It is the ratio of true positive predictions to the total positive predictions (true positives + false positives). High precision indicates that fewer false positives are present.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall (also known as sensitivity or true positive rate) measures the ability of a model to identify all relevant instances. It is the ratio of true positive predictions to the actual number of positive cases (true positives + false negatives). High recall means the model successfully captures most of the actual positives.
$$\text{Recall} = \frac{TP}{TP + FN}$$
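Continuing the sketch above (the function names are hypothetical), both metrics follow directly from the counts:

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): how many positive predictions were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Recall = TP / (TP + FN): how many actual positives were recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Using the counts from the sketch above (tp=2, fp=1, fn=1):
print(precision(2, 1), recall(2, 1))  # 0.666... 0.666...
```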
The ROC curve $\{(x, y)\}$ can be parameterized by $y = \mathrm{TPR}(T)$ and $x = \mathrm{FPR}(T)$.
Given a threshold parameter $T$, an instance is classified as positive if $X > T$ and negative otherwise. $X$ follows a probability density function $f_1(x)$ if the instance actually belongs to the positive class, and $f_0(x)$ otherwise. Therefore, the true positive rate is given by
$$\mathrm{TPR}(T) = \int_T^{\infty} f_1(x)\,dx$$
and the false positive rate is given by
$$\mathrm{FPR}(T) = \int_T^{\infty} f_0(x)\,dx.$$
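A small numeric sketch of these two integrals, assuming (purely for illustration, not from the source) that the positive-class scores follow $f_1 = \mathcal{N}(1, 1)$ and the negative-class scores follow $f_0 = \mathcal{N}(0, 1)$; the Gaussian upper tail is expressed with the complementary error function:

```python
import math

def gaussian_tail(threshold, mu, sigma):
    """Integral of N(mu, sigma^2) from `threshold` to infinity (upper tail)."""
    return 0.5 * math.erfc((threshold - mu) / (sigma * math.sqrt(2)))

def tpr(T, mu1=1.0, sigma1=1.0):
    """TPR(T) = integral of f1 over (T, inf); f1 is assumed Gaussian here."""
    return gaussian_tail(T, mu1, sigma1)

def fpr(T, mu0=0.0, sigma0=1.0):
    """FPR(T) = integral of f0 over (T, inf); f0 is assumed Gaussian here."""
    return gaussian_tail(T, mu0, sigma0)

# Sweeping the threshold T traces the ROC curve as the points (FPR(T), TPR(T)).
for T in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"T={T:+.1f}  FPR={fpr(T):.3f}  TPR={tpr(T):.3f}")
```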
Suppose the numbers of positive and negative samples in a batch are M and N, respectively.
$$\mathrm{AUC} = \frac{\sum I(P_{\mathrm{pos}}, P_{\mathrm{neg}})}{M \cdot N}$$
where the sum runs over all $M \cdot N$ pairs of one positive and one negative sample, $P_{\mathrm{pos}}$ and $P_{\mathrm{neg}}$ are their prediction scores, and $I$ is a characteristic function:
$$I(p, q) = \begin{cases} 1, & p > q \\ 0.5, & p = q \\ 0, & p < q \end{cases}$$
The time complexity is O(MN).
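A direct sketch of this pairwise definition (the 1/0 label encoding and names are assumptions); every (positive, negative) pair is compared, which is where the O(MN) cost comes from:

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = 0.0
    for p in pos:          # M positive scores
        for q in neg:      # N negative scores
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))
```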
Denote by $r_i$ the rank of the $i$-th sample, i.e., for the sample $x_j$ with the smallest prediction score, $r_j = 0$, and for the sample $x_k$ with the largest prediction score, $r_k = M + N - 1$.
For each positive sample, its rank equals the number of samples it outscores, i.e., the number of pairs in which it is ranked above the other sample.
Therefore, one can first sum the ranks of all positive samples and then subtract the pairs in which both samples are positive. Note that the highest-ranked positive sample has $M - 1$ positive samples below it, corresponding to a subtraction of $M - 1$; the next one has $M - 2$ below it, and so on. Hence the total subtraction is
$$(M - 1) + \cdots + 1 + 0 = \frac{M(M - 1)}{2}.$$
Finally we have
$$\mathrm{AUC} = \frac{\sum_{i \in \mathrm{Positive}} r_i - \frac{M(M - 1)}{2}}{M \cdot N}$$
The time complexity is O((M+N)log(M+N)), which is noticeably better than O(MN) when M and N are large.
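A sketch of the rank-based computation (assuming, as in the derivation above, that all prediction scores are distinct; tied scores would need averaged ranks). The sort dominates the cost, giving O((M+N)log(M+N)):

```python
def auc_by_ranks(labels, scores):
    """AUC = (sum of 0-based ranks of positives - M(M-1)/2) / (M*N); distinct scores assumed."""
    # Sort once: rank 0 goes to the smallest score, rank M+N-1 to the largest.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = {idx: r for r, idx in enumerate(order)}
    m = sum(1 for y in labels if y == 1)   # number of positive samples
    n = len(labels) - m                    # number of negative samples
    pos_rank_sum = sum(rank[i] for i, y in enumerate(labels) if y == 1)
    return (pos_rank_sum - m * (m - 1) / 2) / (m * n)

# Every positive outscores every negative, so the AUC is 1.0:
print(auc_by_ranks([1, 0, 1], [0.8, 0.3, 0.6]))  # 1.0
```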
Now let us work through an example:
| real label | 1 | 1 | 0 | 1 | 0 | 0 |
|---|---|---|---|---|---|---|
| pred score | 0.9 | 0.7 | 0.6 | 0.55 | 0.2 | 0.1 |
| rank | 5 | 4 | 3 | 2 | 1 | 0 |
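Plugging the table into the rank-based formula, with $M = 3$ positive samples (ranks 5, 4, and 2) and $N = 3$ negative samples:
$$\mathrm{AUC} = \frac{(5 + 4 + 2) - \frac{3 \cdot 2}{2}}{3 \cdot 3} = \frac{11 - 3}{9} = \frac{8}{9} \approx 0.889$$
This agrees with the pairwise count: of the 9 (positive, negative) pairs, only the positive scored 0.55 loses to the negative scored 0.6.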