ROCKS.BCDiag
ROCKS.accuracyplot
ROCKS.bcdiag
ROCKS.biasplot
ROCKS.concordance
ROCKS.cumliftable
ROCKS.cumliftcurve
ROCKS.ksplot
ROCKS.kstest
ROCKS.liftable
ROCKS.liftcurve
ROCKS.ranks
ROCKS.rocplot
ROCKS.BCDiag - Type
BCDiag
A structure holding the diagnostic properties of a binary classifier; it facilitates summary plots and tables.
ROCKS.accuracyplot - Method
accuracyplot(x::BCDiag; util=[1, 0, 0, 1])
Using util values for [TP, FN, FP, TN], produces the accuracy plot and its [max, argmax, argdep].
The default util values of [1, 0, 0, 1] give the standard accuracy value of (TP+TN)/N.
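To make the role of util concrete, here is a minimal sketch of a utility-weighted accuracy for a single confusion matrix. The function name utility_accuracy is hypothetical and not part of ROCKS, and the weighted-sum form is an assumption that is merely consistent with the default reducing to (TP+TN)/N; the package may compute the curve differently.

```julia
# Hypothetical helper (not a ROCKS function): utility-weighted accuracy
# for one confusion matrix, with util ordered as [TP, FN, FP, TN].
function utility_accuracy(tp, fn, fp, tn; util = [1, 0, 0, 1])
    n = tp + fn + fp + tn
    # Assumed form: weighted sum of the four cells divided by N.
    return (util[1] * tp + util[2] * fn + util[3] * fp + util[4] * tn) / n
end

# Default weights reduce to (TP + TN) / N.
standard = utility_accuracy(40, 10, 5, 45)

# Penalising false positives lowers the score for the same counts.
penalised = utility_accuracy(40, 10, 5, 45; util = [1, 0, -1, 1])
```

With these counts the default weights give 85/100 and the penalised weights give 80/100.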
ROCKS.bcdiag - Method
bcdiag(target, pred; groups = 100, rev = true, tie = 1e-6)
Performs diagnostics of a binary classifier. target is a 2 level categorical variable, pred is the probability of class 1. groups is the number of bins to use for plotting/printing. rev = true orders pred from high to low. tie is the tolerance of pred within which values are considered tied.
Returns a BCDiag struct which can be used for plotting or printing:
- biasplot is the calibration plot of target response rate vs. pred response rate
- ksplot produces the KS plot of cumulative distributions
- rocplot plots the Receiver Operating Characteristic curve
- accuracyplot plots the accuracy curve with adjustable utility
- liftcurve is the lift curve
- cumliftcurve is the cumulative lift curve
- liftable is the lift table as a DataFrame
- cumliftable is the cumulative lift table as a DataFrame
ROCKS.biasplot - Method
biasplot(x::BCDiag)
Returns the bias calibration plot of x: actual response vs. predicted response.
ROCKS.concordance - Function
concordance(class, var, tie)
Computes concordant, tied and discordant pairs.
class can be either a BitVector or a 2 level categorical target variable, in which case true is defined by the last value in sorted sequence. var is a Vector of predictor values, the same length as class. tie (optional) can be a number (default is 1e-6) that defines a tied region, or it can be a function that, when called with a scalar value, returns a tuple of the lower and upper bounds of a tied region; this is useful when you want a percentage-based tied region, for instance.
Pairwise comparisons between class 1 and class 0 values are made as follows:
- class 1 value > class 0 value is Concordant
- class 1 value ≈ class 0 value (within tie) is Tied
- class 1 value < class 0 value is Discordant
Returns:
- concordant, number of concordant comparisons
- tied, number of tied comparisons
- discordant, number of discordant comparisons
- auroc, or C, is (Concordant + 0.5 x Tied) / Total comparisons; same as numeric integration of the ROC curve
- gini, 2C-1, also known as Somers' D, is (Concordant - Discordant) / Total comparisons
Concordance calculation is the same as numeric integration of the ROC curve, but it allows for fuzzy tied regions which can be useful.
Note:
- Goodman-Kruskal Gamma is (Concordant - Discordant) / (Concordant + Discordant)
- Kendall's Tau is (Concordant - Discordant) / (0.5 x Total count x (Total count - 1))
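The pairwise definitions above can be sketched in a few lines of plain Julia. This is an illustrative brute-force version, not the ROCKS implementation: the name pair_counts is hypothetical, class is assumed to already be a Bool vector, tie is handled only as a number, and treating a difference exactly equal to tie as tied is an assumption.

```julia
# Hypothetical brute-force sketch (not the ROCKS implementation).
# Compares every class 1 value against every class 0 value.
function pair_counts(class::AbstractVector{Bool}, var::AbstractVector{<:Real}; tie = 1e-6)
    length(class) == length(var) ||
        throw(ArgumentError("class and var must have the same length"))
    pos = var[class]      # predictor values for class 1
    neg = var[.!class]    # predictor values for class 0
    conc = tied = disc = 0
    for a in pos, b in neg
        if abs(a - b) <= tie    # assumed: boundary counts as tied
            tied += 1
        elseif a > b
            conc += 1
        else
            disc += 1
        end
    end
    total = conc + tied + disc
    auroc = (conc + 0.5 * tied) / total
    gini = 2 * auroc - 1        # equals (conc - disc) / total
    return (concordant = conc, tied = tied, discordant = disc,
            auroc = auroc, gini = gini)
end

# Four pairs: three concordant, one tied (0.4 vs. 0.4), none discordant.
r = pair_counts([true, true, false, false], [0.9, 0.4, 0.4, 0.1])
```

The double loop costs O(n1 x n0) comparisons, which is fine for illustration but slow for large samples; a sort-based approach would be the usual choice there.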
ROCKS.cumliftable - Method
cumliftable(x::BCDiag)
Returns the cumulative lift table of x as a DataFrame.
ROCKS.cumliftcurve - Method
cumliftcurve(x::BCDiag)
Returns the cumulative lift curve plot of x: cumulative actual and predicted vs. depth.
ROCKS.ksplot - Method
ksplot(x::BCDiag)
Returns the KS plot of x: CDF1 (True Positive) and CDF0 (False Positive) vs. depth.
ROCKS.kstest - Method
kstest(class, var; rev = true)
Calculates the empirical 2 sample Kolmogorov-Smirnov statistic and its location. class is a 2 level categorical variable, var is the distribution to analyze.
Returns:
- n, total number of observations
- n1, number of observations of class 1
- n0, number of observations of class 0
- baserate, incidence rate of class 1
- ks, the maximum separation between the two cumulative distributions
- ksarg, the value of var at which maximum separation is achieved
- ksdep, depth of ksarg in the sorted values of var
rev = true counts depth from high value towards low value.
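As a rough illustration of the returned quantities, the following sketch computes a two-sample KS statistic by walking the values from high to low. It is not the ROCKS implementation: the name ks_sketch is hypothetical, class is assumed to be a non-empty Bool vector containing both classes, and ksdep is omitted because the text does not say whether depth is a count or a fraction.

```julia
# Hypothetical sketch (not the ROCKS implementation) of the empirical
# two-sample KS statistic, scanning var from high to low values.
function ks_sketch(class::AbstractVector{Bool}, var::AbstractVector{<:Real})
    n = length(class)
    n1 = count(class)
    n0 = n - n1
    order = sortperm(var; rev = true)
    cum1 = cum0 = 0
    ks = 0.0
    ksarg = var[order[1]]
    i = 1
    while i <= n
        v = var[order[i]]
        # Absorb every observation tied at v before measuring the gap.
        while i <= n && var[order[i]] == v
            if class[order[i]]
                cum1 += 1
            else
                cum0 += 1
            end
            i += 1
        end
        d = abs(cum1 / n1 - cum0 / n0)
        if d > ks
            ks, ksarg = d, v
        end
    end
    return (n = n, n1 = n1, n0 = n0, baserate = n1 / n, ks = ks, ksarg = ksarg)
end

# Class 1 values: 0.9, 0.5; class 0 values: 0.7, 0.3, 0.1.
k = ks_sketch([true, false, true, false, false], [0.9, 0.7, 0.5, 0.3, 0.1])
```

In this example the gap between the two cumulative distributions peaks at 2/3 once the scan reaches 0.5.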
ROCKS.liftable - Method
liftable(x::BCDiag)
Returns the lift table of x as a DataFrame.
ROCKS.liftcurve - Method
liftcurve(x::BCDiag)
Returns the lift curve plot of x: actual and predicted vs. depth.
ROCKS.ranks - Method
ranks(x; groups = 10, rank = tiedrank, rev = false)
Returns a variable which bins x into groups number of bins.
The rank keyword allows a different ranking method; use rev = true to reverse the sort so that a small bin number corresponds to a large value of x.
Missing values are assigned to group missing.
The default values of rank = tiedrank and rev = false result in a grouping similar to SAS PROC RANK groups=n ties=mean.
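For readers unfamiliar with the SAS convention, the sketch below bins values using mean ranks for ties and the PROC RANK grouping formula floor(rank x groups / (n + 1)). Both function names are hypothetical, the 0-based bin numbers follow SAS rather than anything stated about ROCKS, and missing values are not handled; the package's own binning may differ in detail.

```julia
# Hypothetical helper: ranks with ties replaced by their mean rank.
function mean_ranks(x::AbstractVector{<:Real})
    n = length(x)
    order = sortperm(x)
    r = zeros(Float64, n)
    i = 1
    while i <= n
        j = i
        # Extend j over the run of values equal to x[order[i]].
        while j < n && x[order[j + 1]] == x[order[i]]
            j += 1
        end
        avg = (i + j) / 2
        for k in i:j
            r[order[k]] = avg
        end
        i = j + 1
    end
    return r
end

# Hypothetical sketch of SAS-style grouping (0-based bin numbers).
function bin_sketch(x::AbstractVector{<:Real}; groups = 10)
    r = mean_ranks(x)
    return floor.(Int, r .* groups ./ (length(x) + 1))
end

# The two tied 20s share mean rank 2.5 and land in the same bin.
bins = bin_sketch([10, 20, 20, 30, 40, 50, 60, 70, 80]; groups = 3)
```

Because tied values share one rank, they always fall into the same bin, which is the practical reason for choosing a mean-rank method when binning scores.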
ROCKS.rocplot - Method
rocplot(x::BCDiag)
Returns the ROC plot of x: CDF1 (True Positive) vs. CDF0 (False Positive).