ROCKS.BCDiagType
BCDiag

A structure holding the diagnostic properties of a binary classifier; it facilitates summary plots and tables.

source
ROCKS.accuracyplotMethod
accuracyplot(x::BCDiag; util=[1, 0, 0, 1])

Using util values for [TP, FN, FP, TN], produce the accuracy plot and its [max, argmax, argdep].
The default util values of [1, 0, 0, 1] give the standard accuracy value of (TP+TN)/N.
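
As an illustration of how the util weights enter the calculation, the following is a minimal sketch at a single cutoff. It is not the ROCKS implementation: `weighted_accuracy` is a hypothetical helper, and it assumes the prediction has already been thresholded into a Boolean vector.

```julia
# Hypothetical helper, not part of ROCKS: utility-weighted accuracy at one cutoff.
# `util` is ordered [TP, FN, FP, TN], matching the docstring above.
function weighted_accuracy(actual::AbstractVector{Bool},
                           predicted::AbstractVector{Bool};
                           util = [1, 0, 0, 1])
    tp = count(actual .& predicted)       # true positives
    fn = count(actual .& .!predicted)     # false negatives
    fp = count(.!actual .& predicted)     # false positives
    tn = count(.!actual .& .!predicted)   # true negatives
    return (util[1] * tp + util[2] * fn + util[3] * fp + util[4] * tn) / length(actual)
end

actual    = [true, true, true, false, false, false, false, false]
predicted = [true, true, false, true, false, false, false, false]

# Default weights reduce to (TP + TN) / N.
acc = weighted_accuracy(actual, predicted)

# Penalizing false negatives lowers the score for the same predictions.
costly_fn = weighted_accuracy(actual, predicted; util = [1, -2, 0, 1])
```

With the default weights the result is the usual accuracy; changing util lets a missed positive or a false alarm cost more than the other.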

source
ROCKS.bcdiagMethod
bcdiag(target, pred; groups = 100, rev = true, tie = 1e-6)

Perform diagnostics of a binary classifier.
target is a 2 level categorical variable, pred is probability of class 1.
groups is the number of bins to use for plotting/printing.
rev = true orders pred from high to low.
tie is the tolerance of pred where values are considered tied.
Returns a BCDiag struct which can be used for plotting or printing:

  • biasplot is calibration plot of target response rate vs. pred response rate
  • ksplot produces the KS plot of cumulative distributions
  • rocplot plots the Receiver Operating Characteristics curve
  • accuracyplot plots the accuracy curve with adjustable utility
  • liftcurve is the lift curve
  • cumliftcurve is the cumulative lift curve
  • liftable is the lift table as a DataFrame
  • cumliftable is the cumulative lift table as a DataFrame
source
ROCKS.biasplotMethod
biasplot(x::BCDiag)

Return the bias calibration plot of x - actual response vs. predicted response
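
The quantities behind a calibration plot can be sketched as follows. This is an assumption about the general technique, not the ROCKS code: `calibration_bins` is a hypothetical helper that sorts by pred from high to low, cuts the observations into equal-sized groups, and compares the mean actual response with the mean predicted response in each group. It assumes groups does not exceed the number of observations.

```julia
# Hypothetical helper, not part of ROCKS: actual vs. predicted response rate per bin.
function calibration_bins(target::AbstractVector{Bool},
                          pred::AbstractVector{<:Real}; groups = 2)
    order = sortperm(pred; rev = true)            # high predictions first
    n = length(order)
    # Bin boundaries as positions in the sorted order.
    edges = [round(Int, k * n / groups) for k in 0:groups]
    return [(actual    = sum(target[order[lo+1:hi]]) / (hi - lo),
             predicted = sum(pred[order[lo+1:hi]]) / (hi - lo))
            for (lo, hi) in zip(edges[1:end-1], edges[2:end])]
end

target = [true, true, false, true, false, false, true, false]
pred   = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

bins = calibration_bins(target, pred; groups = 2)
```

A well-calibrated model has actual close to predicted in every bin, so the points lie near the diagonal.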

source
ROCKS.concordanceFunction
concordance(class, var, tie)

Computes concordant, tied and discordant pairs.
class can be either a BitVector or a 2 level categorical target variable; in the latter case, true is defined by the last value in sorted sequence.
var is a Vector of predictor values, the same length as class.
tie (optional) can be a number (default is 1e-6) that defines a tied region, or a function that, when called with a scalar value, returns a tuple of the lower and upper bounds of a tied region. The function form is useful for a percentage tied region, for instance.

Pairwise comparisons between class 1 and class 0 values are made as follows:

  • class 1 value > class 0 value is Concordant
  • class 1 value ≈ class 0 value (within tie) is Tied
  • class 1 value < class 0 value is Discordant

Returns:

  • concordant, number of concordant comparisons
  • tied, number of tied comparisons
  • discordant, number of discordant comparisons
  • auroc, or C, is (Concordant + 0.5 × Tied) / Total comparisons; same as numeric integration of the ROC curve
  • gini, 2C-1, also known as Somers' D, is (Concordant - Discordant) / Total comparisons

The concordance calculation is the same as numeric integration of the ROC curve, but it also allows for fuzzy tied regions, where values within the tie tolerance are counted as tied.

Note:

  • Goodman-Kruskal Gamma is (Concordant - Discordant) / (Concordant + Discordant)
  • Kendall's Tau is (Concordant - Discordant) / (0.5 × Total count × (Total count - 1))
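
The definitions above can be sketched with a brute-force pairwise comparison. This is illustrative only and not the ROCKS implementation: `pairwise_concordance` is a hypothetical helper, it handles only a numeric tie tolerance, and treating a difference exactly equal to tie as tied is an assumption.

```julia
# Hypothetical sketch, not part of ROCKS: O(n1 * n0) pairwise concordance.
function pairwise_concordance(class::AbstractVector{Bool},
                              var::AbstractVector{<:Real}; tie = 1e-6)
    ones_  = var[class]      # predictor values for class 1
    zeros_ = var[.!class]    # predictor values for class 0
    concordant = tied = discordant = 0
    for a in ones_, b in zeros_
        if abs(a - b) <= tie          # assumption: the boundary counts as tied
            tied += 1
        elseif a > b
            concordant += 1
        else
            discordant += 1
        end
    end
    total = length(ones_) * length(zeros_)
    auroc = (concordant + 0.5 * tied) / total
    gini  = (concordant - discordant) / total
    return (; concordant, tied, discordant, auroc, gini)
end

class = [true, true, true, false, false]
var   = [0.9, 0.6, 0.4, 0.6, 0.2]

res = pairwise_concordance(class, var)
```

Here the six class 1 vs. class 0 pairs split into 4 concordant, 1 tied, and 1 discordant, and gini equals 2 × auroc - 1 as stated above.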
source
ROCKS.cumliftcurveMethod
cumliftcurve(x::BCDiag)

Return the cumulative lift curve plot of x - cumulative actual and predicted vs. depth

source
ROCKS.ksplotMethod
ksplot(x::BCDiag)

Return the KS plot of x - CDF1 (True Positive) and CDF0 (False Positive) versus depth

source
ROCKS.kstestMethod
kstest(class, var; rev = true)

Calculate the empirical two-sample Kolmogorov-Smirnov statistic and its location.
class is a 2 level categorical variable, var is the distribution to analyze.

Returns:

  • n, total number of observations
  • n1, number of observations of class 1
  • n0, number of observations of class 0
  • baserate, incidence rate of class 1
  • ks, the maximum separation between the two cumulative distributions
  • ksarg, the value of var at which maximum separation is achieved
  • ksdep, depth of ksarg in the sorted values of var

rev = true counts depth from high value towards low value
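
A minimal sketch of the calculation follows, under stated assumptions. `ks_sketch` is a hypothetical helper, not the ROCKS code; it takes class as a Boolean vector, and it reports depth as the fraction of observations at or before ksarg in the sorted order, which may differ from how ROCKS defines ksdep.

```julia
# Hypothetical sketch, not part of ROCKS: two-sample KS statistic and its location.
function ks_sketch(class::AbstractVector{Bool},
                   var::AbstractVector{<:Real}; rev = true)
    order = sortperm(var; rev = rev)
    n  = length(class)
    n1 = count(class)
    n0 = n - n1
    cum1 = cum0 = 0
    ks, ksarg, ksdep = 0.0, var[order[1]], 0.0
    for (i, idx) in enumerate(order)
        class[idx] ? (cum1 += 1) : (cum0 += 1)
        # Compare the two CDFs only after the last of a run of equal values.
        if i == n || var[order[i + 1]] != var[idx]
            sep = abs(cum1 / n1 - cum0 / n0)
            if sep > ks
                ks, ksarg, ksdep = sep, var[idx], i / n   # assumption: depth as a fraction
            end
        end
    end
    return (; n, n1, n0, baserate = n1 / n, ks, ksarg, ksdep)
end

class = [true, true, true, false, true, false, false, false]
var   = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

res = ks_sketch(class, var)
```

With rev = true the scan starts at the highest value of var, so the first three class 1 observations separate the two cumulative distributions by 0.75 before any class 0 observation is seen.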

source
ROCKS.liftcurveMethod
liftcurve(x::BCDiag)

Return the lift curve plot of x - actual and predicted versus depth

source
ROCKS.ranksMethod
ranks(x; groups = 10, rank = tiedrank, rev = false)

Return a variable which bins x into groups number of bins.
The rank keyword allows different ranking methods;
use rev = true to reverse the sort so that a small bin number corresponds to large values of x.
Missing values are assigned to group missing.

The default values of rank = tiedrank and rev = false produce a grouping similar to SAS PROC RANK groups=n ties=mean.
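
For orientation, the grouping rule documented for SAS PROC RANK with GROUPS= is floor(rank × groups / (n + 1)). The sketch below applies that rule to mean ranks. Whether ROCKS uses exactly this formula is an assumption; `mean_ranks` and `rank_groups` are hypothetical helpers, and missing values are not handled here.

```julia
# Hypothetical helper, not part of ROCKS: ranks with ties given their mean rank.
function mean_ranks(x::AbstractVector{<:Real})
    n = length(x)
    order = sortperm(x)
    r = zeros(Float64, n)
    i = 1
    while i <= n
        j = i
        # Extend j to the end of the run of values equal to x[order[i]].
        while j < n && x[order[j + 1]] == x[order[i]]
            j += 1
        end
        r[order[i:j]] .= (i + j) / 2    # mean of positions i..j
        i = j + 1
    end
    return r
end

# Hypothetical helper: bin numbers 0..groups-1 from floor(rank * groups / (n + 1)).
function rank_groups(x::AbstractVector{<:Real}; groups = 10, rev = false)
    r = mean_ranks(rev ? -x : x)
    return floor.(Int, r .* groups ./ (length(x) + 1))
end

x = [10, 20, 20, 30, 40, 50, 60, 70, 80]

g_fwd = rank_groups(x; groups = 3)
g_rev = rank_groups(x; groups = 3, rev = true)
```

The two tied values of 20 share the mean rank 2.5 and therefore always land in the same bin; with rev = true the largest values receive the smallest bin numbers.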

source
ROCKS.rocplotMethod
rocplot(x::BCDiag)

Return the ROC plot of x - CDF1 (True Positive) vs. CDF0 (False Positive)

source