DSUtils.jl
The intent of DSUtils is to make it easy for industry practitioners to get their work done. It provides functions for some routine parts of data science workflow so that people can focus on problem solving and not break their train of thought. The functions should just work and be reasonably performant on millions of rows of data.
There are no new capabilities per se, indeed, many of the functionalities are available in other packages. It is an end user package for people to get their analysis done rather than a package for others to build packages with.
Installation
This package is not registered in Julia's general registry yet. You can add it via its URL:
] add https://github.com/DaymondLing/DSUtils.jl
Current capabilities
Binary classifier performance evaluation
kstest
, 2 sample Kolmogorov-Smirnov separation point estimate and locationauroc
, Area Under Receiver Operating Characteristics curve via concordance calculation rather than numeric integrationbcdiag
, wrapper forkstest
andauroc
that facilitates plotting the graphs and generating tables belowksplot
, plot of Kolmogorov-Smirnov 2 sample CDF separationrocplot
, ROC plotbiasplot
, plot of actual response rate vs. predicted probabilityaccuracyplot
, plot of accuracy given utility values for [TP, FN, FP, TN]liftcurve
, actual and predicted lift curvescumliftcurve
, cumulative actual and predicted lift curvesliftable
, actual and predicted lift tablescumliftable
, cumulative actual and predicted lift tablesinfovalue
, change in two frequency distributions
Binning
ranks
, equal density binning, tied values are always in the same bin
Dummy encoding
onehot!
, one hot encode a categorical variable, works with missing level and missing data, output indicator variables are in a DataFrame with named columns
Missing computation
sumxm
, sum a collection of scalars or vectors (element-wise) treating missing as 0
Project Status
This is work in progress, development is done on Julia 1.5 although there's nothing special about the code that would prevent it from working on older versions of Julia.