CrossValidation
This gem provides a k-fold cross-validation routine and confusion matrix
for evaluating machine learning classifiers. See below for usage or jump to
the documentation.
Installation
Add this line to your application's Gemfile:
gem 'cross_validation'
And then execute:
$ bundle install --binstubs .bin
Or install it yourself as:
$ gem install cross_validation
Usage
To cross-validate your classifier, you need to configure a run as
follows:
require 'cross_validation'

runner = CrossValidation::Runner.create do |r|
  r.documents = my_array_of_documents
  r.folds = 10
  r.classifier = lambda { SpamClassifier.new }
  r.fetch_sample_class = lambda { |sample| sample.klass }
  r.fetch_sample_value = lambda { |sample| sample.value }
  r.matrix = CrossValidation::ConfusionMatrix.new(method(:keys_for))
  r.training = lambda { |classifier, doc|
    classifier.train doc.klass, doc.value
  }
  r.classifying = lambda { |classifier, doc|
    classifier.classify doc
  }
end
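The configuration above assumes sample objects that respond to klass and value, and a classifier that responds to train and classify. As a minimal sketch of objects with that shape (the Sample struct and this toy word-counting SpamClassifier are hypothetical stand-ins for illustration, not part of the gem):

```ruby
# Hypothetical sample type: responds to #klass and #value, as the config expects.
Sample = Struct.new(:klass, :value)

# Toy classifier for illustration only: counts words per class and picks
# the class whose training vocabulary overlaps most with the given text.
class SpamClassifier
  def initialize
    @counts = Hash.new { |hash, klass| hash[klass] = Hash.new(0) }
  end

  # Record each word of +text+ under +klass+.
  def train(klass, text)
    text.downcase.split.each { |word| @counts[klass][word] += 1 }
  end

  # Return the class with the highest summed word count for +text+,
  # or nil if nothing has been trained yet.
  def classify(text)
    words = text.downcase.split
    best = @counts.max_by { |_klass, freq| words.sum { |word| freq[word] } }
    best && best.first
  end
end

my_array_of_documents = [
  Sample.new(:spam, 'win free money now'),
  Sample.new(:ham,  'lunch meeting at noon'),
]
```

This toy classify takes the raw text, so a classifying lambda built around it would pass the sample's value rather than the sample itself.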
With the run configured, just invoke #run to return a confusion matrix:
mat = runner.run
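Conceptually, a k-fold run splits the documents into folds partitions, then holds each partition out for testing once while training on the rest. A rough sketch of that partitioning in plain Ruby (the each_fold helper is hypothetical and is not the gem's implementation):

```ruby
# Hypothetical helper: yields [training, test] pairs for k-fold cross-validation.
def each_fold(documents, k)
  return enum_for(:each_fold, documents, k) unless block_given?

  # Assign document i to fold (i % k) so fold sizes differ by at most one.
  folds = Array.new(k) { [] }
  documents.each_with_index { |doc, i| folds[i % k] << doc }

  folds.each_with_index do |test, i|
    # Everything outside the held-out fold becomes training data.
    training = (folds[0...i] + folds[(i + 1)..-1]).flatten(1)
    yield training, test
  end
end

each_fold((1..10).to_a, 5) do |training, test|
  # Train a fresh classifier on `training`, then score it on `test`.
end
```

Each document lands in exactly one test set, so every prediction recorded in the matrix comes from a classifier that never saw that document during training.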
With a confusion matrix in hand, you can compute many statistics about
your classifier:
mat.accuracy
mat.f1
mat.fscore(beta)
mat.precision
mat.recall
See the documentation for each method for more details.
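For orientation, these statistics follow the standard definitions in terms of the four counts (true/false positives and negatives). The sketch below computes them from a plain hash of counts; it illustrates the textbook formulas and is not the gem's implementation (the counts hash and the free-standing helper methods are made up for this example):

```ruby
# Fraction of all predictions that were correct.
def accuracy(c)
  total = c[:tp] + c[:tn] + c[:fp] + c[:fn]
  (c[:tp] + c[:tn]).to_f / total
end

# Of everything predicted positive, the fraction that really was positive.
def precision(c)
  c[:tp].to_f / (c[:tp] + c[:fp])
end

# Of everything that really was positive, the fraction that was found.
def recall(c)
  c[:tp].to_f / (c[:tp] + c[:fn])
end

# Weighted harmonic mean of precision and recall; beta > 1 favors recall.
# No guard for zero denominators in this sketch.
def fscore(c, beta)
  prec = precision(c)
  rec = recall(c)
  (1 + beta**2) * prec * rec / (beta**2 * prec + rec)
end

# F-score with beta = 1 (precision and recall weighted equally).
def f1(c)
  fscore(c, 1)
end

counts = { tp: 8, tn: 5, fp: 2, fn: 1 }
puts accuracy(counts)
puts f1(counts)
```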
Defining keys_for
The ConfusionMatrix class requires a keys_for Proc that returns a symbol. In
this method, you specify what constitutes a true positive (:tp), true
negative (:tn), false positive (:fp), and false negative (:fn). For example,
in spam classification, you can construct the following table to write the
keys_for method:
                    actual
          +---------------------------------
 expected | correct        | not correct
----------+----------------+----------------
 spam     | true positive  | false positive
 ham      | true negative  | false negative
You can then implement this table with nested hashes or just a few
conditionals:
def keys_for(expected, actual)
  if expected == :spam
    actual == :spam ? :tp : :fp
  elsif expected == :ham
    actual == :ham ? :tn : :fn
  end
end
Once you have your keys_for method implemented, pass it into the
ConfusionMatrix with method(:keys_for), or if it's a class-method,
MyClass.method(:keys_for). (You can also implement the method as a lambda.)
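As a sketch of the nested-hash and lambda variants mentioned above, the same two-class table can be written as a lookup (the KEYS constant and the keys_for local variable are illustrative names, not part of the gem):

```ruby
# Outer key: expected class; inner key: actual class.
KEYS = {
  spam: { spam: :tp, ham: :fp },
  ham:  { ham: :tn, spam: :fn },
}.freeze

# Returns nil for any expected/actual pair not listed in the table.
keys_for = lambda { |expected, actual| KEYS.dig(expected, actual) }

keys_for.call(:spam, :ham)
```

Since this is already a callable object, it would presumably be passed to ConfusionMatrix.new directly, in place of method(:keys_for).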
Roadmap
For v1.0:
- Implement configurable, parallel cross-validation
- Include more complete examples
Author
Jon-Michael Deldin, dev@jmdeldin.com