Supervised Learning
Supervised learning is the machine learning task of inferring a function from labeled training data. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
Credits for some of the algorithms used go to Andrew Ng at Stanford University.
Example
One example is the prediction of house prices (the output value) from two features: the size of the house in square meters and the number of bedrooms.
The training data could look something like this:
Size (m2) | # bedrooms | Price |
--- | --- | --- |
2104 | 3 | 399900 |
1600 | 3 | 329900 |
3000 | 4 | 539900 |
1940 | 4 | 239999 |
Using linear regression, we can now predict the price of a house that is 2200 square meters and has 3 bedrooms.
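Informally, linear regression fits a set of weights (often written theta) so that a linear combination of the features approximates the output. Here is a minimal sketch of the learned hypothesis, with theta standing in for whatever weights the fit produces (placeholders for illustration, not real fitted values):
# Hypothetical sketch: the hypothesis learned by linear regression.
# theta[0] is the intercept, theta[1] and theta[2] are the per-feature
# weights obtained by fitting the training data above.
hypothesis = ->(theta, size, bedrooms) { theta[0] + theta[1] * size + theta[2] * bedrooms }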
Installation
Add this line to your application's Gemfile:
gem 'supervised_learning'
And then execute:
$ bundle
Or install it yourself as:
$ gem install supervised_learning
Usage
1. Create a matrix of the training data
The output value (the type of value you want to predict) needs to be in the last column. The matrix has to have a) at least two columns (one feature and one output) and b) at least one row. The more data you feed it, the more accurate the prediction.
Consult the Ruby Matrix API documentation for information on how to build matrices. You can create a matrix from either an array of rows or an array of columns.
require 'matrix'
training_set = Matrix[ [2104, 3, 399900], [1600, 3, 329900], [3000, 4, 539900], [1940, 4, 239999] ]
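For example, continuing from the snippet above, the same training set can be built from column arrays instead of row arrays using the standard library's Matrix.columns:
# Equivalent training set built from columns: one array per feature, plus one for the price.
training_set = Matrix.columns([ [2104, 1600, 3000, 1940], [3, 3, 4, 4], [399900, 329900, 539900, 239999] ])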
2. Instantiate an object with the training data
require 'supervised_learning'
program = SupervisedLearning::LinearRegression.new(training_set)
3. Create a prediction set in the form of a matrix
This matrix has one row, and its columns follow the order of the training set. It has one column fewer than the training set, since the output value (the last column of the training set) is the value we want to predict.
prediction_set = Matrix[ [2200, 3] ]
program.predict(prediction_set)
=> 454115.66
4. Options
The lower bound on the running time of the #predict method is cubic, Ω(n³), in the number of features. Hence, on very large datasets (more than 1,000 columns / features) the #predict method can be slow. If this happens, use the #predict_advanced method, which uses gradient descent (an iterative approximation) instead of the normal equation (a direct calculation). Gradient descent can be a bit less accurate and also requires some optional parameters.
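For intuition, here is a rough sketch of the kind of direct calculation a #predict-style method performs. This is not the gem's internal code, just an illustration assuming Ruby's standard Matrix class; inverting the n-by-n matrix X^T X is what makes the cost roughly cubic in the number of features.
require 'matrix'

# Hypothetical sketch of the normal equation: theta = (X^T X)^-1 X^T y.
# x is the feature matrix (with a leading column of ones), y the output vector.
# Matrix#inverse on the n-by-n product x.transpose * x costs roughly O(n^3),
# which is why a direct solution slows down beyond ~1,000 features.
def normal_equation(x, y)
  (x.transpose * x).inverse * x.transpose * y
end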
The #predict_advanced method takes 4 parameters, in the following order:
- The prediction set, in the same form as for the #predict method discussed above.
- The learning rate (how big of a "jump" each iteration makes). The smaller the learning rate, the smaller the expected error. Unfortunately, a smaller learning rate requires more iterations (see below). Start at 0.001 and increase the parameter to 0.01, 0.1, 1, etc. if the calculation is too slow.
- The number of iterations. The more iterations, the more accurate the result will be, but the slower the algorithm will run.
- Whether you want to print out intermediate values to see at which point convergence happens.
prediction_set = Matrix[ [2200, 3] ]
program.predict_advanced(prediction_set, 0.01, 600, false)
=> 443693.16
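For intuition on how the learning rate and the number of iterations interact, here is a rough sketch of batch gradient descent. Again, this is an illustration under the same Matrix-based assumptions as above, not the gem's internal implementation.
# Hypothetical sketch of batch gradient descent for linear regression.
# Each of the `iterations` steps moves theta against the gradient of the squared
# error, scaled by `learning_rate`: too large a rate can overshoot and diverge,
# too small a rate needs many more iterations to converge.
def gradient_descent(x, y, learning_rate, iterations)
  m = x.row_count.to_f
  theta = Matrix.column_vector(Array.new(x.column_count, 0.0))
  iterations.times do
    gradient = x.transpose * (x * theta - y) / m
    theta -= gradient * learning_rate
  end
  theta
end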
Contributing
- Fork it ( https://github.com/[my-github-username]/supervised_learning/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request