Ruby Reduce
Ruby Reduce is a small library for running reduce commands on a Rails 3.x log without having to set up a Hadoop cluster. It is essentially a small-scale reduce function. The original idea was to make the gem pluggable, so that you could plug in different inputers to chunk non-Rails 3 log files. A similar plugin design was planned for the output, but that has not happened yet. Currently it only accepts Rails 3 log files as input and writes the results to MongoDB.
Installation
gem install ruby_reduce
Usage
Usage is straightforward, as demonstrated by this code snippet.
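require 'date'
require 'time' # Time.parse is defined by the 'time' library
require 'ruby_reduce'

# Tell Ruby Reduce which MongoDB connection and database to use.
module RubyReduce
  def self.mongo_connection
    # Left empty in this example; return your MongoDB connection here.
  end

  def self.mongo_db
    'test_db'
  end
end

r = RubyReduce.reduce({
  :input => '../production.log',
  # Map: emit each log statement under the controller action that handled it.
  :map => Proc.new do |key, value|
    processed_by = /Processing by (\w*#\w*)/.match(value)
    unless processed_by.nil?
      # gsub rather than gsub!, which returns nil when nothing is replaced
      emit processed_by[1].gsub("#", "_"), value
    end
  end,
  # Reduce: extract the request start time and the processing time.
  :reduce => Proc.new do |log_statement|
    # (?:...|...) matches a whole HTTP verb; [...] would match a single character
    date = /Started (?:GET|POST|DELETE|PUT|HEAD).* at (.*)/.match(log_statement)
    processing_time = /Completed \d* .* in (\d*)/.match(log_statement)
    unless date.nil? || processing_time.nil?
      emit({'date' => Time.parse(date[1]), 'processed_time' => processing_time[1]})
    end
  end,
  :output => 'graph_data'
})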
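To see what the regular expressions in the snippet pull out of a log entry, here is a minimal standalone sketch that does not need ruby_reduce or MongoDB. The log entry is a made-up sample in the Rails 3.x style, and `extract_key` and `extract_timing` are hypothetical helper names used only for illustration. The HTTP verb is matched with a group, `(?:GET|POST|...)`, so that whole verbs are matched.

```ruby
require 'time'

# Made-up sample entry in the Rails 3.x log style (an assumption for this sketch).
SAMPLE_ENTRY = <<~LOG
  Started GET "/posts" for 127.0.0.1 at 2012-03-01 10:15:30 -0500
    Processing by PostsController#index as HTML
  Completed 200 OK in 35ms (Views: 20.1ms | ActiveRecord: 3.2ms)
LOG

# Hypothetical helper: the key extraction done in the :map block.
# Returns "Controller_action", or nil if the entry has no "Processing by" line.
def extract_key(entry)
  match = /Processing by (\w*#\w*)/.match(entry)
  match && match[1].gsub('#', '_')
end

# Hypothetical helper: the field extraction done in the :reduce block.
# Returns nil unless both the "Started" and "Completed" lines are present.
def extract_timing(entry)
  started = /Started (?:GET|POST|DELETE|PUT|HEAD).* at (.*)/.match(entry)
  completed = /Completed \d* .* in (\d*)/.match(entry)
  return nil if started.nil? || completed.nil?

  { 'date' => Time.parse(started[1]), 'processed_time' => completed[1] }
end

puts extract_key(SAMPLE_ENTRY) # prints PostsController_index
puts extract_timing(SAMPLE_ENTRY).inspect
```

Note that `processed_time` comes out as a string ("35" here), because it is taken straight from the regular expression capture.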
Once all of the key-value pairs have been reduced, the results are collected by key and written to MongoDB, with one document for each key emitted in the map function. The id (_id) of each document is the emitted key.
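The collect-by-key step can be pictured with the following sketch in plain Ruby. This is not the library's actual implementation: `collect_by_key` is a hypothetical helper, and the `values` field name is an assumption, since only the `_id` field is described above. Nothing is written to MongoDB here.

```ruby
# Hypothetical helper, not part of ruby_reduce: groups [key, value] pairs
# into one document per key, using the key as the document's _id.
# The 'values' field name is an assumption made for this sketch.
def collect_by_key(pairs)
  pairs.group_by { |key, _value| key }.map do |key, grouped|
    { '_id' => key, 'values' => grouped.map { |_key, value| value } }
  end
end

# Made-up reduced results, keyed the same way as in the usage snippet.
pairs = [
  ['PostsController_index', { 'processed_time' => '35' }],
  ['UsersController_show',  { 'processed_time' => '12' }],
  ['PostsController_index', { 'processed_time' => '41' }]
]

documents = collect_by_key(pairs)
documents.each { |doc| puts doc.inspect }
```

With the sample pairs above, this yields two documents, one with `_id` "PostsController_index" holding two values and one with `_id` "UsersController_show" holding one.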
Limitations
Right now this is alpha software at best. There are no tests, and while I use it for my own projects, it has not really been used on a wide range of problems. Your feedback is welcome.
Questions
Contact me at joshsmoore@gmail.com with questions, comments, or just to say that you are using the library and want me to continue work on it.