Release 1.1.1 [2013-02-11 13:14]
Release 1.1.0 [2011-05-08]
- Switched to Fileutils for 1.9 compatibility
= Synopsis
Download, merge and convert Amazon S3 bucket log files for a specified date or date range.
- Download S3 bucket log files produced by Amazon S3. Log files are downloaded once and cached locally.
- Merge those log files together into a single logfile per bucket (sorting on ascending timestamp)
- Convert the log file from Amazon Server Access Log Format to Apache Common Log Format
Ralf is an acronym for Retrieve Amazon Log Files. Ralf does the following things:
= Usage
Usage: ./bin/ralf [options]
Download and merge Amazon S3 bucket log files for a specified date range and
output a Common Log File. Ralf is an acronym for Retrieve Amazon Log Files.
Ralf downloads bucket log files to local cache directories, merges the Amazon Log
Files and converts them to Common Log Format.
Example: ./bin/ralf --range month --now yesterday --output-file '/var/log/amazon/:year/:month/:bucket.log'
AWS credentials (Access Key Id and Secret Access Key) are required to access
S3 buckets. For security reasons these credentials can only be specified in a
configuration file (see --config-file) or through the environment using the
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
Log selection options:
-l, --[no-]list List buckets that have logging enabled. Does not process log files.
-b, --buckets x,y,z Buckets for which to process log files. Defaults to all log-enabled buckets.
-r, --range BEGIN[,END] Date or date range to process. Defaults to 'today'.
-t, --now TIME Date to use as base for range. Defaults to 'today'.
You can use Chronic expressions for '--range' and '--now'. See http://chronic.rubyforge.org.
Example: --range 'last week'
All days of previous week.
Example: --range 'this week'
Beginning of this week (sunday) upto and including today.
Example: --range '2010-01-01','2010-04-30'
First four months of this year.
Example: --range 'this month' --now yesterday
This will select log files from the beginning of yesterday's month upto and including yesterday.
The --buckets, --range and --now options are optional. If unspecified, (incomplete)
logging for today will be processed for all buckets (that have logging enabled).
This is equivalent to specifying "--range 'today'" and "--now 'today'".
Output options:
-o, --output-file FORMAT Output file, e.g. '/var/log/s3/:year/:month/:bucket.log'. Required.
The --output-file format uses the last day of the range specified by (--range)
to determine the filename. E.g. when the format contains ':year/:month/:day' and
the range is 2010-01-15..2010-02-14, then the output file will be '2010/02/14'.
-x, --cache-dir FORMAT Directory name(s) in which to cache downloaded log files. Optional.
The --cache-dir format expands to as many directory names as needed for the
range specified by --range. E.g. "/var/run/s3_cache/:year/:month/:day/:bucket"
expands to 31 directories for range 2010-01-01..2010-01-31.
Defaults to '~/.ralf/:bucket' or '/var/log/ralf/:bucket' (when running as root).
Config file options:
-c, --config-file [FILE] Path to file with configuration settings (in YAML format).
Configuration settings are read from the (-c) specified configuration file
or from ~/.ralf.conf or from /etc/ralf.conf (when running as root).
Command-line options override settings read from the configuration file.
The configuration file must be in YAML format. Each command-line options has an
equivalent setting in a configuration file replacing dash (-) by underscore(_).
The Amazon Access Key Id and Secret Access Key can only be specified in the
Example:
output_file: /var/log/amazon_s3/:year:month/:bucket.log
aws_access_key_id: my_access_key_id
aws_secret_access_key: my_secret_access_key
To only use command-line options simply specify -c or --config-file without
an argument.
Debug options:
-d, --[no-]debug [aws] Show debug messages.
Common options:
-h, --help Show this message.
-v, --version Show version.
= Library
You can also use Ralf from within your own ruby code. Each command-line option
has a corresponding option in the options has passed to Ralf.new and Ralf.run.
Replace a dash (-) by an underscore (_) in the names:
options = { :output_file => '/var/log/s3/:bucket.log' }
require 'rubygems'
require 'ralf'
r = Ralf.new({ :config_file => '/Users/me/ralf.yaml' }.merge(options))
r.run
Or run it in one go:
Ralf.run({ :config_file => '/Users/me/ralf.yaml' }.merge(options))
= Requirements
- Credentials for an Amazon S3 account
- Enable logging on S3
You can use Cyberduck[http://cyberduck.ch/] for example.
= Gem dependencies
Ralf depends on the following gems which will automatically installed when you
install the ralf gem.
= Authors
Authors: {Leon Berenschot}[http://github.com/LeipeLeon] and {K.J. Wierenga}[http://github.com/kjwierenga]
Contributers: {Victor Castell}[http://github.com/victorcoder]
This program is used for {kerkdienstgemist.nl}[http://kerkdienstgemist.nl] {Amazon S3}[http://aws.amazon.com/s3/] log file processing.