#+TITLE: LogSense Readme - Monitor your Rails app easily and quickly
#+AUTHOR: Adolfo Villafiorita
#+STARTUP: showall
LogSense generates reports and statistics from Ruby on Rails and Apache/Nginx
log files.
Main features:
- Statistics for Rails apps in production and for web server logs (combined
  format, which both Apache and Nginx can produce)
- Reports on performance, errors, visitors, and devices used to access your
  websites and webapps[fn:: LogSense also parses the data generated by the
  BrowserInfo gem, providing additional information for Rails apps, including
  devices, platforms, and number of accesses to methods by device type.].
- Can combine one or more log files
- No need for cookies or other tracking technologies (but you need access to
  your log files)
- Filters let you analyze specific periods and distinguish traffic generated
  by self polls and crawlers.
- Reports can be generated in HTML, txt, ufw, and SQLite. HTML reports are
  responsive and come with dark and light themes.
LogSense is written in Ruby, runs from the command line, and can be installed
on any system with a relatively recent version of Ruby. We use it with Ruby
3.1.4 and 3.3.0.

It is fast. On a ThinkPad P16, a 277M log file is parsed in 15 seconds (about
7740 events per second) and a 569M log file is parsed in 50 seconds (about
4700 events per second).
** Rails Production Report
#+ATTR_HTML: :width 80%
[[file:./screenshots/rails-screenshot.png]]
LogSense understands the Rails production log and generates the following
reports in TXT and HTML:
- Daily Distribution
- Time Distribution
- Statuses
- Statuses by Day
- Rails Performance
- Controller and Methods by Device
- Fatal Events
- Fatal Events (grouped by type)
- Job Errors
- Job Errors (grouped)
- Browsers
- Platforms
- IPs
- Countries
- IP per hour
- Sessions
** Apache/Nginx Report
#+ATTR_HTML: :width 80%
[[file:./screenshots/combined_log-screenshot.png]]
LogSense reads the Apache/Nginx combined log format and generates the
following reports in TXT and HTML:
- Time Distribution
- 20x and 30x on HTML pages
- 20x and 30x on other resources
- 40x and 50x on HTML pages
- 40x and 50x on other resources
- 40x and 50x on HTML pages by IP
- 40x and 50x on other resources by IP
- Statuses
- Statuses by Day
- Browsers
- Platforms
- IPs
- Countries
- IP per hour
- Combined Platform Data
- Referers
- Sessions
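To illustrate what the combined log format contains, here is a minimal sketch
of parsing one line in Ruby. The regular expression, the =parse_combined=
helper, and the sample line are assumptions made for this example; they are
not LogSense's actual parser.

#+begin_src ruby
require "time"

# Hypothetical pattern for one combined-format line:
# ip ident user [timestamp] "method path protocol" status size "referer" "user agent"
COMBINED = /\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]*" (?<status>\d{3}) (?<size>\S+) "(?<referer>[^"]*)" "(?<user_agent>[^"]*)"\z/

# Returns a Hash with the fields of the line, or nil when the line does not match.
def parse_combined(line)
  m = COMBINED.match(line.chomp)
  return nil unless m

  {
    ip: m[:ip],
    time: Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z"),
    method: m[:method],
    path: m[:path],
    status: m[:status].to_i,
    referer: m[:referer],
    user_agent: m[:user_agent]
  }
end

# Made-up sample line (documentation IP address).
line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'
entry = parse_combined(line)
puts "#{entry[:ip]} requested #{entry[:path]} (status #{entry[:status]})"
#+end_src

Lines that do not follow the format (for instance, malformed requests) make
the helper return =nil=, so a caller can skip or count them separately.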
** UFW Report
The =ufw= output format generates directives for Uncomplicated Firewall,
blacklisting IPs requesting URLs matching a given pattern.
We use it to blacklist IPs requesting WordPress login pages on our websites,
since we do not use WordPress.
Example:
#+begin_src
$ log_sense -f apache -t ufw -i apache.log
ufw deny from 20.212.3.206
/wp-login.php /wordpress/wp-login.php /blog/wp-login.php /wp/wp-login.php
ufw deny from 185.255.134.18
...
#+end_src
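The idea behind this output can be sketched in a few lines of Ruby. The
=ufw_directives= helper and the sample entries below are hypothetical and only
illustrate the technique (select the IPs whose requests match a pattern, then
emit one directive per IP); this is not LogSense's own code.

#+begin_src ruby
# Hypothetical, already-parsed log entries: client IP and requested path.
entries = [
  { ip: "198.51.100.4", path: "/wp-login.php" },
  { ip: "198.51.100.4", path: "/blog/wp-login.php" },
  { ip: "203.0.113.9",  path: "/index.html" },
  { ip: "192.0.2.55",   path: "/wp/wp-login.php" }
]

# Returns one "ufw deny" directive for each distinct IP that requested
# at least one path matching the pattern.
def ufw_directives(entries, pattern)
  regexp = Regexp.new(pattern)
  entries
    .select { |e| e[:path].match?(regexp) }
    .map { |e| e[:ip] }
    .uniq
    .map { |ip| "ufw deny from #{ip}" }
end

directives = ufw_directives(entries, "wp-login")
puts directives
#+end_src

Review the generated directives before applying them: a pattern that is too
broad can also block legitimate visitors.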
* Installation
#+begin_src bash
gem install log_sense
#+end_src
If you want to collect information about browsers, platforms, and devices when
generating Rails reports, add the =browser= gem to your bundle and the
following code to =application_controller.rb=:
#+begin_example ruby
# Gemfile
gem "browser"
#+end_example
#+begin_example ruby
# application_controller.rb
require "digest"

class ApplicationController < ActionController::Base
  # [...]
  before_action do |controller|
    user_agent = request.env['HTTP_USER_AGENT']
    ip = request.env['REMOTE_ADDR']
    # The IP is hashed, so that the address itself is not written to the log
    hashed_ip = Digest::SHA256.hexdigest ip
    b = Browser.new(user_agent)
    now = DateTime.now
    logger = Rails.logger
    browser_data = [
      b.name, b.platform, b.device.name,
      controller.class.name, controller.action_name,
      request.format.symbol,
      hashed_ip,
      now
    ]
    # One log line per request: quoted, comma-separated values
    browser_data_str = browser_data.map { |x| "\"#{x}\"" }.join(',')
    logger.info "BrowserInfo: #{browser_data_str}"
  end
  # [...]
end
#+end_example
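The =before_action= above writes one =BrowserInfo:= line per request, with
quoted, comma-separated values. As a minimal sketch of how such a line can be
read back, the following Ruby code splits it into named fields. The field
names, the =parse_browser_info= helper, and the sample line are assumptions
made for this example, not LogSense's own parser.

#+begin_src ruby
# Labels chosen for this example, in the order used by the before_action above.
FIELDS = %i[browser platform device controller action format hashed_ip time].freeze

# Returns a Hash of the logged values, or nil if the line is not a
# BrowserInfo line with the expected number of values.
def parse_browser_info(line)
  marker = "BrowserInfo: "
  idx = line.index(marker)
  return nil unless idx

  # Collect the content of each double-quoted value after the marker.
  values = line[(idx + marker.length)..-1].scan(/"([^"]*)"/).flatten
  return nil unless values.size == FIELDS.size

  FIELDS.zip(values).to_h
end

# Made-up sample line.
sample = 'BrowserInfo: "Chrome","Linux","Unknown","HomeController","index","html","ab12cd","2024-01-15T10:00:00+00:00"'
info = parse_browser_info(sample)
puts "#{info[:controller]}##{info[:action]} from #{info[:browser]} on #{info[:platform]}"
#+end_src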
* Usage
#+begin_src bash :results raw output :wrap example :exports both
log_sense --help
#+end_src
#+RESULTS:
#+begin_example
Usage: log_sense [options] [logfile ...]
        --title=TITLE                Title to use in the report
    -f, --input-format=FORMAT        Log format (stored in log or sqlite3): rails or apache (DEFAULT: apache)
    -i, --input-files=file,file,     Input file(s), log file or sqlite3 (can also be passed as arguments)
    -t, --output-format=FORMAT       Output format: html, txt, sqlite, ufw (DEFAULT: html)
    -o, --output-file=OUTPUT_FILE    Output file. (DEFAULT: STDOUT)
    -b, --begin=DATE                 Consider only entries after or on DATE
    -e, --end=DATE                   Consider only entries before or on DATE
    -l, --limit=N                    Limit to the N most requested resources (DEFAULT: 100)
    -w, --width=WIDTH                Maximum width of long columns in textual reports
    -r, --rows=ROWS                  Maximum number of rows for columns with multiple entries in textual reports
    -p, --pattern=PATTERN            Pattern to use with ufw report to select IP to blacklist (DEFAULT: php)
    -c, --crawlers=POLICY            Decide what to do with crawlers (applies to Apache Logs)
        --no-selfpoll                Ignore self poll entries (requests from ::1; applies to Apache Logs) (DEFAULT: false)
        --no-geo                     Do not geolocate entries (DEFAULT: true)
        --verbose                    Inform about progress (output to STDERR) (DEFAULT: false)
    -v, --version                    Prints version information
    -h, --help                       Prints this help
This is version 2.0.0
Output formats:
- rails: txt, html, sqlite3, ufw
- apache: txt, html, sqlite3, ufw
#+end_example
Examples:
#+begin_example sh
log_sense -f apache -i access.log -t txt > access-data.txt
log_sense -f rails -i production.log -t html -o performance.html
#+end_example
LogSense focuses on privacy, data-ownership, and simplicity: no need to
install JavaScript snippets, no tracking cookies, just plain and simple log
analysis.
LogSense is also inspired by static website generators: statistics are
generated from the command line and accessed as static HTML files. This
significantly reduces the attack surface of your web server and avoids
installation headaches. We have a cron job running on our servers, generating
statistics at night. The generated files are then made available in a private
area of the website and rotated monthly.
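A nightly job of this kind could be driven by a small Ruby script such as the
sketch below. The =nightly_command= helper, the paths, and the naming scheme
(one output file per month) are assumptions made for this example; only
options listed by =log_sense --help= are used.

#+begin_src ruby
require "date"

# Builds the argument list for one run of log_sense.
# The output file name contains year and month, so that a new report
# file is started every month (an assumption of this sketch).
def nightly_command(input_format:, log_file:, output_dir:, date: Date.today)
  output = File.join(output_dir, "#{input_format}-#{date.strftime('%Y-%m')}.html")
  ["log_sense", "-f", input_format, "-i", log_file, "-t", "html", "-o", output]
end

cmd = nightly_command(
  input_format: "rails",
  log_file: "/var/www/app/log/production.log", # hypothetical path
  output_dir: "/var/www/stats",                # hypothetical path
  date: Date.new(2024, 3, 15)
)
puts cmd.join(" ")

# To actually run the command (requires log_sense to be installed):
# system(*cmd)
#+end_src

Passing the arguments to =system= as a list, rather than as a single string,
avoids shell interpretation of file names.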
** An important word of warning on SQLite3 output
[[https://owasp.org/www-community/attacks/Log_Injection][Log poisoning]] is a technique whereby attackers send requests with unvalidated
user input to forge log entries or inject malicious content into the logs.
log_sense sanitizes the entries shown in HTML reports, to protect against log
poisoning. Log entries and URLs in SQLite3 tables, however, are not
sanitized: they are stored exactly as they appear in the log. This is not, in
general, an issue, unless you use the unsanitized data from SQLite as is in
environments where URLs can be opened or code executed using the URLs as
arguments.
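If you display data read from the SQLite3 output in a web page of your own,
escape it first. The following is a minimal sketch using =ERB::Util.html_escape=
from Ruby's standard library; the sample value is made up and stands for a URL
read from the database.

#+begin_src ruby
require "erb"

# Hypothetical value read from a LogSense SQLite3 table: it comes
# straight from the log and may contain markup injected by an attacker.
url_from_db = '/search?q=<script>alert("x")</script>'

# Escape the characters that are special in HTML before rendering the value.
safe = ERB::Util.html_escape(url_from_db)
puts safe
#+end_src

The same caution applies outside HTML: a value taken from the database should
be treated as untrusted input wherever it is opened, executed, or interpolated.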
See the [[file:CHANGELOG.org][CHANGELOG]] file.
LogSense should run on any system on which a recent version of Ruby
runs. We tested it with Ruby 2.6.9, Ruby 3.0.x, and Ruby 3.3.x.
[[https://shair.tech][Shair.Tech]]
The code implements a pipeline, with the following steps:
- Parser: parses a log into a SQLite3 database. The database
  contains a table with a list of events and, in the case of a Rails
  report, a table with the errors.
- Aggregator: takes a SQLite DB as input and aggregates data,
  typically performing "group by" operations, which are simpler to write
  in Ruby than in SQL. The module outputs a Hash with the different
  reporting data.
- GeoLocator: adds country information to all the reporting data
  that has an IP as one of the fields.
- Shaper: turns the (geolocated) aggregated data (e.g., Hashes and
  such) into Arrays of Arrays, simplifying the structure of the code
  building the reports.
- Emitter: generates reports from the shaped data using ERB.
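The steps above can be sketched as a chain of small functions. This is a
simplified illustration, not the actual LogSense code: the function names, the
sample events, and the =COUNTRIES= lookup table are made up, and an in-memory
Array stands in for the SQLite3 database produced by the Parser.

#+begin_src ruby
require "erb"

# Parser stand-in: an Array of Hashes instead of a SQLite3 table of events.
events = [
  { ip: "203.0.113.7", status: 200 },
  { ip: "203.0.113.7", status: 404 },
  { ip: "198.51.100.4", status: 200 }
]

# Aggregator: a "group by" done in Ruby, producing a Hash of reporting data.
def aggregate(events)
  { hits_by_ip: events.group_by { |e| e[:ip] }.transform_values(&:size) }
end

# GeoLocator: adds a country to the data that has an IP as one of its fields.
# COUNTRIES is a made-up lookup table standing in for a geolocation database.
COUNTRIES = { "203.0.113.7" => "IT", "198.51.100.4" => "FR" }.freeze

def geolocate(data)
  located = data[:hits_by_ip].map do |ip, hits|
    { ip: ip, hits: hits, country: COUNTRIES.fetch(ip, "??") }
  end
  data.merge(hits_by_ip: located)
end

# Shaper: turns the aggregated data into an Array of Arrays (one per row).
def shape(data)
  data[:hits_by_ip].map { |row| [row[:ip], row[:country], row[:hits]] }
end

# Emitter: renders the rows with an ERB template (plain text here).
TEMPLATE = ERB.new("<% rows.each do |ip, country, hits| %><%= ip %> <%= country %> <%= hits %>\n<% end %>")

def emit(rows)
  TEMPLATE.result_with_hash(rows: rows)
end

rows = shape(geolocate(aggregate(events)))
report = emit(rows)
puts report
#+end_src

Each step only depends on the output of the previous one, which is what makes
it possible to add an output format by writing a new template for the last
step.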
See [[todo.org]]
We have been running LogSense for quite a few years with no particular issues.
There are no known bugs; there is an unknown number of unknown bugs.
You are most welcome to report issues and missing features, using the Issue
tracker.
LogSense is distributed under the terms of the [[http://opensource.org/licenses/MIT][MIT License]].
Geolocation is made possible by [[https://db-ip.com/][dbip]]'s IP to City database, released under a
CC license.
The world map is distributed under the terms of the [[http://opensource.org/licenses/MIT][MIT License]] by Pareto
Software, [[https://simplemaps.com/][Simplemaps.com]]. It is used in LogSense with some changes to the class
names and ids.