
Turnstile
The goal of this gem is to provide near real time tracking and reporting on the number of users currently online and accessing given application. It requires that the reporting layer is able to uniquely identify each user. You may also add one another dimension to the tracking, such as i.e. a platform — a coded device or device type the user is using.
For example, you might support platforms: ios
, android
, macos
, windows
, etc.
The gem uses Redis in order to keep track of the data, and can operate in either the online mode* (tracking users from a Rack Middleware) or offline mode, by taling an application file log file and searching for a particular pattern.
Installation
Add this line to your application's Gemfile:
$ gem install turnstile-rb
$ turnstile --help
Usage
The gem provides command line interface shown below:
Description
Turnstile is a Redis-based library that can accurately track total number
of concurrent users accessing a web/API based server application. It can
break it down by "platform" or a device type, and returns data in JSON,
CSV of NAD formats. While user tracking may happen synchronously using a
Rack middleware, another method is provided that is based on log file
analysis, and can therefore be performed outside web server process.
Usage
# Tail the log file as a proper daemon, and optionally HTTP end point
turnstile -f <file> [ --daemon ] [ --web PORT ] [ options ]
# Add a single item and exit
turnstile -a 'platform:ip:user' [ options ]
# Print the summary stats and exit
turnstile -s [ json | csv | nad ] [ options ]
Details
Turnstile can be run as a daemon, in which case it watches a given log
file. Or, you can run turnstile to print the current aggregated stats in
several supported formats, such as JSON.
When Turnstile is used to tail the log files, ideally you should start
turnstile daemon on each app sever that's generating log file, or be
content with the effects of sampling.
Note that the IP address is not required to track uniqueness. Only
platform and UID are used. Also note that custom formatter can be
specified in a config file to parse arbitrary complex log lines.
For tailing a log files, Turnstile must first match a log line expected to
contain the tokens, and then extract is using one of the matchers. You can
specify which matcher to use depending on whether you can add Turnstile's
tokens to your log or not. If you can, great! If not, implement your own
custom matcher and great again.
The following matchers are available, and can be selected with -F:
-
Format named delimited
, which expects the following token in the
log file:
x-turnstile:platform:ip:user-identifier
Where ':' (the delimiter) can be set via -l option, OR you can use one
of the following formats: json_formatted
, pipe_formatted
,
comma_formatted
, colon_formatted
(the default). The match is
performed on a string x-turnstile
, other log lines are skipped.
-
Format json_delimited
, which expects to find a single-line JSON
Hash, containing keys platform
, ip_address
, and user_id
. The match
is performed on any log line containing string 'ip_address
', other
lines are skipped.
-
Format custom
requires passing an additional flag -c/--config file.rb
, which will be required, and which can define a matcher and
assign it to the Turnstile.config.custom_matcher
config variable.
Custom matcher is any object that responds to two methods:
matches?(line)
— must return true if the line is to be used in token extraction, andtokenlize(line)
— must return a string of the form platform:ip:user-id
as a result of parsing the line
.
Command Line Options
The following flags are available:
Mode of Operation:
-f, --file FILE Starts Turnstile in a file-tailing mode
-s, --show [FORMAT] Print current stats and exit. Optional
format can be "json" (default), "nad",
"yaml", or "csv"
-w, --web [PORT] Starts a Sinatra app on a given port
offering /turnstile/<json|yaml> end point.
Can be used with file tailing mode, or
standalone. Default port is 9090
-a, --add TOKEN Registers an event from the token, such as
"ios:123.4.4.4:32442". Use -l to customize
the delimiter
-p, --print-keys Prints all Turnstile keys in Redis
--flushdb Wipes Redis database, and exit
Tailing log file:
-d, --daemonize Daemonize to watch the logs
-b, --read-backwards [LINES] Like tail, read last LINES lines
instead of tailing from the end
-F, --format FORMAT Log file format (see above)
-l, --delimiter CHAR Forces "delimited" file type, and
uses CHAR as the delimiter
-c, --config FILE Ruby config file that can define the
custom matcher, supporting arbitrary
complex logs
Redis Server:
-r, --redis-url URL Redis server URL
--redis-host HOST Redis server host
--redis-port PORT Redis server port
--redis-db DB Redis server db
--hiredis Use hiredis high performance library
Miscellaneous:
-i, --idle-sleep SECONDS When no work was detected, pause the
threads for several seconds.
-v, --verbose Print status to stdout
-t, --trace Enable trace mode
-h, --help Show this message
To summarize, you can run turnstile
in order to:
- start a daemon to tail a log file
- optionally start a Sinatra web server on port 9090, that returns aggregated results to
/turnstile/json
- to print results
- to print all redis keys
- to reset all data
- to add new data
Data Collection and Reporting
Turnstile contains two primary parts: data collection and reporting.
Data collection may happen in two way:
- Synchronously — in real time — i.e. from a web request
- Or asynchronously — by "tailing" the logs on your servers
Synchronous tracking is accurate, supports sampling, but introduces a run-time dependency into your application middleware stack that might not be desirable. This is how Rack::Attack
and other similar middleware operates. Luckily, there is a better way.
Asynchronous tracking has a slight initial setup overhead and it requires a consistently formatted log file that includes lines with the platform, IP and user ID for ALL users at least some of the time. This method introduces zero run-time overhead, as the data collection happens outside of the web request in a standalone daemon, that simply tails your log files.
On high performance web applications it is highly recommended to employ the asynchronous tracking via the log file, and possibly — a custom matcher.
NOTE: as of Turnstile version 3, you can define a custom matcher in a ruby-syntax configuration file. Coupled with -F custom
you are then able to parse arbitrary complex log files and extract the three tokens needed by Turnstile: the platform
enum type, IP addresss
, and User ID
.
Real Time Tracking API
With Real Time tracking you can use sampling to estimate the number of online users.
- To possibly use sampling, use the
Turnstile::Tracker#track
method. - To store and analyze 100% of your data use
Turnstile::Tracker#add
.
Example:
@tracker = Turnstile::Tracker.new
user_id = 12345
platform = 'desktop'
ip = '224.247.12.4'
@tracker.add(user_id, platform, ip)
@tracker.add_token("ios:172.2.5.3:39898098098")
@tracker.add_token("ios|172.2.5.3|39898098098", '|')
Without any further calls to track()
method for this particular user/platform/ip combination, the user is considered online for 60 seconds. Each subsequent call to track()
resets the TTL.
Offline Log Parsing by "Tailing Logs"
If adding latency to a web request is not desirable, another option is to run Turnstile turnstile
process as a daemon on each application server. The daemon "tails" the production.log
file (or any other file), while scanning for lines matching a particular configurable pattern, and then extracting user id
, IP
and platform
based on another configurable regular expression setting.
Data Structure
The logging approach expects that you print a special token into your log file, which contains three fields separated by a delimiter:
x-turnstile
is a hardcoded string used in finding this token,platform
— ideally a short string, such as 'desktop', 'ios', 'android', etcremote IP
— is the IP address of the requestuser_id
— can be encoded, i.e. digest of user_id)
for example, a token such as this:
x-turnstile:desktop:125.4.5.13:3456
is colon-separated, and easily extractable from the log.
Log File Formats
Turnstile supports two primary formats:
-
JSON format, where each log line contains the following JSON fields:
{ "user_id" :17344742,
"platform" :"iphone",
"session_id" :"4eKMZJ4nggzvkix29zpS",
"ip_address" :"70.210.128.241"
}
-
Plain text format, where lines are space delimited, and the token is one of the fields of your log line, itself delimited using a configurable character.
2017-10-06 11:03:21 x-turnstile|desktop|124.5.4.3|234324 GET /...
You can specify the file format using the -t | --file-type
switch.
Possible values are:
json_formatted
pipe_delimited
colon_delimited
comma_delimited
You can also pass the token delimiter on the command line, -D | --delimiter ","
in which case the delimited
file type is used, with your custom delimiter.
NOTE: Default format is pipe_delimited
.
Custom Log Matchers
This feature is only available in Turnstile version 3.0 or later.
To be able to tail a structured log file in any format, create a ruby config file, and pass it with -c <file>
.
For example, below we'll define a custom matcher that extracts our token from a CSV log file.
# config/custom_csv_matcher.rb
# This matcher extracts platform, UID and IP from the following CSV string:
# 2018-05-02 21:51:44.031,25928,3997,th-M4wDQM4w0,web,j5v-dzg0J,69.181.72.240,e2b1be795372c385c92a7df420752992
class CSVMatcher
def tokenize(line)
words = line.split(',')
platform, ip, uid = [ words[4], web[6], web[7] ]
[platform, ip, uid].join(':')
end
def matches?(line)
line =~ /x-rqst/
end
end
Turnstile.config.custom_matcher = CSVMatcher.new
The above matcher defines a very simple block that receives a string as input, and returns a string which is a combined value from:
"#{platform}:#{ip}:#{user_id}"
For example, ios:1.4.54.25:fg7988798779
is a valid token. Note that IP addresses are not necessary for the purposes of tracking, and can be omitted.
With the above file defined, we would start turnstile's collector process as follows, tailing the CSV log file:
turnstile -f log/production_log.csv -F custom -c config/custom_csv_matcher.rb
Examples
For example:
> gem install turnstile
> turnstile -v -f log/production/log -t json_formatted | \
tee -a /var/log/turnstile.log
> turnstile -v -f log/production/log -h 127.0.0.1 -p 6432 | \
tee -a /var/log/turnstile.log
2014-04-12 05:16:41 -0700: updater:flush - nothing to flush, sleeping 6s..
2014-04-12 05:16:41 -0700: updater:queue - nothing in the queue, sleeping 5s...
2014-04-12 05:16:41 -0700: log-reader - starting to tail file log....
2014-04-12 05:16:46 -0700: updater:queue - nothing in the queue, sleeping 5s...
2014-04-12 05:16:53 -0700: updater:flush - nothing to flush, sleeping 6s..
2014-04-12 05:16:56 -0700: updater:queue - ( 0.65ms) caching [746] keys locally
2014-04-12 05:16:59 -0700: updater:flush - ( 91.73ms) flushing cache with [602] keys
2014-04-12 05:17:05 -0700: updater:flush - nothing to flush, sleeping 6s..
^Ctrl-C
Note that ideally you should run turnstile
on all app servers, for completeness, and because this does not incur any additional cost for the application (as user tracking is happening outside web request).
Reporting
Once the tracking information is sent, the data can be queried.
If you used sampling, then you should query using Turnstile::Observer
class that provides exprapolation of the results based on sample size configuration.
Turnstile::Observer.new.stats
total: 3,
platforms: 2 },
users: [ { uid: 1, platform: 'desktop', ip: '123.2.4.54' }, ... ]
If you did not use sampling, you can get some answers from theTurnstile::Adapter
class:
Turntstile::Adapter.new.fetch
You can also request an aggregate results, suitable for sending to graphing systems or displaying on a dashboard:
Turntstile::Adapter.new.aggregate
Summary Printing
JSON and CSV
Use the following syntax:
turnstile -s json
turnstile -s csv
Circonus NAD
We use Circonus to collect and graph data. You can use turnstile
to dump the current aggregate statistics from redis to standard output,
which is a tab-delimited format consumable by the nad daemon.
(below output is formatted to show tabs as aligned for readability).
> turnstile -s
turnstile.iphone n 383
turnstile.ipad n 34
turnstile.android n 108
turnstile.ipod_touch n 34
turnstile.unknown n 36
turnstile.total n 595
TODO:
- Allow users of the gem to easier customize log reader to fit their own custom log files
- Export configuration into a YAML file and load from there by defaul
- Refactor commands to have a single
turnstile
CLI with sub-commands watch
and report
.
Contributing
- Fork it ( http://github.com//turnstile/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request