Glowworm
Glowworm allows you to do gradual rollouts of new features, and "dark
deploys" -- rolling out code for a feature, then only turning it on
selectively and after the code is in place everywhere.
We call a given feature/account combination a "feature flag". Your
apps need to know the value of that flag frequently and reliably. But
you need to change the value fairly quickly when it's time to roll the
new feature out.
Glowworm can make that happen.
The name is inspired by dark deploys. First, the code crawls into
place. When you're ready, it all lights up!
Installing
gem install glowworm
or use a Gemfile with Bundler.
Ooyalans should make sure that "gems.sv2" is listed as a gem source in
their Gemfile or on the gem command line.
Overview
At Ooyala? Want a visual step-by-step version of "how do I use
Glowworm for a new feature?" We have an online slide deck for that,
updated from a Glowworm presentation at Ooyala.
"http://portal.sliderocket.com/BKHPY/Glowworm"
If you're not at Ooyala you won't have the nice web interface for
setting up features, account sets and so on. But the workflow and
concepts are all the same.
Usage
You need to specify an account name or number and a feature name.
Glowworm handles it from there.
if Glowworm.feature_flag("bob's account #", "turn_button_blue?")
  # Code to turn the button blue
else
  # Code for the default red button
end
You can also prefetch features if you want to:
Glowworm.prefetch(:all, :all, :timeout => 3.5)
Technically you can supply a feature name or account to prefetch,
but the current version of Glowworm ignores that and just checks
the server for all updates.
Glowworm features default to false, but begin returning true as you
turn them on. You can specify a non-boolean value as the default as a
way of determining if the value is "really" false, or if we're simply
returning the default.
Glowworm.feature_flag("EG-172434", "video_speed", :default => :the_default)
In each case where Glowworm can return false, the default will be
returned instead if you specify one.
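
As a minimal sketch of how an app might use a non-boolean default, the
helper below classifies a returned value. NOT_SET and flag_state are
hypothetical names, not part of Glowworm.

```ruby
# NOT_SET is a hypothetical sentinel; any non-boolean value works as a default.
NOT_SET = :the_default

# Classify a value returned by a lookup made with :default => NOT_SET.
def flag_state(value)
  if value == NOT_SET
    :unknown # Glowworm fell back to the default
  elsif value
    :on
  else
    :off     # a false that did not come from the default
  end
end

# With the gem loaded, the lookup itself would be:
#   value = Glowworm.feature_flag("EG-172434", "video_speed", :default => NOT_SET)
#   flag_state(value)
```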
Lifecycle
When you first start querying a new feature, Glowworm will always
return false, or your default if you've set one. If the feature
or account isn't in the database, false is the initial default in
all cases.
You'll need to add the account to an account set in the database, add
the feature to the database, and turn on that feature for that account
set. You can see an example of code to do this in
glowworm/server/example_test_data.rb in the Glowworm gem code.
Once that has happened, Glowworm should begin returning true for
that feature.
You can also, instead, add an override for that combination of account
and feature. That's not a particularly scalable way to turn on a
feature for a large number of accounts in a system with many accounts,
but it's fine for testing. You can see an example of adding an
override in example_test_data.rb as well.
Options
You can query or prefetch with a TTL or a timeout. The TTL specifies
how long before Glowworm queries the server about that feature again.
The timeout specifies how long to wait for a server result before just
returning a (possibly stale) cached result. 0 is a perfectly good
timeout or TTL if that's what you need in a given case.
# Don't trust cached values, make sure to query the server
Glowworm.feature_flag("12434", "myfeature", :ttl => 0.0)
# Don't wait for a result, give me a stale value but update in the background
Glowworm.feature_flag("9999", "someFeature", :timeout => 0.0)
# Don't wait for a result, give me a stale value and don't update
Glowworm.feature_flag("9999", "someFeature", :timeout => 0.0, :ttl => 1_000_000)
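
To make the TTL rule concrete, here is a toy model of it. This is not
Glowworm's implementation: TtlCache is a hypothetical class, the block
stands in for a server query, and the timeout is not modeled at all.

```ruby
# Toy model of the TTL rule only; not Glowworm's actual implementation.
class TtlCache
  # clock is injectable so the behavior can be checked without sleeping.
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @fetched_at = nil
    @value = nil
  end

  # Returns the cached value. The block (a stand-in for a server query) is
  # called first when the entry has never been fetched or is at least
  # ttl seconds old.
  def read(ttl)
    now = @clock.call
    if @fetched_at.nil? || now - @fetched_at >= ttl
      @value = yield
      @fetched_at = now
    end
    @value
  end
end
```

In this model a TTL of 0 refreshes on every read, and a very large TTL
never refreshes after the first fetch, which matches the examples above.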
Caching
Glowworm caches locally in memory.
Glowworm always queries all accounts and all features from the server
initially. Then it just exchanges a checksum with the server to find
out when the data has changed. As soon as the checksum no longer
matches, the server sends the new information to the client.
Configuring with an Ecology
Glowworm supports a JSON configuration file managed by the Ecology
gem. By default it checks the location of the current executable ($0)
with extension .ecology. So "bob.rb" would have "bob.ecology" next to
it.
Whatever application is using Glowworm will need an Ecology (or to set
variables explicitly in Ruby) to specify where the Glowworm server is.
The app can also give options for things like timeout and ttl.
An Ecology file has this structure:
{
"application": "MyApp",
"features": {
"server": "glowworm.ooyala.com:4999",
"ttl": "30",
"timeout": "1000"
},
"logging": {
"console_out": false,
"default_component": "MyLibrary"
}
}
Every part is optional, including the presence of the file at all.
The example above includes extra configuration for termite, another
Ecology-enabled gem, to show how they combine.
The server property gives the hostname and port of the Glowworm
feature server. If none is specified, glowworm defaults to port 4999
on localhost. Note that if specifying a server, the port must also
be specified.
TTL, if present, gives the number of seconds that a given value is
considered fresh in the cache. Until that time has passed, the cached
result will be returned; after it, the value will be updated. This
defaults to 5 minutes (300 seconds). "Refresh" is an outdated name for
the same setting.
Timeout, if present, gives the number of milliseconds to wait when
querying the server for the correct answer to return. Even if the wait
times out, the cache will be updated later, once the request returns.
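
The Ecology gem reads this file for you. Purely to illustrate the units
and defaults described above, here is a sketch that pulls the "features"
section out of such a document by hand. glowworm_settings is a
hypothetical helper, and since no default timeout is documented here it
is left as nil.

```ruby
require "json"

# Defaults stated in this README: localhost:4999 and a 300-second TTL.
# No default timeout is documented, so it stays nil in this sketch.
FEATURE_DEFAULTS = { "server" => "localhost:4999", "ttl" => 300, "timeout" => nil }

# Hypothetical helper: read the "features" section of an Ecology document.
def glowworm_settings(json_text)
  features = JSON.parse(json_text).fetch("features", {})
  merged = FEATURE_DEFAULTS.merge(features)
  host, port = merged["server"].split(":", 2)
  {
    host: host,
    port: Integer(port), # raises if the port is missing; it must be given
    ttl_seconds: Float(merged["ttl"]),
    timeout_ms: merged["timeout"] && Float(merged["timeout"])
  }
end
```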
EventMachine
If you are using Glowworm with EventMachine, or in general would rather not have a background thread,
then you have a couple of options.
In an EventMachine architecture, your app must use em-synchrony (or sinatra-synchrony) as well as
em-net-http. These are not included in Glowworm, to avoid pulling the whole EventMachine stack into
the gem. You can require "glowworm/em" to use the version that makes EM-friendly HTTP calls. For
non-EM apps, require "glowworm/no_bg" instead.
Both of these requires use a different version of Glowworm that synchronously fetches all data at
require time, and afterward only when Glowworm.update_cache_in_foreground is called. Because of this,
you need to be sure your Glowworm server is properly set before using these requires.
You also have the option of requiring "glowworm" and then, later in initialization, calling
Glowworm.no_bg for non-EM apps or Glowworm.em for EventMachine apps. Note that the call to Glowworm.em
must be made from within a fiber, as it uses EM.synchrony, and it could cause your app to hang if
called from outside EventMachine itself.
Servers
An example Glowworm server used by Ooyala is included in the "server"
directory of the Glowworm gem. The protocol is very simple and you
should have an easy time implementing a Glowworm server if ours is
inappropriate for your use case.
Our server is nginx serving data from a Sequel-based daemon with an ecology file to
configure it.
To run it, cd into glowworm/server and run bundle exec ./glowworm_server.rb
You will also need an nginx server serving /opt/ooyala/glowworm/shared/www/ on port 4999.
You can find the required config file in glowworm/server/config/nginx.conf
For basic test data, run ./example_test_data --clear from the same
directory.
Servers at Ooyala, For Production Use
If you're using Glowworm at Ooyala for production features then
there's more infrastructure to help out.
You can create and set flags in production using the Support Tool.
Start at the URL
http://support-tools/features
and add or click through to the feature you want. If you can see features
but can't add or change them, you'll need to talk to the Tools and
Automation team about getting permissions.
You can also create and destroy account sets, add and remove accounts to
them and set particular features active for particular account sets. See
http://support-tools/account_sets.
If you want to do the same things in staging rather than in
production, use the hostname support-tools-staging rather than
support-tools.
Example Application
In the example subdirectory of the gem you can find a very simple
Sinatra application that auto-refreshes every five seconds, queries a
feature on a 20-second TTL and displays a button whose text varies
according to the feature.
First, start your glowworm server (see above). Then start the example
app server (cd glowworm/example, run bundle exec ./example_server.rb)
and then from the server directory run
example_test_data --clear --new-signup
and you should see the button text change within 25 seconds. Run it
with just --clear, and you should see it change back within another 25
seconds. You can go back and forth as often as you have the patience
and the browser should keep changing.
Updating Features and Account Sets
The supplied Glowworm server uses a simple set of database tables, and
includes migrations to set them up. The idea is that you have account
sets, with a table of accounts in the account sets
(account_set_accounts). You also have a table of which features each
account set is true for (account_set_features). Finally, you have a
table of the features themselves, both to supply names for them and to
mark each feature fully active (i.e. active for all non-overridden
accounts). Fully active is only supported in Glowworm versions 0.2.0 and up.
Reliability
In the real world, bad things happen. Sometimes your packets can't
reach the glowworm server. Sometimes it's down. Sometimes you have
only very old cached data. Sometimes you have no cached data at all.
Sometimes the glowworm server is down when your app restarts, so it
can't load data on startup.
So what's the worst case here, and how does Glowworm respond?
If Glowworm has already gotten started and the server goes down,
Glowworm will simply continue returning the same information it last
saw. When the server comes back up, Glowworm's next poll will return
better information and fresh information will be provided to the
application. No problem.
If the server is down when Glowworm starts, everything will return
false. Glowworm will keep trying to query, but until its first
successful response from the server, everything is false in all cases.
Even "fully active" features still assume that Glowworm can find out
about that from the server, so these features return false also.
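
The two cases above can be modeled in a few lines. This is a toy sketch
of the "keep the last good data" behavior, not Glowworm's code;
LastKnownGood is a hypothetical class and the block stands in for a
server poll.

```ruby
# Toy model of the failure behavior described above.
class LastKnownGood
  def initialize
    @data = nil # nothing received from the server yet
  end

  # Try to refresh from the block; on any error keep whatever we had.
  def refresh
    @data = yield
  rescue StandardError
    @data
  end

  # Before the first successful refresh, everything is false.
  def flag(name)
    return false if @data.nil?
    @data.fetch(name, false)
  end
end
```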
Order of Precedence to determine a Feature's Value
There are two main scenarios in which a feature's value must be determined:
either data has been received from the server, or it has not.
In the case that we have no data from the server:
- The default set for the Glowworm client (app- or call-level) will be returned.
- If none is set, false will be returned.
If the server has been contacted and the caches have been populated:
- If an account override is present, its value will be returned.
- If an account set has a value set for this feature, its value will be returned.
- If an app- or call-level default has been set, it will be returned.
- If the feature has a value set in the "fully_active" field, its value will be returned.
- If none of these are present, false will be returned.
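
The order above can be sketched as a single lookup function. The data
layout here (the :overrides, :account_sets, :set_features and
:fully_active keys) is an assumption made for illustration, as is
treating an account set entry or a fully_active entry as simply true;
only the ordering comes from the list above.

```ruby
# Sketch of the documented order of precedence. Pass nil for data when
# nothing has been received from the server yet.
def resolve_flag(data, account, feature, default = nil)
  # No server data: the default if one was given, otherwise false.
  if data.nil?
    return default.nil? ? false : default
  end

  # 1. Per-account override.
  override = data[:overrides][[account, feature]]
  return override unless override.nil?

  # 2. Account set membership.
  sets = Array(data[:account_sets][account])
  return true if sets.any? { |s| data[:set_features].fetch(s, []).include?(feature) }

  # 3. App- or call-level default.
  return default unless default.nil?

  # 4. Fully active features, then 5. plain false.
  data[:fully_active].include?(feature)
end
```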
Rate Limits and Scaling
Glowworm updates are expensive -- all features, all account sets, all
overrides and all accounts are sent, though not each combination of
them. However, they're also rare. One update is sent to each client
when it starts up, and then an update is sent only when the data
actually changes on the server.
Normally each Glowworm client sends back a checksum from the last
successful update. If nothing has changed, the server sends back a
304 (Not Modified) with no response body. The Glowworm client
considers itself fully up to date, and doesn't poll again until the
TTL has expired.
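
As a rough sketch of that exchange from the server's side: the checksum
algorithm (SHA-256 over the JSON body) and the shape of the response
hash are assumptions for illustration, not Glowworm's wire protocol, and
respond_to_poll is a hypothetical name.

```ruby
require "digest"
require "json"

# Decide what to send back to a client that reports the checksum of the
# last update it received (nil if it has none).
def respond_to_poll(current_data, client_checksum)
  body = JSON.generate(current_data)
  checksum = Digest::SHA256.hexdigest(body)
  if client_checksum == checksum
    { status: 304 } # nothing changed, so no body is sent
  else
    { status: 200, checksum: checksum, body: body }
  end
end
```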
This makes short TTLs and frequent polling fairly cheap - they require
a single HTTP exchange with almost no data. Short timeouts are also
fine, since you don't have to wait for an update to be sure it's
happening.
Design
For its server, database representation and wire protocol, Glowworm
uses account sets - groups of accounts for which a given feature will
normally be toggled. It also has individual per-account-and-feature
override flags, so you don't have to strictly stick with those groups.
The account sets are an optimization - we can send which account set
each account belongs to, and which account sets a given feature is
active for, which lets us send far less data than the full
accounts-times-features matrix. It's also an excellent user interface
convention since frequently the same accounts will tend to want the
earliest, least stable features and want them soonest.
If you only set overrides for all your features and don't use account
sets then you will get much worse performance than Glowworm is
designed for. Account sets are a significant optimization, not just a
convenience.
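
A back-of-the-envelope comparison shows why. The counts are made up, and
the account set formula assumes each account belongs to one account set;
it is an estimate of entries sent, not a measurement of Glowworm.

```ruby
# One entry per account/feature pair.
def full_matrix_entries(accounts, features)
  accounts * features
end

# One membership entry per account, plus at most one entry per
# account set/feature pair.
def account_set_entries(accounts, account_sets, features)
  accounts + account_sets * features
end

# Example: 10,000 accounts, 20 account sets, 50 features.
#   full_matrix_entries(10_000, 50)      # => 500_000
#   account_set_entries(10_000, 20, 50)  # => 11_000
```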