NCS Navigator MDES Warehouse
MDES Warehouse is the central collation and reporting engine for the
NCS Navigator suite. It provides the infrastructure to support
transforming data from the operational and data-collection systems and
converting it into the format required for reporting to the National
Children's Study program office.
While it has been built for the NCS Navigator suite, it exposes
generic attachment points for transformers, potentially allowing it to
be used with other sets of transactional systems for the National
Children's Study.
System Requirements
MDES Warehouse is installed as a ruby gem. It's compatible with Ruby
1.8.7 (including REE) and 1.9.2. It relies on PostgreSQL 9
(or later).
Set up
Install the gem
$ gem install ncs_mdes_warehouse
This will install the gem, all its dependencies, and the mdes-wh
executable.
Set the environment
The warehouse detects the kind of environment it's running in
(production, staging, or development) via the NCS_NAVIGATOR_ENV
environment variable. Arrange for this variable to be set to an
appropriate value for the server where you're deploying the warehouse:
# In, e.g., /etc/bashrc
export NCS_NAVIGATOR_ENV=staging
If you're deploying on a workstation (i.e., for development) you can
skip this step -- the default environment is development.
Configure the NCS Navigator suite
If you haven't already, create /etc/nubic/ncs/navigator.ini
by
following the example in the ncs_navigator_configuration
gem.
Create the PostgreSQL databases
The warehouse requires two PostgreSQL databases. There are several
ways to create them. createdb
is one:
$ createuser -e -h dbserver.my.org -PRSD mdes_warehouse
Enter password for new role:
Enter it again:
CREATE ROLE mdes_warehouse PASSWORD 'md5...' NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;
$ createdb -e -O mdes_warehouse mdes_warehouse_working
CREATE DATABASE mdes_warehouse_working OWNER mdes_warehouse;
$ createdb -e -O mdes_warehouse mdes_warehouse_reporting
CREATE DATABASE mdes_warehouse_reporting OWNER mdes_warehouse;
Additional database roles
For safe reporting purposes, the warehouse automatically creates
PII-screened views in a schema named no_pii
. Since those views are
dropped along with everything else when the warehouse schema is
recreated, direct grants from user accounts to to the views will be
lost every time ETL runs. To work around this potential problem, the
schema creation process looks for a role named
mdes_warehouse_no_pii
. If this role exists, it will be granted
read-only access to these PII-screened views. Here's how you can
create the role:
$ createuser -e -h dbserver.my.org -LRSD mdes_warehouse_no_pii
CREATE ROLE mdes_warehouse_no_pii NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT NOLOGIN;
Individual data analyst accounts can then be granted this role to have
access to a PII-free view of the warehouse:
# In a database console
GRANT mdes_warehouse_no_pii TO some_data_analyst;
# Following is optional; it will make sure that some_data_analyst
# will default to querying the PII-free views instead of the
# (inaccessible) full-data tables in the public schema.
ALTER some_data_analyst SET search_path=no_pii;
Configure bcdatabase
Like the rest of the NCS Navigator suite, MDES Warehouse uses
bcdatabase to locate database access parameters and
credentials. Bcdatabase looks for credentials under
/etc/nubic/db
. The specific file that the warehouse looks in will
depend on the environment in which you're running:
- development:
local_postgresql.yml
- staging:
ncsdb_staging.yml
- production:
ncsdb_prod.yml
Create or update the appropriate file with the warehouse configuration
for your environment. The below is an example; yours may differ. The
MDES Warehouse-specifc parts are the datamapper_adapter
configuration key and the two mdes_warehouse_*
entries.
defaults:
host: ncsdb-staging.my.org
adapter: postgresql
datamapper_adapter: postgres
mdes_warehouse_working:
username: mdes_warehouse
password: whatever-i-entered-before
mdes_warehouse_reporting:
username: mdes_warehouse
password: whatever-i-entered-before
Configure warehouse
The warehouse takes its base configuration from a file under the
directory /etc/nubic/ncs/warehouse
. The name of the file should be
${NCS_NAVIGATOR_ENV}.rb
; e.g., for the production environment, the
full path would be /etc/nubic/ncs/warehouse/production.rb
.
The minimum contents of this file is a list of transformers, which are
components that translate data from some source system or systems:
c.add_transformer VdrXml.from_most_recent_file(c, Dir['/var/lib/ncs/mdes/COOK*.xml'])
c.add_transformer StaffPortal.create_transformer(c)
The warehouse includes some transformers. Others are included with the
NCS Navigator suite applications themselves. See below for more
information (TODO).
The warehouse configuration file may optionally include other
overrides to MDES warehouse defaults. See the
{file:sample_configuration.rb} for details.
Verify your settings
You can verify most of these settings by running this command:
$ mdes-wh create-schema
It will connect to the working database and create a schema matching
the current MDES version.
TODO: a more complete verification utility.
Use
MDES Warehouse provides an executable named mdes-wh
which has
several subcommands. To get a list of the commands, use mdes-wh help
.
Monitoring
The warehouse produces interactive monitoring output on standard error
for most of its actions. This output can be suppressed using the
--quiet
command line option, or by setting c.output_level = :quiet
in the environment config file.
Separately, warehouse actions are more permanently logged in a log
file. The log file name defaults to
/var/log/ncs/warehouse/{env_name}.log
and can be changed in the
environment configuration file. This log file is never cleaned out
automatically; consider using logrotate or a similar tool to
rotate it on a daily or weekly basis.
If a log can't be written to (e.g., because the directory doesn't
exist), the warehouse will dump warnings and errors to standard out
instead. The log file will contain more detail than the messages
dumped to standard out, so don't rely on this backup behavior in
production.