Boss Ingest Client
A Python command line application for performing distributed ingest of data into the Boss
Overview
The ingest client application lets users move data from local storage into the Boss, quickly and reliably. It supports Python 3.6 and later. It uses a JSON configuration file to define ingest jobs, and a plugin system to support any local file organization.
Two types of ingest are supported:
Tile Based Ingests
This ingest type uploads data in 2D image tiles. It supports many different formats, but it is not as efficient as volumetric ingests.
Volumetric Ingests
This ingest type uploads data in the Boss' native storage format: 3D cuboids of 512 x 512 x 16 voxels in x, y, and z, respectively.
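As a rough illustration of the cuboid size, the sketch below counts how many 512 x 512 x 16 cuboids are needed to cover a volume. It assumes the volume is tiled from its origin and that partially filled cuboids at the edges are counted whole; the function name and the example dataset size are hypothetical and not part of the ingest client.

```python
import math

# Cuboid dimensions stated above: 512 x 512 x 16 voxels in x, y, z.
CUBOID_SHAPE = (512, 512, 16)


def cuboid_grid(extent_xyz, cuboid_shape=CUBOID_SHAPE):
    """Return the number of cuboids needed along x, y, and z to cover a volume."""
    return tuple(
        math.ceil(size / step) for size, step in zip(extent_xyz, cuboid_shape)
    )


# Hypothetical dataset of 2048 x 1536 x 100 voxels.
grid = cuboid_grid((2048, 1536, 100))
total = grid[0] * grid[1] * grid[2]
print(grid, total)  # (4, 3, 7) 84
```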
Installation
- Use virtualenv to isolate the ingest client from your system Python installation
virtualenv ingest-env
. ingest-env/bin/activate
Or, if you use virtualenvwrapper:
mkvirtualenv ingest-env
- Install the ingest client
pip install boss-ingest
If you get errors installing Pillow, it is most likely because you do not have all of Pillow's dependencies installed. Check out the "Installing Pillow Dependencies" section below for help.
Configuring Credentials
You must provide the ingest client with your Boss API token so it can make authenticated requests on your behalf.
Also remember that you must have write permissions to the resource (collection, experiment, and channel) where data is to be written, as specified in the ingest job configuration file. If you created the resources, you will automatically have access.
There are three ways to provide your API token to the ingest client. The ingest client will try to use the first token it finds in the following order:
- Via command line arguments. You can pass your token directly to the ingest client when starting it from the command line. See the Usage section below.
- Via the intern environment variables. The ingest client can also reuse the environment variables used to configure intern to set your API token
export INTERN_TOKEN=<your_token_here>
- Via the intern configuration file. If you have already installed intern and added your API token to its configuration file, the ingest client will automatically load the token
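The lookup order described above can be sketched as a simple "first match wins" function. This is only an illustration, not the ingest client's actual code: the function name resolve_token is hypothetical, and the configuration file layout (an INI file with a token key) and its path are assumptions; check intern's documentation for the real location and format.

```python
import configparser
import os


def resolve_token(cli_token=None, environ=None, config_path=None):
    """Return the first API token found, checking sources in the documented order."""
    environ = os.environ if environ is None else environ

    # 1. Token passed on the command line
    if cli_token:
        return cli_token

    # 2. intern environment variable
    env_token = environ.get("INTERN_TOKEN")
    if env_token:
        return env_token

    # 3. intern configuration file (ASSUMED: INI format with a "token" key)
    if config_path and os.path.isfile(config_path):
        parser = configparser.ConfigParser()
        parser.read(config_path)
        for section in parser.sections():
            token = parser.get(section, "token", fallback=None)
            if token:
                return token
        return parser.defaults().get("token")

    return None


# The environment variable is used when no command line token is given.
print(resolve_token(environ={"INTERN_TOKEN": "abc123"}))  # abc123
```

Passing the environment in as a dictionary keeps the function easy to test without modifying the real process environment.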
Usage
The ingest client is installed as a system script and can be called directly from the command line as boss-ingest.
An ingest job is the act of uploading a dataset or sub-region of a dataset to the Boss. You do not need to upload an entire dataset at once; you can specify, in both space and time, what data is to be written.
There are three primary operations you can perform with the ingest client: Create, Join, and Cancel an ingest job.
Plugins
To handle the many different ways users can organize and store data, "plugins" are used to perform two operations. The first (Path Processor) is responsible for taking user specified parameters and tile indices provided from the upload task queue to generate an absolute file path to the correct data file associated with the image tile. The second (Tile Processor) is responsible for taking user specified parameters, tile indices, and generated file path to generate a file handle containing the image data. This handle is then used to upload the image tile.
The ingest client wiki on GitHub provides more detailed information on how to create plugins and which plugins come pre-installed.
If you develop your own plugins, you simply need to make sure they are on your PYTHONPATH before calling boss-ingest
export PYTHONPATH=$PYTHONPATH:/<path_to_modules>
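To make the two plugin roles described in this section more concrete, here is a minimal standalone sketch. It does not import the ingest client, and everything in it is hypothetical: the class names, the setup/process method names, the parameter keys, and the file naming scheme. The base classes and method signatures a real plugin must implement are documented in the ingest client wiki.

```python
import io
import os


class ExamplePathProcessor:
    """Hypothetical Path Processor: maps tile indices to an absolute file path."""

    def setup(self, parameters):
        # User-specified parameters (hypothetical keys)
        self.root_dir = parameters["root_dir"]
        self.extension = parameters.get("extension", "png")

    def process(self, x_index, y_index, z_index, t_index=0):
        # Assumed naming scheme: one file per tile, named by its indices
        filename = "z{:04d}_y{:03d}_x{:03d}.{}".format(
            z_index, y_index, x_index, self.extension
        )
        return os.path.abspath(os.path.join(self.root_dir, filename))


class ExampleTileProcessor:
    """Hypothetical Tile Processor: returns a file handle holding the tile's image data."""

    def setup(self, parameters):
        self.parameters = parameters

    def process(self, file_path, x_index, y_index, z_index, t_index=0):
        # Read the tile into memory and hand back a seekable handle
        with open(file_path, "rb") as tile_file:
            handle = io.BytesIO(tile_file.read())
        handle.seek(0)
        return handle


path_processor = ExamplePathProcessor()
path_processor.setup({"root_dir": "tiles"})
print(path_processor.process(x_index=1, y_index=2, z_index=3))
```

The split keeps file layout concerns (where a tile lives) separate from file format concerns (how its bytes are read), which is why the two operations are described as separate plugins.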
Installing Pillow Dependencies
The ingest client uses Pillow to handle image files. There are several dependencies you may need to install before you can run pip install Pillow. Pillow is installed automatically when you run pip install boss-ingest, so these external dependencies must already be installed for that command to complete successfully.
- OSX
Install jpeg and tiff libraries using Homebrew
brew install libjpeg
brew install libtiff
Sometimes you may also need to install zlib development packages from Xcode
xcode-select --install
- Linux (Ubuntu)
sudo apt-get install libjpeg-dev libtiff5-dev zlib1g-dev libfreetype6-dev liblcms2-dev libopenjpeg-dev
- Windows - Untested
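After installing, one way to confirm that Pillow was built against the libraries above is to query its features module. This is a small optional check, not part of the ingest client; the function name pillow_codec_report and the choice of codecs to test are assumptions made for illustration.

```python
def pillow_codec_report(codecs=("jpg", "zlib", "libtiff")):
    """Return {codec: available} for the given Pillow codecs, or None if Pillow is missing."""
    try:
        from PIL import features
    except ImportError:
        return None
    return {name: features.check_codec(name) for name in codecs}


report = pillow_codec_report()
if report is None:
    print("Pillow is not installed")
else:
    for name, available in report.items():
        print(name, "ok" if available else "MISSING")
```

A codec reported as MISSING usually means the matching system library was not present when Pillow was installed, in which case reinstalling Pillow after installing the library should help.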
Installation for Development
- mkdir and cd to a directory of your choice
- Clone the ingest client
git clone https://github.com/jhuapl-boss/ingest-client.git
- Use virtualenv to isolate the ingest client from your system Python installation
virtualenv ingest-env
. ingest-env/bin/activate
Or, if you use virtualenvwrapper:
mkvirtualenv ingest-env
- Install Python dependencies
cd ./ingest-client
pip install -r requirements.txt
- An additional token configuration method via the token.json file is available if you've cloned the ingest-client repository locally
- Create a token.json file in the root directory of the repo
vi ./token.json
- Get your API token. This can be done by visiting the Boss Management Console. After logging in, click on your username in the top right corner, then "API Token".
- Copy your API token into the token.json file, which should look like this:
{
"token": "<insert_token_here>",
"host": "api.theboss.io"
}
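Before running the client, it can be useful to confirm that token.json parses and contains both keys shown above. The helper below is a hypothetical sanity check written for this purpose, not the ingest client's own loader; the function name load_token_file is an assumption.

```python
import json


def load_token_file(path="token.json"):
    """Read token.json and return (token, host), raising if a key is missing or empty."""
    with open(path) as token_file:
        data = json.load(token_file)

    # Both keys appear in the example file above
    missing = [key for key in ("token", "host") if not data.get(key)]
    if missing:
        raise ValueError("token.json is missing: " + ", ".join(missing))

    return data["token"], data["host"]
```

Calling load_token_file() from the root of the repository returns the token and host as a tuple, or raises an error describing what is wrong with the file.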
Testing
The nose2 library is used to run unit tests. From the ingest-client directory, simply invoke nose2.
nose2
We use continuous integration to automatically run tests as well. Future work will expand on testing and add more complex integration testing.
Legal
Use or redistribution of the Boss system in source and/or binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code or binary forms must adhere to the terms and conditions of any applicable software licenses.
- End-user documentation or notices, whether included as part of a redistribution or disseminated as part of a legal or scientific disclosure (e.g. publication) or advertisement, must include the following acknowledgement: The Boss software system was designed and developed by the Johns Hopkins University Applied Physics Laboratory (JHU/APL).
- The names "The Boss", "JHU/APL", "Johns Hopkins University", "Applied Physics Laboratory", "MICrONS", or "IARPA" must not be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact BossAdmin@jhuapl.edu.
- This source code and library is distributed in the hope that it will be useful, but is provided without any warranty of any kind.
License
If not otherwise marked, all code in this repository falls under the license granted in LICENSE.md.