@stencila/dockter
Docker is a good tool for creating reproducible computing environments. But creating truly reproducible Docker images can be difficult. Dockter aims to make it easier for researchers to create Docker images for their research projects. Dockter automatically creates and manages a Docker image for your project based on your source code.
🦄 Features that are not yet implemented are indicated by unicorn emoji. Usually they have a link next to them, like this 🦄 #2, indicating the relevant issue where you can help make the feature a reality. It's readme driven development with calls to action to chase after mythical vaporware creatures! So hip.
Dockter scans your project folder and builds a Docker image for it. If the folder already has a Dockerfile, then Dockter will build the image from that. If not, Dockter will scan the files in the folder, generate a .Dockerfile
and build the image from that. How Dockter builds the image from your source code depends on the language.
If the folder contains an R DESCRIPTION
file then Dockter will build an image with the R packages listed under Imports
installed, e.g.
Package: myrproject
Version: 1.0.0
Date: 2017-10-01
Imports:
ggplot2
The Package
and Version
fields are required in a DESCRIPTION
file. The Date
field is used to define which CRAN snapshot to use. MRAN daily snapshots began 2014-09-08 so the date should be on or after that.
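The rules above can be sketched as a small check. This is only an illustration, not Dockter's own code: parseDescription and checkDescription are hypothetical helper names, and the parser handles just the simple "Field: value" layout shown in the example.

```javascript
// Hypothetical helper: parse the simple "Field: value" layout of an R
// DESCRIPTION file. A line without a field name continues the previous field.
function parseDescription (text) {
  const fields = {}
  let current = null
  for (const line of text.split(/\r?\n/)) {
    const match = /^([A-Za-z][\w.-]*):\s*(.*)$/.exec(line)
    if (match) {
      current = match[1]
      fields[current] = match[2].trim()
    } else if (current && line.trim() !== '') {
      // Continuation line, e.g. a package listed under Imports
      fields[current] = (fields[current] + ' ' + line.trim()).trim()
    }
  }
  return fields
}

// First MRAN daily snapshot, as stated in the text above
const FIRST_SNAPSHOT = '2014-09-08'

// Hypothetical helper: report missing required fields and too-early dates
function checkDescription (text) {
  const fields = parseDescription(text)
  const problems = []
  for (const required of ['Package', 'Version']) {
    if (!fields[required]) problems.push(`missing required field: ${required}`)
  }
  // ISO dates (YYYY-MM-DD) compare correctly as plain strings
  if (fields.Date && fields.Date < FIRST_SNAPSHOT) {
    problems.push(`Date ${fields.Date} is before the first snapshot (${FIRST_SNAPSHOT})`)
  }
  const imports = (fields.Imports || '')
    .split(',')
    .map(name => name.trim())
    .filter(Boolean)
  return { fields, imports, problems }
}
```

Calling checkDescription on the example above should give an empty problems list and ggplot2 as the only import.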
If the folder does not contain a DESCRIPTION
file then Dockter will scan all the R files (files with the extension .R
or .Rmd
) in the folder for package import or usage statements, like library(package)
and package::function()
, and create a .DESCRIPTION
file for you.
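As a rough idea of what such a scan involves, here is a minimal sketch that assumes simple regular expressions are good enough. It is not how Dockter itself is implemented; findRPackages is a hypothetical name, and the patterns will also match inside comments and strings.

```javascript
// Hypothetical helper: collect package names used in a piece of R source
function findRPackages (source) {
  const packages = new Set()
  // library(pkg) or library("pkg")
  const attach = /\blibrary\(\s*["']?([A-Za-z][A-Za-z0-9.]*)["']?\s*[,)]/g
  // pkg::fn or pkg:::fn
  const qualified = /\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z._]/g
  for (const regex of [attach, qualified]) {
    for (const match of source.matchAll(regex)) packages.add(match[1])
  }
  return [...packages].sort()
}
```

The sorted, de-duplicated names are what would end up under Imports in a generated .DESCRIPTION file.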
If the folder contains a main.R
file, Dockter will set that to be the default script to run in any container created from the image.
If the folder contains a 🦄 #3 requirements.txt
file, or a 🦄 #4 Pipfile
, Dockter will copy it into the Docker image and use pip
to install the specified packages.
If the folder does not contain either of those files then Dockter will 🦄 #5 scan all the folder's .py
files for import
statements and create a .requirements.txt
file for you.
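Since this feature is still a 🦄, the following is only a sketch of how import scanning might work, under the assumption that line-by-line pattern matching is acceptable. findPythonImports is a hypothetical name. Note that a module name is not always the same as the pip package name, and standard library modules would need to be filtered out before writing a requirements file.

```javascript
// Hypothetical helper: collect top-level module names imported by Python source
function findPythonImports (source) {
  const modules = new Set()
  for (const rawLine of source.split(/\r?\n/)) {
    // Drop trailing comments so they are not mistaken for module names
    const line = rawLine.replace(/#.*$/, '')
    // "import a.b as c, d" -> a, d
    const plain = /^\s*import\s+(.+)$/.exec(line)
    if (plain) {
      for (const part of plain[1].split(',')) {
        const name = part.trim().split(/\s+/)[0].split('.')[0]
        if (name) modules.add(name)
      }
      continue
    }
    // "from a.b import c" -> a (relative imports like "from . import x" are skipped)
    const from = /^\s*from\s+([A-Za-z_]\w*)[\w.]*\s+import\b/.exec(line)
    if (from) modules.add(from[1])
  }
  return [...modules].sort()
}
```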
If the folder contains a 🦄 #7 package.json
file, Dockter will copy it into the Docker image and use npm
to install the specified packages.
If the folder does not contain a package.json
file, Dockter will 🦄 #8 scan all the folder's .js
files for import
or require
statements and create a .package.json
file for you.
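This one is also a 🦄, so here is just a sketch of what the scan could look like, assuming regular expressions rather than a real JavaScript parser. findNodePackages is a hypothetical name; Node.js built-in modules such as fs will show up in the result and would need filtering.

```javascript
// Hypothetical helper: collect npm package names referenced by JavaScript source
function findNodePackages (source) {
  const packages = new Set()
  const patterns = [
    // require('x')
    /\brequire\(\s*['"]([^'"]+)['"]\s*\)/g,
    // import x from 'x', import { y } from 'x', import 'x'
    /\bimport\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g
  ]
  for (const pattern of patterns) {
    for (const match of source.matchAll(pattern)) {
      const specifier = match[1]
      // Relative and absolute paths are project files, not packages
      if (specifier.startsWith('.') || specifier.startsWith('/')) continue
      // Reduce "pkg/sub/path" to "pkg" and "@scope/pkg/sub" to "@scope/pkg"
      const parts = specifier.split('/')
      packages.add(specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0])
    }
  }
  return [...packages].sort()
}
```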
If the folder contains any 🦄 #9 .ipynb
files, Dockter will scan the code cells in those files for any Python import
or R library
statements and extract a list of package dependencies. It will also 🦄 #10 add Jupyter to the built Docker image.
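A notebook is a JSON file, so the scan could be sketched like this. It relies on the nbformat layout (a top-level cells list whose entries have cell_type and source); the function names are hypothetical and the patterns are deliberately simple.

```javascript
// Hypothetical helper: return the source text of each code cell in a notebook
function notebookCodeSources (notebookJson) {
  const notebook = JSON.parse(notebookJson)
  return (notebook.cells || [])
    .filter(cell => cell.cell_type === 'code')
    // "source" is stored either as one string or as a list of line strings
    .map(cell => Array.isArray(cell.source) ? cell.source.join('') : String(cell.source || ''))
}

// Hypothetical helper: list Python modules and R packages used in code cells
function notebookPackages (notebookJson) {
  const python = new Set()
  const r = new Set()
  for (const source of notebookCodeSources(notebookJson)) {
    for (const m of source.matchAll(/^\s*(?:import|from)\s+([A-Za-z_]\w*)/gm)) python.add(m[1])
    for (const m of source.matchAll(/\blibrary\(\s*["']?([A-Za-z][A-Za-z0-9.]*)/g)) r.add(m[1])
  }
  return { python: [...python].sort(), r: [...r].sort() }
}
```

Markdown cells are ignored, so an import statement quoted in prose does not count as a dependency.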
Docker's layered filesystem has advantages but it can cause real delays when you are updating your project dependencies. For example, see this issue for the workarounds used by Node.js developers to prevent long waits when they update their package.json
. The reason this happens is that when you update a requirements file Docker throws away all the subsequent layers, including the one where you install all your package dependencies.
Here's a simple motivating example of a Dockerized Python project. It's got a pip
requirements.txt
file which specifies that the project requires pandas
which, to ensure reproducibility, is pinned to version 0.23.0
,
pandas==0.23.0
The project has also got a Dockerfile
that specifies which Python version we want, copies requirements.txt
into the image, and installs the packages:
FROM python:3.7.0
COPY requirements.txt .
RUN pip install -r requirements.txt
You can build a Docker image for that project using Docker,
docker build .
Docker will download the base Python image (if you don't yet have it), and download five packages (pandas
and its four dependencies) and install them. This took over 9 minutes when we ran it.
Now, let's say that we want to do some plotting in our library, so we add matplotlib
as a dependency in requirements.txt
,
pandas==0.23.0
matplotlib==3.0.0
When we do docker build .
again Docker notices that the requirements.txt
file has changed and so throws away that layer and all subsequent ones. This means that it will download and install all the necessary packages again, including the ones that we previously installed - and takes longer than the first install. For a more contrived illustration of this, simply add a space to a line in the requirements.txt
file and notice how the package install gets repeated all over again.
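To see why one changed file invalidates everything after it, here is a simplified model of the build cache. It is a sketch, not Docker's actual implementation: it assumes each layer's cache key is derived from its parent's key, the instruction text and, for COPY, the contents of the copied file. The function names are hypothetical.

```javascript
const crypto = require('crypto')

// Simplified model: a layer's key depends on its parent's key, the instruction
// text and, for COPY, the contents of the file being copied.
function layerKeys (instructions, files) {
  let parent = ''
  return instructions.map(instruction => {
    const hash = crypto.createHash('sha256').update(parent).update(instruction)
    const copy = /^COPY\s+(\S+)/.exec(instruction)
    if (copy) hash.update(files[copy[1]] || '')
    parent = hash.digest('hex')
    return parent
  })
}

// Count how many leading layers have identical keys in two builds
function reusedLayers (before, after) {
  let count = 0
  while (count < before.length && before[count] === after[count]) count++
  return count
}

const dockerfile = [
  'FROM python:3.7.0',
  'COPY requirements.txt .',
  'RUN pip install -r requirements.txt'
]
const first = layerKeys(dockerfile, { 'requirements.txt': 'pandas==0.23.0\n' })
const second = layerKeys(dockerfile, { 'requirements.txt': 'pandas==0.23.0\nmatplotlib==3.0.0\n' })

// Only the FROM layer keeps its key; COPY and RUN must both be rebuilt
console.log(reusedLayers(first, second))
```

Because the RUN layer's key includes the key of the COPY layer before it, the pip install is repeated even though its own instruction text did not change.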
Now, let's add a special # dockter
comment to the Dockerfile before the COPY
directive,
FROM python:3.7.0
# dockter
COPY requirements.txt .
RUN pip install -r requirements.txt
The comment is ignored by Docker but tells dockter
to run all subsequent directives and commit them into a single layer,
dockter build .
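As an illustration of the first step only, here is how the directives after the comment could be picked out. This is a sketch under the assumption that the marker is a line containing just # dockter; splitAtDockterComment is a hypothetical name, and how the directives are then run and committed is not shown.

```javascript
// Hypothetical helper: split a Dockerfile at the "# dockter" marker comment
function splitAtDockterComment (dockerfileText) {
  const lines = dockerfileText.split(/\r?\n/).filter(line => line.trim() !== '')
  const index = lines.findIndex(line => /^#\s*dockter$/.test(line.trim()))
  // No marker: everything is left to plain Docker
  if (index === -1) return { docker: lines, dockter: [] }
  return {
    docker: lines.slice(0, index),   // built as normal Docker layers
    dockter: lines.slice(index + 1)  // directives to run and commit as one layer
  }
}
```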
🔧 Finish description of commit-based approach and illustrate speed up over normal Docker builds
Dockter has been built to expose a JSON-LD API so that it works with other tools. It will parse a Dockerfile into a JSON-LD SoftwareSourceCode
node extracting meta-data about the Dockerfile and build it into a SoftwareEnvironment
node with links to the source files and the build image.
🔧 Illustrate how this is done for all project sources including non Dockerfiles
🔧 Replace this JSON-LD with final version
{
"@context": "https://schema.stenci.la",
"type": "SoftwareSourceCode",
"id": "https://hub.docker.com/#sha256:27d6e441706e89dac442c69c3565fc261b9830dd313963cb5488ba418afa3d02",
"author": [],
"text": "FROM busybox\nLABEL description=\"Prints the current date and time at UTC, to the nearest second, in ISO-8601 format\" \\\n author=\"Nokome Bentley <nokome@stenci.la>\"\nCMD date -u -Iseconds\n",
"programmingLanguage": "Dockerfile",
"messages": [],
"description": "Prints the current date and time at UTC, to the nearest second, in ISO-8601 format"
}
Dockter is designed to make it easier to get started creating Docker images for your project. But it's also designed not to get in your way or restrict you from using bare Docker. You can easily and individually override any of the steps that Dockter takes to build an image.
Code analysis: To stop Dockter doing code analysis and take over specifying your project's package dependencies, just remove the leading '.' from the .DESCRIPTION
, .requirements.txt
or .package.json
file that Dockter generates.
Dockerfile generation: Dockter aims to generate readable Dockerfiles that conform to best practices. They're a good place to start learning how to write your own Dockerfiles. To stop Dockter generating a .Dockerfile
, and start editing it yourself, just rename it to Dockerfile
.
Image build: Dockter manages builds using a special comment in the Dockerfile
, so you can stop using Dockter altogether and build the same image using Docker (it will just take longer if you change your project dependencies).
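The leading-dot convention described above boils down to a simple lookup. A minimal sketch, with resolveProjectFile as a hypothetical name:

```javascript
// Hypothetical helper: decide whether a project file is user-managed or generated.
// filenames: names present in the project folder; name: e.g. 'Dockerfile'
function resolveProjectFile (filenames, name) {
  // A file without the leading dot belongs to the user and is used as is
  if (filenames.includes(name)) return { file: name, managedBy: 'user' }
  // Otherwise the dot-prefixed file is generated (and regenerated) by Dockter
  return { file: '.' + name, managedBy: 'dockter' }
}
```

For example, a folder containing DESCRIPTION resolves to the user's file, while a folder with only a .requirements.txt resolves to the generated one.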
Dockter is available as a pre-compiled, standalone command line tool, or as a Node.js package.
If you don't have Node.js or would simply prefer a standalone binary, you can download the latest release from the releases page.
npm install @stencila/dockter
You will need to install Docker if you don't already have it on your system.
This package is primarily designed to be used as a compiler service within a Stencila deployment (e.g. stencila/cloud
). But you can also use it standalone via the API or command line interface.
The command line interface (CLI) is a good way to get an understanding of what this package does. Essentially, it just exposes the compiler API on the command line.
The most basic thing that this package does is to read a Dockerfile
, parse it to extract metadata, build a Docker image for it, and run that image as a Docker container.
Here's a very simple example Dockerfile
. It uses the tiny busybox
image as a base, adds some meta-data about the image, and then specifies the command to run to print out the date.
FROM busybox
LABEL description="Prints the current date and time at UTC, to the nearest second, in ISO-8601 format" \
author="Nokome Bentley <nokome@stenci.la>"
CMD date -u -Iseconds
You can use the compile
command to convert a Dockerfile like this into a JSON-LD SoftwareEnvironment
node,
dockter compile Dockerfile > environ.jsonld
{
"@context": "https://schema.stenci.la",
"type": "SoftwareSourceCode",
"id": "https://hub.docker.com/#sha256:27d6e441706e89dac442c69c3565fc261b9830dd313963cb5488ba418afa3d02",
"author": [],
"text": "FROM busybox\nLABEL description=\"Prints the current date and time at UTC, to the nearest second, in ISO-8601 format\" \\\n author=\"Nokome Bentley <nokome@stenci.la>\"\nCMD date -u -Iseconds\n",
"programmingLanguage": "Dockerfile",
"messages": [],
"description": "Prints the current date and time at UTC, to the nearest second, in ISO-8601 format"
}
🔧 Replace this JSON output when a more final version available.
The default CLI output format is JSON but you can get YAML, which is easier to read, by using the --format=yaml
option. You can turn off building of the Docker image (to just extract meta-data) using --build=false
. Use dockter compile --help
for more help.
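To give a feel for the metadata extraction step, here is a minimal sketch that pulls LABEL values out of a Dockerfile with regular expressions. It is not Dockter's implementation (which relies on docker-file-parser); extractLabels is a hypothetical name and only the quoted key="value" form is handled.

```javascript
// Hypothetical helper: extract LABEL key/value pairs from Dockerfile text
function extractLabels (dockerfileText) {
  // Join lines that end with a backslash continuation
  const joined = dockerfileText.replace(/\\\r?\n/g, ' ')
  const labels = {}
  for (const line of joined.split(/\r?\n/)) {
    const instruction = /^\s*LABEL\s+(.*)$/i.exec(line)
    if (!instruction) continue
    // Match key="value" pairs; values may contain backslash-escaped characters
    for (const pair of instruction[1].matchAll(/([\w.-]+)="((?:[^"\\]|\\.)*)"/g)) {
      labels[pair[1]] = pair[2].replace(/\\(.)/g, '$1')
    }
  }
  return labels
}
```

Applied to the example Dockerfile above, this should return an object with description and author keys.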
dockter execute environ.jsonld
dockter execute Dockerfile
The Express router provides PUT /compile
and PUT /execute
endpoints (which do the same thing as the corresponding CLI commands). You can serve them using,
npm start
Or, during development using,
npm run server
A minimal example of how to integrate the router into your own Express server,
const express = require('express')
const { docker } = require('@stencila/dockter')
const app = express()
app.use('/docker', docker)
app.listen(3000)
Dockter implements a compiler design pattern. Source files are parsed into a SoftwareEnvironment
instance (the equivalent of an AST (Abstract Syntax Tree) in other programming language compilers) which is then used to generate a Dockerfile
which is then built into a Docker image.
The parser classes e.g. PythonParser
, RParser
scan for relevant source files and generate SoftwareEnvironment
instances.
The generator classes e.g. PythonGenerator
, RGenerator
generate a Dockerfile
for a given SoftwareEnvironment
.
DockerGenerator
is a super-generator which combines the other generators.
DockerBuilder
class builds a Docker image from the generated Dockerfile.
DockerCompiler
links all of these together.
For example, if a folder has a single file in it, code.py
, PythonParser
will parse that file and create a SoftwareEnvironment
instance, which DockerGenerator
uses to generate a Dockerfile
, which DockerBuilder
uses to build a Docker image.
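The flow can be sketched with heavily simplified stand-ins for these classes. Everything here is an assumption made for illustration: the method names, the shape of the environment object (language, packages) and the generated Dockerfile are not taken from Dockter's source, and the build step is left out because it needs a Docker daemon.

```javascript
// Hypothetical, simplified stand-in for a parser class
class PythonParser {
  // files: { filename: contents }; returns an environment object or null
  parse (files) {
    const sources = Object.keys(files).filter(name => name.endsWith('.py'))
    if (sources.length === 0) return null
    const packages = new Set()
    for (const name of sources) {
      for (const m of files[name].matchAll(/^\s*(?:import|from)\s+([A-Za-z_]\w*)/gm)) {
        packages.add(m[1])
      }
    }
    return { type: 'SoftwareEnvironment', language: 'python', packages: [...packages].sort() }
  }
}

// Hypothetical, simplified stand-in for a generator class
class PythonGenerator {
  generate (environ) {
    const lines = ['FROM python:3.7.0']
    if (environ.packages.length > 0) {
      lines.push('RUN pip install ' + environ.packages.join(' '))
    }
    return lines.join('\n') + '\n'
  }
}

// Links the steps together: source files -> environment -> Dockerfile text
class DockerCompiler {
  compile (files) {
    const environ = new PythonParser().parse(files)
    if (!environ) throw new Error('no recognised source files')
    const dockerfile = new PythonGenerator().generate(environ)
    // A builder would take the Dockerfile from here and build the image
    return { environ, dockerfile }
  }
}
```

Keeping the environment object as the intermediate form is what lets different parsers and generators be mixed, in the same way an AST decouples a compiler's front end from its back end.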
We 💕 contributions! To get started,
git clone https://github.com/stencila/dockter
cd dockter
npm install
To run the CLI during development use, npm run cli -- <args>
e.g.
npm run cli -- compile tests/fixtures/dockerfile-date/Dockerfile
This uses ts-node
to compile and run TypeScript on the fly so that you don't need to do a build step first.
Then take a look at the docs (online or inline) and start hacking! Please check that your changes pass linting and unit tests,
npm run lint # or, make lint
npm test # or, make test
Use npm test -- <test file path>
to run a single test file.
You can set up a Git pre-commit hook to perform these checks automatically before each commit using make hooks
.
Check that any changes you've made are covered 🏅 by unit tests using,
npm run cover # or, make cover
open coverage/lcov-report/index.html
If you've been working on in-code documentation 🙏 you can check that by building and viewing the docs,
npm run docs # or, make docs
open docs/index.html
Please use conventional changelog style commit messages e.g. docs(readme): fixed spelling mistake
. This helps with automated semantic versioning. To make this easier, Commitizen is a development dependency and can be used via npm
or make
:
npm run commit # or, make commit
Linting, test coverage, binary builds, package builds, and documentation generation are done on each push on Travis CI. semantic-release
is enabled to automate version management, GitHub releases and npm package publishing.
Related Stencila packages include:
stencila/tunix
: compiles JSON-LD SoftwareEnvironment
nodes to NixOS environments
stencila/kubex
: executes JSON-LD SoftwareEnvironment
nodes on Kubernetes clusters
There are several projects that create Docker images from source code and/or requirements files:
alibaba/derrick
jupyter/repo2docker
Gueils/whales
o2r-project/containerit
openshift/source-to-image
ViDA-NYU/reprozip
Dockter is similar to repo2docker
, containerit
, and reprozip
in that it is aimed at researchers doing data analysis (and supports R) whereas most other tools are aimed at software developers (and don't support R). Dockter differs from these projects principally in that, by default (but optionally), it installs the necessary Stencila language packages so that the image can talk to Stencila client interfaces and provide code execution services. Like repo2docker
it allows for multi-language images but has the additional features of package dependency analysis of source code, managed builds and generation of image meta-data.
Why is this a Node.js package?
We've implemented this as a Node.js package for easier integration into Stencila's Node.js based desktop and cloud deployments.
Dockter was inspired by similar tools for researchers including binder
and repo2docker
. It relies on dockerode
, docker-file-parser
, and of course Docker.