Node Resource Interface, Revisited
This project is currently in DRAFT status
Goal
NRI allows plugging domain- or vendor-specific custom logic into
OCI-compatible runtimes. This logic can make controlled changes to containers
or perform extra actions outside the scope of OCI at certain points in a
container's lifecycle. This can be used, for instance, for improved allocation
and management of devices and other container resources.
NRI defines the interfaces and implements the common infrastructure for
enabling such pluggable runtime extensions, NRI plugins. This also keeps
the plugins themselves runtime-agnostic.
The goal is to enable NRI support in the most commonly used OCI runtimes,
containerd and CRI-O.
Background
The revisited API is a major rewrite of NRI. It changes the scope of NRI
and how it gets integrated into runtimes. It reworks how plugins are
implemented, how they communicate with the runtime, and what kind of
changes they can make to containers.
NRI v0.1.0 used an OCI hook-like one-shot plugin invocation
mechanism where a separate instance of a plugin was spawned for every NRI
event. This instance then used its standard input and output to receive a
request and provide a response, both as JSON data.
In the revisited NRI, plugins are daemon-like entities. A single instance of
a plugin is now responsible for handling the full stream of NRI events and
requests. A unix-domain socket is used as the transport for communication.
Instead of JSON requests and responses, NRI is defined as a formal,
protobuf-based 'NRI plugin protocol' which is compiled into ttRPC bindings.
This should result in improved communication efficiency with lower
per-message overhead, and enable straightforward implementation of stateful
NRI plugins.
Components
The NRI implementation consists of a number of components. The core
components are essential for implementing working end-to-end NRI support in
runtimes. These core components are the actual NRI protocol and the NRI
runtime adaptation.
Together these establish the model of how a runtime interacts with NRI and
how plugins interact with containers in the runtime through NRI. They also
define under which conditions plugins can make changes to containers and
the extent of these changes.
The rest of the components are the NRI plugin stub library and some sample
NRI plugins. Some plugins implement useful functionality in real-world
scenarios. A few others are useful for debugging. All of the sample plugins
serve as practical examples of how the stub library can be used to implement
NRI plugins.
Protocol, Plugin API
The core of NRI is defined by a protobuf protocol definition
of the low-level plugin API. The API defines two services, Runtime and Plugin.
The Runtime service is the public interface runtimes expose for NRI plugins. All
requests on this interface are initiated by the plugin. The interface provides
functions for
- initiating plugin registration
- requesting unsolicited updates to containers
The Plugin service is the public interface NRI uses to interact with plugins.
All requests on this interface are initiated by NRI/the runtime. The interface
provides functions for
- configuring the plugin
- getting the initial list of already existing pods and containers
- hooking the plugin into pod/container lifecycle events
- shutting down the plugin
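The split between the two services can be pictured as a pair of Go interfaces. This is only a rough sketch: the interface and method names below are simplified, hypothetical stand-ins chosen to mirror the bullet points above, not the generated ttRPC bindings.

```go
package main

import "fmt"

// Runtime is a simplified, hypothetical stand-in for the service the
// runtime exposes. Every call on it is initiated by a plugin.
type Runtime interface {
	RegisterPlugin(name, index string) error
	UpdateContainers(containerIDs []string) error
}

// Plugin is a simplified, hypothetical stand-in for the service a plugin
// exposes. Every call on it is initiated by NRI/the runtime.
type Plugin interface {
	Configure(config string) error
	Synchronize(pods, containers []string) error
	HandleEvent(event, containerID string) error
	Shutdown()
}

// tracer is a toy Plugin that records which calls it received.
type tracer struct{ calls []string }

func (t *tracer) Configure(string) error {
	t.calls = append(t.calls, "Configure")
	return nil
}

func (t *tracer) Synchronize(_, _ []string) error {
	t.calls = append(t.calls, "Synchronize")
	return nil
}

func (t *tracer) HandleEvent(event, _ string) error {
	t.calls = append(t.calls, "HandleEvent:"+event)
	return nil
}

func (t *tracer) Shutdown() { t.calls = append(t.calls, "Shutdown") }

// Compile-time check that tracer satisfies Plugin.
var _ Plugin = (*tracer)(nil)

func main() {
	t := &tracer{}
	var p Plugin = t
	p.Configure("")
	p.Synchronize(nil, nil)
	p.HandleEvent("CreateContainer", "c1")
	p.Shutdown()
	fmt.Println(t.calls)
}
```

The point of the sketch is the direction of the calls: a plugin only ever calls into `Runtime`, and NRI only ever calls into `Plugin`.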
Plugin Registration
Before a plugin can start receiving and processing container events, it needs
to register itself with NRI. During registration the plugin and NRI perform a
handshake sequence which consists of the following steps:
- the plugin identifies itself to the runtime
- NRI provides plugin-specific configuration data to the plugin
- the plugin subscribes to pod and container lifecycle events of interest
- NRI sends the list of existing pods and containers to the plugin
- the plugin requests any updates deemed necessary to existing containers
The plugin identifies itself to NRI by a plugin name and a plugin index. The
plugin index is used by NRI to determine in which order the plugin is hooked
into pod and container lifecycle event processing with respect to any other
plugins.
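As a rough illustration of index-based ordering, the sketch below sorts a set of registered plugins before invoking them. The `plugin` type is hypothetical, the index is treated as a plain integer, and breaking ties by name is an assumption made here for determinism, not something this document specifies.

```go
package main

import (
	"fmt"
	"sort"
)

// plugin is a hypothetical record of a registered plugin.
type plugin struct {
	Index int    // assumed to be a plain integer in this sketch
	Name  string
}

// invocationOrder returns a copy of plugins sorted by ascending index.
// Ties are broken by name, which is an assumption of this sketch.
func invocationOrder(plugins []plugin) []plugin {
	sorted := append([]plugin(nil), plugins...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Index != sorted[j].Index {
			return sorted[i].Index < sorted[j].Index
		}
		return sorted[i].Name < sorted[j].Name
	})
	return sorted
}

func main() {
	registered := []plugin{
		{90, "logger"},
		{10, "device-injector"},
		{10, "cpu-pinner"},
	}
	for _, p := range invocationOrder(registered) {
		fmt.Printf("%02d-%s\n", p.Index, p.Name)
	}
}
```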
The plugin name is used to pick plugin-specific data to send to the plugin
as configuration. This data is only present if the plugin has been launched
by NRI. If the plugin has been externally started it is expected to acquire
its configuration also by external means. The plugin subscribes to pod and
container lifecycle events of interest in its response to configuration.
As the last step in the registration and handshaking process, NRI sends the
full set of pods and containers known to the runtime. The plugin can request
updates it considers necessary to any of the known containers in response.
Once the handshake sequence is over and the plugin has registered with NRI,
it will start receiving pod and container lifecycle events according to its
subscription.
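The handshake described above can be walked through with an in-memory toy model. Everything in this sketch is hypothetical (the `runtime` and `plugin` types, the method names, the event names as strings), and pods are left out for brevity. It only shows the order of the steps and which side answers with what.

```go
package main

import (
	"fmt"
	"strings"
)

// plugin is a toy plugin; its methods mirror the plugin side of the handshake.
type plugin struct{ name, index string }

// configure receives configuration data and answers with an event subscription.
func (p *plugin) configure(config string) []string {
	return []string{"CreateContainer", "StopContainer"}
}

// synchronize receives the existing containers and answers with the ones
// the plugin wants updated (here: anything named "db-*").
func (p *plugin) synchronize(containers []string) []string {
	var updates []string
	for _, c := range containers {
		if strings.HasPrefix(c, "db-") {
			updates = append(updates, c)
		}
	}
	return updates
}

// runtime is a toy stand-in for NRI inside the runtime.
type runtime struct {
	configs    map[string]string // plugin name -> configuration data
	containers []string          // containers known to the runtime
	trace      []string          // record of completed handshake steps
}

// handshake runs the registration steps in order.
func (r *runtime) handshake(p *plugin) (events, updates []string) {
	// The plugin identifies itself by name and index.
	r.trace = append(r.trace, "register "+p.index+"-"+p.name)
	// NRI sends configuration (empty if none is known for this name);
	// the plugin subscribes to events in its response.
	events = p.configure(r.configs[p.name])
	r.trace = append(r.trace, fmt.Sprintf("subscribed to %d events", len(events)))
	// NRI sends the existing containers; the plugin requests updates.
	updates = p.synchronize(r.containers)
	r.trace = append(r.trace, fmt.Sprintf("%d updates requested", len(updates)))
	return events, updates
}

func main() {
	r := &runtime{containers: []string{"web-1", "db-1"}}
	events, updates := r.handshake(&plugin{name: "demo", index: "10"})
	fmt.Println(r.trace)
	fmt.Println(events, updates)
}
```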
Pod Data and Available Lifecycle Events
NRI Pod Lifecycle Events
NRI plugins can subscribe to the following pod lifecycle events:
The following pieces of pod metadata are available to plugins in NRI:
- ID
- name
- UID
- namespace
- labels
- annotations
- cgroup parent directory
- runtime handler name
Container Data and Available Lifecycle Events
NRI Container Lifecycle Events
NRI plugins can subscribe to the following container lifecycle events:
- creation (*)
- post-creation
- starting
- post-start
- updating (*)
- post-update
- stopping (*)
- removal
*) Plugins can request adjustment or updates to containers in response to
these events.
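One common way to represent such a subscription is a bitmask with one bit per event. The sketch below assumes that representation. The constant names and bit values are made up for illustration and are not taken from the NRI protocol definition.

```go
package main

import "fmt"

// Event is a bit identifying one container lifecycle event.
// Names and values are hypothetical.
type Event uint

const (
	CreateContainer Event = 1 << iota
	PostCreateContainer
	StartContainer
	PostStartContainer
	UpdateContainer
	PostUpdateContainer
	StopContainer
	RemoveContainer
)

// mutable marks the events flagged with (*) in the list above: the ones
// a plugin can answer with adjustments or updates.
const mutable = CreateContainer | UpdateContainer | StopContainer

// subscribed reports whether mask includes event e.
func subscribed(mask, e Event) bool { return mask&e != 0 }

// canModify reports whether a response to e may change containers.
func canModify(e Event) bool { return mutable&e != 0 }

func main() {
	mask := CreateContainer | StopContainer
	fmt.Println(subscribed(mask, CreateContainer), subscribed(mask, StartContainer))
	fmt.Println(canModify(UpdateContainer), canModify(PostUpdateContainer))
}
```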
The following pieces of container metadata are available to plugins in NRI:
- ID
- pod ID
- name
- state
- labels
- annotations
- command line arguments
- environment variables
- mounts
- OCI hooks
- rlimits
- linux
  - namespace IDs
  - devices
  - resources
    - memory
      - limit
      - reservation
      - swap limit
      - kernel limit
      - kernel TCP limit
      - swappiness
      - OOM disabled flag
      - hierarchical accounting flag
      - hugepage limits
    - CPU
      - shares
      - quota
      - period
      - realtime runtime
      - realtime period
      - cpuset CPUs
      - cpuset memory
    - Block I/O class
    - RDT class
Apart from data identifying the container, these pieces of information
represent the corresponding data in the container's OCI Spec.
Container Adjustment
During container creation plugins can request changes to the following
container parameters:
- annotations
- mounts
- environment variables
- OCI hooks
- rlimits
- linux
  - devices
  - resources
    - memory
      - limit
      - reservation
      - swap limit
      - kernel limit
      - kernel TCP limit
      - swappiness
      - OOM disabled flag
      - hierarchical accounting flag
      - hugepage limits
    - CPU
      - shares
      - quota
      - period
      - realtime runtime
      - realtime period
      - cpuset CPUs
      - cpuset memory
    - Block I/O class
    - RDT class
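To make the idea of a creation-time adjustment concrete, here is a minimal plugin-side sketch. It assumes heavily reduced, hypothetical `Container`, `Device`, and `Adjustment` types and a made-up annotation key, `example.org/inject-device`. The handler inspects a container that is being created and answers with the changes it wants, or with nothing.

```go
package main

import "fmt"

// Container is a reduced, hypothetical view of the container being created.
type Container struct {
	Name        string
	Annotations map[string]string
}

// Device is a reduced, hypothetical description of a Linux device node.
type Device struct {
	Path         string
	Type         string // "c" for character, "b" for block devices
	Major, Minor int64
}

// Adjustment is a reduced, hypothetical set of requested changes.
type Adjustment struct {
	Env     []string
	Devices []Device
}

// deviceKey is a made-up annotation key used only in this sketch.
const deviceKey = "example.org/inject-device"

// createContainer returns the adjustment requested for c, or nil if the
// plugin has nothing to change.
func createContainer(c Container) *Adjustment {
	path, ok := c.Annotations[deviceKey]
	if !ok {
		return nil
	}
	return &Adjustment{
		Env: []string{"INJECTED_DEVICE=" + path},
		// 1:3 is the character device number of /dev/null on Linux;
		// a real plugin would look the numbers up instead of hardcoding them.
		Devices: []Device{{Path: path, Type: "c", Major: 1, Minor: 3}},
	}
}

func main() {
	c := Container{
		Name:        "demo",
		Annotations: map[string]string{deviceKey: "/dev/null"},
	}
	if adj := createContainer(c); adj != nil {
		fmt.Println(adj.Env, adj.Devices)
	}
}
```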
Container Updates
Once a container has been created, plugins can request updates to it.
These updates can be requested in response to another container's creation
request, in response to any container's update request, in response to any
container's stop request, or as part of a separate unsolicited container
update request. The following container parameters can be updated this way:
- resources
  - memory
    - limit
    - reservation
    - swap limit
    - kernel limit
    - kernel TCP limit
    - swappiness
    - OOM disabled flag
    - hierarchical accounting flag
    - hugepage limits
  - CPU
    - shares
    - quota
    - period
    - realtime runtime
    - realtime period
    - cpuset CPUs
    - cpuset memory
  - Block I/O class
  - RDT class
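An update usually touches only a few of these parameters, so it helps to be able to tell "not set" apart from an explicit zero. The sketch below assumes one way of doing that, with pointer fields on hypothetical, reduced `Resources` and `Update` types. It is not the wire format of the NRI protocol.

```go
package main

import "fmt"

// Resources is a reduced, hypothetical subset of a container's resources.
type Resources struct {
	MemoryLimit int64
	CPUShares   uint64
	CPUQuota    int64
	CpusetCPUs  string
}

// Update carries only the fields to change; a nil field means
// "leave the current value as it is".
type Update struct {
	ContainerID string
	MemoryLimit *int64
	CPUShares   *uint64
	CPUQuota    *int64
	CpusetCPUs  *string
}

// ptr returns a pointer to v, which keeps Update literals short.
func ptr[T any](v T) *T { return &v }

// applyUpdate returns r with every non-nil field of u applied.
func applyUpdate(r Resources, u Update) Resources {
	if u.MemoryLimit != nil {
		r.MemoryLimit = *u.MemoryLimit
	}
	if u.CPUShares != nil {
		r.CPUShares = *u.CPUShares
	}
	if u.CPUQuota != nil {
		r.CPUQuota = *u.CPUQuota
	}
	if u.CpusetCPUs != nil {
		r.CpusetCPUs = *u.CpusetCPUs
	}
	return r
}

func main() {
	current := Resources{MemoryLimit: 1 << 30, CPUShares: 1024, CPUQuota: 50000, CpusetCPUs: "0-3"}
	// Shrink the cpuset and explicitly reset the quota to zero;
	// memory limit and shares stay untouched.
	updated := applyUpdate(current, Update{
		ContainerID: "c1",
		CPUQuota:    ptr(int64(0)),
		CpusetCPUs:  ptr("0-1"),
	})
	fmt.Printf("%+v\n", updated)
}
```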
Runtime Adaptation
The NRI runtime adaptation package is the interface
runtimes use to integrate with NRI and interact with NRI plugins. It
implements basic plugin discovery, startup and configuration. It also
provides the functions necessary to hook NRI plugins into lifecycle
events of pods and containers from the runtime.
The package hides the fact that multiple NRI plugins might be processing
any single pod or container lifecycle event. It takes care of invoking
plugins in the correct order and combining responses by multiple plugins
into a single one. While combining responses, the package detects any
unintentional conflicting changes made by multiple plugins to a single
container and flags such an event as an error to the runtime.
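The conflict detection described above can be approximated by remembering which plugin first touched each field. The sketch below is a simplification under stated assumptions: responses are modeled as hypothetical field-name/value maps, and any second write to the same field counts as a conflict. The adaptation package's actual rules may be more nuanced.

```go
package main

import "fmt"

// response is a hypothetical plugin reply: the settings it wants changed
// on one container, keyed by field name.
type response struct {
	Plugin  string
	Changes map[string]string
}

// merge combines responses in invocation order into a single set of
// changes. It fails if two plugins change the same field.
func merge(responses []response) (map[string]string, error) {
	merged := map[string]string{}
	owner := map[string]string{} // field -> plugin that set it
	for _, r := range responses {
		for field, value := range r.Changes {
			if prev, taken := owner[field]; taken {
				return nil, fmt.Errorf("plugins %q and %q both changed %q",
					prev, r.Plugin, field)
			}
			owner[field] = r.Plugin
			merged[field] = value
		}
	}
	return merged, nil
}

func main() {
	_, err := merge([]response{
		{Plugin: "10-pinner", Changes: map[string]string{"cpuset.cpus": "0-1"}},
		{Plugin: "20-balancer", Changes: map[string]string{"cpuset.cpus": "2-3"}},
	})
	fmt.Println("error:", err)
}
```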
Wrapped OCI Spec Generator
The OCI Spec generator package wraps the corresponding OCI runtime-tools
package and adds functions for applying NRI container adjustments and
updates to OCI Specs. This package can be used by runtime NRI integration
code to apply NRI responses to containers.
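The wrapping pattern can be sketched with Go struct embedding. All types below are local, simplified stand-ins: `baseGenerator` plays the role of the wrapped generator, and `Spec` and `Adjustment` are heavily reduced. None of this is the real package API.

```go
package main

import "fmt"

// Spec is a heavily reduced stand-in for an OCI Spec.
type Spec struct {
	Annotations map[string]string
	Env         []string
}

// baseGenerator stands in for the wrapped Spec generator.
type baseGenerator struct{ spec *Spec }

// AddAnnotation sets an annotation, creating the map on first use.
func (g *baseGenerator) AddAnnotation(key, value string) {
	if g.spec.Annotations == nil {
		g.spec.Annotations = map[string]string{}
	}
	g.spec.Annotations[key] = value
}

// AddProcessEnv appends an environment variable. This stand-in does not
// deduplicate existing names.
func (g *baseGenerator) AddProcessEnv(name, value string) {
	g.spec.Env = append(g.spec.Env, name+"="+value)
}

// Adjustment is a reduced stand-in for an NRI container adjustment.
type Adjustment struct {
	Annotations map[string]string
	Env         [][2]string // name/value pairs, kept in order
}

// Generator embeds the base generator, so all of its methods remain
// available, and adds an NRI-specific one on top.
type Generator struct{ *baseGenerator }

// Adjust applies an adjustment to the underlying Spec. A nil adjustment
// is a no-op.
func (g *Generator) Adjust(adj *Adjustment) {
	if adj == nil {
		return
	}
	for k, v := range adj.Annotations {
		g.AddAnnotation(k, v)
	}
	for _, kv := range adj.Env {
		g.AddProcessEnv(kv[0], kv[1])
	}
}

func main() {
	spec := &Spec{Env: []string{"PATH=/usr/bin"}}
	g := &Generator{&baseGenerator{spec}}
	g.Adjust(&Adjustment{
		Annotations: map[string]string{"example.org/adjusted": "true"},
		Env:         [][2]string{{"DEMO", "1"}},
	})
	fmt.Println(spec.Annotations, spec.Env)
}
```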
Plugin Stub Library
The plugin stub hides many of the low-level details of implementing an NRI
plugin. It takes care of connection establishment, plugin registration,
configuration, and event subscription. All sample plugins
are implemented using the stub. Any of these can be used as a tutorial on
how the stub library should be used.
Sample Plugins
The following sample plugins exist for NRI:
Please see the documentation of these plugins for further details
about what each of them can be used for and how.
Security Considerations
From a security perspective NRI plugins should be considered part of the
container runtime. NRI does not implement granular access control to the
functionality it offers. Access to NRI is controlled by restricting access
to the systemwide NRI socket. If a process can connect to the NRI socket
and send data, it has access to the full scope of functionality available
via NRI.
In particular this includes
- injection of OCI hooks, which allow for arbitrary execution of processes with the same privilege level as the container runtime
- arbitrary changes to mounts, including new bind-mounts, changes to the proc, sys, mqueue, shm, and tmpfs mounts
- the addition or removal of arbitrary devices
- arbitrary changes to the limits for memory, CPU, block I/O, and RDT resources available, including the ability to deny service by setting limits very low
The same precautions and principles apply to protecting the NRI socket as
to protecting the socket of the runtime itself. Unless it already exists,
NRI itself creates the directory to hold its socket with permissions that
allow access only for the user ID of the runtime process. By default this
limits NRI access to processes running as root (UID 0). Changing the default
socket permissions is strongly advised against. Enabling more permissive
access control to NRI should never be done without understanding the
full implications and potential consequences to container security.
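The directory handling described above can be illustrated with a few standard library calls. This is a sketch of the idea, not NRI's actual code: the `listen` and `demo` helpers and the directory and socket names are made up, and mode `0700` is used here to express "access only for the user ID of the runtime process".

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// listen creates the socket directory, if it does not exist yet, with
// access restricted to the current user, then listens on a socket in it.
// An already existing directory keeps whatever permissions it has.
func listen(dir, name string) (net.Listener, error) {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return nil, err
	}
	return net.Listen("unix", filepath.Join(dir, name))
}

// demo runs listen against a throwaway location and reports the
// permission bits of the directory it created.
func demo() (os.FileMode, error) {
	base, err := os.MkdirTemp("", "nri-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)

	dir := filepath.Join(base, "nri")
	l, err := listen(dir, "nri.sock")
	if err != nil {
		return 0, err
	}
	defer l.Close()

	info, err := os.Stat(dir)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("socket directory mode: %o\n", mode)
}
```

Because the socket lives inside a directory that other users cannot traverse, they cannot connect to it regardless of the mode bits on the socket file itself.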
Plugins as Kubernetes DaemonSets
When the runtime manages pods and containers in a Kubernetes cluster, it
is convenient to deploy and manage NRI plugins using Kubernetes DaemonSets.
Among other things, this requires bind-mounting the NRI socket into the
filesystem of a privileged container running the plugin. Similar precautions
apply and the same care should be taken for protecting the NRI socket and
NRI plugins as for the kubelet DeviceManager socket and Kubernetes Device
Plugins.
The cluster configuration should make sure that unauthorized users cannot
bind-mount host directories and create privileged containers which gain
access to these sockets and can act as NRI or Device Plugins. See the
related documentation and best practices about Kubernetes security.
API Stability
NRI APIs should not be considered stable yet. We try to avoid unnecessarily
breaking APIs, especially the Stub API which plugins use to interact with NRI.
However, before NRI reaches a stable 1.0.0 release, this is only best effort
and cannot be guaranteed. Meanwhile we do our best to document any API breaking
changes for each release in the release notes.
The current target for a stable v1 API through a 1.0.0 release is the end of
this year.
Project details
nri is a containerd sub-project, licensed under the Apache 2.0 license.
As a containerd sub-project, you will find the project information in our
containerd/project repository.