jupyterhub-idle-culler
jupyterhub-idle-culler
provides a JupyterHub service to identify and stop idle
or long-running Jupyter servers via JupyterHub. It works solely by interacting
with JupyterHub's REST API, and is often configured to run as a JupyterHub
managed service started up by JupyterHub itself.
Setup involves three parts: installing the package, granting the culler service the permissions it needs, and registering it with JupyterHub (either as a Hub-managed service or as a standalone script). Install the package with:
pip install jupyterhub-idle-culler
Prior to JupyterHub 2.0, jupyterhub-idle-culler required full administrative
privileges in order to have sufficient permissions to stop servers on behalf of users.
JupyterHub 2.0 introduces scopes to allow for more fine-grained permission control. This means that the configured culler service does not need full administrative privileges anymore. It can be assigned only the permissions it needs.
jupyterhub-idle-culler requires the following scopes to function:
- list:users - to access the user list API, our source of information about who to cull
- read:users:activity - to read the users' last_activity field
- read:servers - to read the users' servers field
- delete:servers - to stop users' servers, and delete named servers if --remove-named-servers is passed
- admin:users (optional) - to delete users if --cull-users is passed
To assign the service the appropriate permissions, declare a role in your jupyterhub_config.py:
c.JupyterHub.load_roles = [
{
"name": "jupyterhub-idle-culler-role",
"scopes": [
"list:users",
"read:users:activity",
"read:servers",
"delete:servers",
# "admin:users", # if using --cull-users
],
# assignment of role's permissions to:
"services": ["jupyterhub-idle-culler-service"],
}
]
In jupyterhub_config.py
, add the following dictionary for the idle-culler
service to the c.JupyterHub.services
list:
import sys

c.JupyterHub.services = [
{
"name": "jupyterhub-idle-culler-service",
"command": [
sys.executable,
"-m", "jupyterhub_idle_culler",
"--timeout=3600",
],
# "admin": True,
}
]
where:
- "command" indicates that the Service will be managed by the Hub, and
- "admin": True grants admin permissions to this Service and is only meant for use with jupyterhub < 2.0; see the permissions section above.
jupyterhub-idle-culler can also be run as a standalone script. It can access the Hub's API with a service token.
Register the service token with JupyterHub in jupyterhub_config.py
:
c.JupyterHub.services = [
{
"name": "jupyterhub-idle-culler-service",
"api_token": "...",
# "admin": True,
}
]
where:
- "api_token" contains a secret token, e.g. generated by openssl rand -hex 32, and
- "admin": True grants admin permissions to this Service and is only meant for use with jupyterhub < 2.0; see the permissions section above.
Store the same token in a JUPYTERHUB_API_TOKEN environment variable, then start jupyterhub-idle-culler manually:
export JUPYTERHUB_API_TOKEN=api_token_above...
python3 -m jupyterhub_idle_culler [--timeout=900] [--url=http://localhost:8081/hub/api]
The following command line flags are supported:
  --api-page-size             Number of users to request per page, when
                              using JupyterHub 2.0's paginated user list
                              API. Default: use the server-side default
                              configured page size. (default 0)
  --concurrency               Limit the number of concurrent requests made
                              to the Hub. Deleting a lot of users at the
                              same time can slow down the Hub, so limit
                              the number of API requests we have
                              outstanding at any given time. (default 10)
  --cull-admin-users          Whether admin users should be culled (only
                              if --cull-users=true). (default True)
  --cull-every                The interval (in seconds) for checking for
                              idle servers to cull. (default 0)
  --cull-default-servers      Whether default servers should be culled (only
                              if --cull-default-servers=true). (default True)
  --cull-named-servers        Whether named servers should be culled (only
                              if --cull-named-servers=true). (default True)
  --cull-users                Cull users in addition to servers. This is
                              for use in temporary-user cases such as
                              tmpnb. (default False)
  --internal-certs-location   The location of generated internal-ssl
                              certificates (only needed with
                              --ssl-enabled=true). (default internal-ssl)
  --max-age                   The maximum age (in seconds) of servers that
                              should be culled even if they are active.
                              (default 0)
  --remove-named-servers      Remove named servers in addition to stopping
                              them. This is useful for a BinderHub that
                              uses authentication and named servers.
                              (default False)
  --ssl-enabled               Whether the Jupyter API endpoint has TLS
                              enabled. (default False)
  --timeout                   The idle timeout (in seconds). (default 600)
  --url                       The JupyterHub API URL.
JupyterHub's last_activity data about user servers is not updated with high
frequency, so the cull timeout should be greater than the sum of:
- the interval at which single-user servers report activity to JupyterHub (default: 5 minutes), and
- JupyterHub.last_activity_interval (default: 5 minutes)
If you want to use --cull-users with a different culling interval for the user
servers and users, you must start two idle culler services. This is because both
are configured via --timeout and --max-age. To do so, configure this service to
start twice with different configuration, where one has the --cull-users option,
as sketched below.
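A minimal sketch of such a double configuration in jupyterhub_config.py follows; the service names, timeout values, and role name are illustrative assumptions, not prescribed values:

import sys

# Sketch: one culler only stops idle servers, the other (with a longer
# timeout) additionally deletes the temporary users via --cull-users.
c.JupyterHub.services = [
    {
        "name": "idle-culler-servers",
        "command": [
            sys.executable, "-m", "jupyterhub_idle_culler",
            "--timeout=3600",
        ],
    },
    {
        "name": "idle-culler-users",
        "command": [
            sys.executable, "-m", "jupyterhub_idle_culler",
            "--timeout=86400",
            "--cull-users",
        ],
    },
]

c.JupyterHub.load_roles = [
    {
        "name": "idle-culler-role",
        "scopes": [
            "list:users",
            "read:users:activity",
            "read:servers",
            "delete:servers",
            "admin:users",  # needed because one of the services passes --cull-users
        ],
        # Grant the same role to both culler services.
        "services": ["idle-culler-servers", "idle-culler-users"],
    }
]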
By default, jupyterhub-idle-culler's HTTP requests to JupyterHub's REST API
time out after 60 seconds. This can be changed by setting the
JUPYTERHUB_REQUEST_TIMEOUT environment variable.
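For a Hub-managed service, one place to set that variable is the service's environment dictionary in jupyterhub_config.py; the following is a sketch, and the 120-second value is just an example:

import sys

c.JupyterHub.services = [
    {
        "name": "jupyterhub-idle-culler-service",
        "command": [
            sys.executable, "-m", "jupyterhub_idle_culler",
            "--timeout=3600",
        ],
        # Raise the culler's request timeout from the default 60 seconds
        # to 120 seconds (example value).
        "environment": {"JUPYTERHUB_REQUEST_TIMEOUT": "120"},
    }
]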
JupyterHub's REST API is used to acquire information about activity, and if the
idle culler service decides, based on its configuration, that a server should be
stopped or deleted, it does that via JupyterHub's REST API as well.
The permissions jupyterhub-idle-culler needs to work against JupyterHub's REST
API are provided via the JUPYTERHUB_API_TOKEN environment variable, which is set
automatically for managed services started by JupyterHub.
jupyterhub-idle-culler lists available users and their servers' reported
last_activity via JupyterHub's /users REST API and makes decisions based on
that. Users' default servers can be stopped via /users/{name}/server, named
servers can be stopped and optionally removed via
/users/{name}/servers/{server_name}, and users can optionally be deleted via
/users/{name}.
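To make this concrete, here is a rough sketch of the kind of REST calls involved, not the actual implementation; the hard-coded Hub API URL, the fixed one-hour cutoff, and the bare-bones error handling are all simplifying assumptions:

import json
import os
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumptions for this sketch: the Hub API is reachable at this URL and
# JUPYTERHUB_API_TOKEN holds a token with the scopes listed earlier.
api_url = "http://localhost:8081/hub/api"
headers = {"Authorization": f"token {os.environ['JUPYTERHUB_API_TOKEN']}"}

def api(method, path):
    req = urllib.request.Request(api_url + path, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None

cutoff = datetime.now(timezone.utc) - timedelta(hours=1)

# GET /users is the source of information about who to cull.
for user in api("GET", "/users"):
    for server_name, server in (user.get("servers") or {}).items():
        last_activity = server.get("last_activity")
        if not last_activity:
            continue
        if datetime.fromisoformat(last_activity.replace("Z", "+00:00")) < cutoff:
            if server_name:
                # Stop (and potentially remove) a named server.
                api("DELETE", f"/users/{user['name']}/servers/{server_name}")
            else:
                # Stop the user's default server.
                api("DELETE", f"/users/{user['name']}/server")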
JupyterHub's reported last_activity for user servers is updated at a regular
interval by the update_last_activity function, which relies on two sources of
information.
The proxy's routes data
The configurable proxy class for JupyterHub is an interface for JupyterHub to
request routing of network traffic to user servers. Through this interface,
JupyterHub can be informed about network activity if the proxy class provides
it, specifically via the get_all_routes function.
The configurable-http-proxy used in https://z2jh.jupyter.org provides
information about network route activity, but the traefik-proxy used in
https://tljh.jupyter.org currently does not.
The user server's activity reports
The update_last_activity function also reads JupyterHub's database, which keeps
state about servers' last_activity. These database records are updated whenever
a server notifies JupyterHub about activity, as servers are required to do.
Before JupyterHub 4, servers notified JupyterHub about activity by being started
via the jupyterhub-singleuser script, made available by installing jupyterhub
(or jupyterhub-singleuser on conda-forge). With JupyterHub 4+ and
jupyter_server 2+, a jupyter_server server extension can be used instead.
The jupyterhub-singleuser
script launches a modified server application
that keeps JupyterHub updated with the server activity via the
notify_activity
function.
The notify_activity function in turn makes use of the server application's
last_activity function (see the implementations in NotebookApp and ServerApp
respectively), which combines information from API activity, kernel activity,
kernel shutdown, and terminal activity. This activity also covers activity of
applications like RStudio running via jupyter-server-proxy.
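For illustration, such an activity notification essentially boils down to a POST against the Hub's activity endpoint. The sketch below is not how jupyterhub-singleuser is implemented; the environment variables and payload shape shown are assumptions based on JupyterHub's documented activity API:

import json
import os
import urllib.request
from datetime import datetime, timezone

# Sketch: JUPYTERHUB_API_URL, JUPYTERHUB_API_TOKEN, JUPYTERHUB_USER, and
# JUPYTERHUB_SERVER_NAME are assumed to be set in the single-user server's
# environment, as they are for servers spawned by JupyterHub.
now = datetime.now(timezone.utc).isoformat()
server_name = os.environ.get("JUPYTERHUB_SERVER_NAME", "")
payload = {
    "last_activity": now,
    "servers": {server_name: {"last_activity": now}},
}
req = urllib.request.Request(
    f"{os.environ['JUPYTERHUB_API_URL']}/users/{os.environ['JUPYTERHUB_USER']}/activity",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"token {os.environ['JUPYTERHUB_API_TOKEN']}",
        "Content-Type": "application/json",
    },
    method="POST",
)
urllib.request.urlopen(req)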
Here is a summary of what's described so far:
- jupyterhub-idle-culler collects information and acts entirely through JupyterHub's REST API.
- jupyterhub-idle-culler makes decisions based on information provided by JupyterHub, which collects activity reports from the user servers and polls the proxy class for information about user servers' network activity.
Now, as the servers' kernel activity influences the activity that servers notify
JupyterHub about, the kernel activity in turn influences jupyterhub-idle-culler.
Due to this, it can be relevant to also learn a little about the mechanism to
cull idle kernels, even though jupyterhub-idle-culler isn't involved in that.
The default kernel manager, the MappingKernelManager
, can be configured to
cull idle kernels. Its configuration is documented in
ServerApp's
and
NotebookApp's
respective documentation, and here are some relevant kernel culling
configuration options:
MappingKernelManager.cull_busy
MappingKernelManager.cull_idle_timeout
MappingKernelManager.cull_interval
MappingKernelManager.cull_connected
Note that cull_connected can be tricky to understand for JupyterLab, as whether
a browser has a web-socket connection to a kernel isn't as obvious as it was in
the classic Jupyter Notebook UI. See this issue for more details.
Also note that configuration of MappingKernelManager should be made on the
user server itself, for example via a jupyter_server_config.py
file in
/etc/jupyter
or /usr/local/etc/jupyter
rather than where JupyterHub is
running.
Finally, note that a Jupyter server can shut itself down without intervention by
jupyterhub-idle-culler
if ServerApp.shutdown_no_activity_timeout
is
configured.
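As a sketch, a jupyter_server_config.py placed on the user server could combine these options as below; the specific timeout values are arbitrary example choices, not recommendations:

# Example jupyter_server_config.py on the user server, e.g. in /etc/jupyter.
c = get_config()  # noqa

# Cull kernels that have been idle for 1 hour, checking every 5 minutes.
c.MappingKernelManager.cull_idle_timeout = 3600
c.MappingKernelManager.cull_interval = 300
# Don't cull kernels that are busy or still have a browser connected.
c.MappingKernelManager.cull_busy = False
c.MappingKernelManager.cull_connected = False

# Let the server shut itself down after 2 hours without any activity,
# independently of jupyterhub-idle-culler.
c.ServerApp.shutdown_no_activity_timeout = 7200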
JupyterHub 2.0 introduces pagination to the /users
API endpoint. This
pagination does not guarantee a consistent snapshot for consecutive requests
spread over time, so it is possible for a highly active hub to occasionally miss
culling users crossing page boundaries between requests. This is expected to be
an infrequent occurrence and, in realistic scenarios, should only delay a server
being culled by one cull interval, so it is of minor consequence in JupyterHub.
The issue can be mitigated by requesting a larger page size, via e.g.
--api-page-size=200
, but feel free to open an issue if this is causing a
problem for you.
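For a Hub-managed service, the larger page size can be requested by adding the flag to the service's command in jupyterhub_config.py; a minimal sketch, where the page size is just an example value:

import sys

c.JupyterHub.services = [
    {
        "name": "jupyterhub-idle-culler-service",
        "command": [
            sys.executable, "-m", "jupyterhub_idle_culler",
            "--timeout=3600",
            # Request 200 users per page instead of the server-side default.
            "--api-page-size=200",
        ],
    }
]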