Idact, or Interactive Data Analysis Convenience Tools, is a Python 3.5+ library that takes care of several tedious aspects of working with big data on an HPC cluster.
Data scientists or big data enthusiasts who:
Requires Python 3.5+.
python -m pip install idact
If you're using Conda, you may want to update your environment first:
conda update --all
The cluster must be accessible via SSH with a public/private key pair.
from idact import *
# Register the cluster under a short name; authentication uses an SSH key pair.
cluster = add_cluster(name="short-cluster-name",
                      user="user",
                      host="login-node.cluster.example.com",
                      port=22,
                      auth=AuthMethod.PUBLIC_KEY,
                      key="~/.ssh/id_rsa",
                      install_key=False)
# Connect to the cluster's access (login) node over SSH.
node = cluster.get_access_node()
node.connect()
Tutorial: 01. Connecting to a cluster
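Because the example authenticates with `key="~/.ssh/id_rsa"`, a quick pre-flight check on that file can prevent a confusing connection failure. The sketch below is a hypothetical helper, not part of idact: it expands `~` and verifies that the key file exists and grants no permissions to group or others. The OpenSSH `ssh` client ignores private keys with such open permissions; whether idact's own SSH layer applies the same rule is not stated in this README.

```python
import os
import stat


def private_key_is_usable(path: str) -> bool:
    """Hypothetical check: key file exists and is private to its owner.

    The OpenSSH client rejects private keys whose permissions are "too open"
    (any group/other bit set), so such a key is worth fixing before connecting.
    """
    full_path = os.path.expanduser(path)  # "~/.ssh/id_rsa" -> "/home/<user>/.ssh/id_rsa"
    if not os.path.isfile(full_path):
        return False
    mode = os.stat(full_path).st_mode
    # Usable only if no read/write/execute bits are set for group or others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    print(private_key_is_usable("~/.ssh/id_rsa"))
```

If the check fails for an existing key, `chmod 600 ~/.ssh/id_rsa` restricts it to the owner.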
Nodes are allocated as a Slurm job. Afterwards, they can be used for deployments.
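import bitmath
# Request 8 nodes, each with 12 cores and 120 GiB of memory, for 1.5 hours;
# native_args are passed to Slurm as additional job options.
nodes = cluster.allocate_nodes(nodes=8,
                               cores=12,
                               memory_per_node=bitmath.GiB(120),
                               walltime=Walltime(hours=1, minutes=30),
                               native_args={
                                   '--partition': 'debug',
                                   '--account': 'data-analysis-group'
                               })
try:
    # Block until the allocation is ready, giving up after the timeout.
    nodes.wait(timeout=120.0)
except TimeoutError:
    # Release the request instead of leaving the job in the queue.
    nodes.cancel()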
Tutorial: 02. Allocating nodes
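The `try`/`except` around `nodes.wait()` is a wait-with-deadline pattern: if the Slurm job does not start in time, the request is cancelled rather than left waiting in the queue. The sketch below reproduces that control flow with a hypothetical `FakeAllocation` class (not an idact type) that becomes ready after a set delay, so it runs without a cluster. Its polling `wait()` is an illustration only, not idact's implementation.

```python
import time


class FakeAllocation:
    """Hypothetical stand-in for a node allocation that is ready after a delay."""

    def __init__(self, ready_after: float):
        self._ready_at = time.monotonic() + ready_after
        self.cancelled = False

    def running(self) -> bool:
        return time.monotonic() >= self._ready_at

    def wait(self, timeout: float, poll_interval: float = 0.01) -> None:
        # Poll until the "job" is running; raise once the deadline has passed.
        deadline = time.monotonic() + timeout
        while not self.running():
            if time.monotonic() >= deadline:
                raise TimeoutError("allocation not ready after {}s".format(timeout))
            time.sleep(poll_interval)

    def cancel(self) -> None:
        self.cancelled = True


def allocate_or_cancel(allocation, timeout: float) -> bool:
    """Wait for the allocation; cancel it on timeout. Returns True if it became ready."""
    try:
        allocation.wait(timeout=timeout)
        return True
    except TimeoutError:
        allocation.cancel()
        return False
```

Cancelling in the `except` branch keeps a job that never started from holding a place in the queue after the program has moved on.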
Jupyter Notebook is deployed on a cluster node and made accessible through an SSH tunnel.
nb = nodes[0].deploy_notebook()
nb.open_in_browser()
Tutorial: 03. Deploying Jupyter
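The SSH tunnel mentioned above is a form of local port forwarding: a port on your machine is forwarded through the login node to the port the notebook listens on inside the cluster. As a hypothetical manual equivalent (not necessarily how idact sets up its tunnels), the sketch below asks the OS for a free local port and builds the matching OpenSSH command. The node name `node-001` and remote port `8888` (Jupyter's default) are placeholders.

```python
import socket


def find_free_local_port() -> int:
    """Let the OS pick an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def ssh_tunnel_command(local_port, remote_host, remote_port,
                       user, login_host, ssh_port=22):
    """Build an OpenSSH command forwarding 127.0.0.1:local_port to remote_host:remote_port.

    -N  forward ports only, do not run a remote command
    -L  local forwarding in the form [bind_address:]port:host:hostport
    """
    return [
        "ssh", "-N",
        "-p", str(ssh_port),
        "-L", "127.0.0.1:{}:{}:{}".format(local_port, remote_host, remote_port),
        "{}@{}".format(user, login_host),
    ]


if __name__ == "__main__":
    port = find_free_local_port()
    print(" ".join(ssh_tunnel_command(port, "node-001", 8888,
                                      "user", "login-node.cluster.example.com")))
```

Starting the command with `subprocess.Popen(cmd)` keeps the tunnel open for as long as the process runs. The free port is only probed, not reserved, so another process could in principle claim it before `ssh` binds it.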
A Dask.distributed scheduler and workers are deployed on cluster nodes, and their dashboards are made available through SSH tunnels.
dd = deploy_dask(nodes[1:])
client = dd.get_client()
client.submit(...)
dd.diagnostics.open_all()
Tutorial: 04. Deploying Dask, 09. Demo analysis
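The `client.submit(...)` call above is left elided. Dask's futures interface follows Python's `concurrent.futures`: `Client.submit(func, *args)` returns a future immediately, and `future.result()` blocks until the task has run. The sketch below shows the same submit/result pattern with the standard library's `ThreadPoolExecutor`, so it runs without a cluster; with a deployment, the `client` returned by `dd.get_client()` would take the place of `executor`. The `square` function is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Placeholder task; any picklable function would do with Dask."""
    return x * x


with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() schedules each call and returns a Future without waiting.
    futures = [executor.submit(square, i) for i in range(5)]
    # result() blocks until the corresponding call has finished.
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16]
```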
Local and remote cluster configuration can be saved, loaded, and copied to and from the cluster.
save_environment()
load_environment()
push_environment(cluster)
pull_environment(cluster)
Tutorials: 01. Connecting to a cluster, 05. Configuring idact on a cluster
Deployment objects can be serialized and copied between running program instances, local or remote.
cluster.push_deployment(nodes)
cluster.push_deployment(nb)
cluster.push_deployment(dd)
cluster.pull_deployments()
Tutorials: 06. Working on a cluster, 07. Adjusting timeouts
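Copying a deployment between program instances works because a deployment can be described by plain data (where it runs and how to reach it) rather than by live connections. The sketch below illustrates that idea with a hypothetical `NotebookDeploymentRecord` class; the class, its fields, and the JSON layout are assumptions and do not reflect idact's actual serialization format.

```python
import json


class NotebookDeploymentRecord:
    """Hypothetical plain-data description of a running notebook deployment."""

    def __init__(self, node_host: str, remote_port: int, token: str):
        self.node_host = node_host
        self.remote_port = remote_port
        self.token = token

    def serialize(self) -> str:
        # Only plain values are stored, so another process can rebuild the record.
        return json.dumps({"type": "notebook",
                           "node_host": self.node_host,
                           "remote_port": self.remote_port,
                           "token": self.token}, sort_keys=True)

    @classmethod
    def deserialize(cls, text: str) -> "NotebookDeploymentRecord":
        data = json.loads(text)
        if data.get("type") != "notebook":
            raise ValueError("not a notebook deployment: {!r}".format(data.get("type")))
        return cls(data["node_host"], data["remote_port"], data["token"])
```

The serialized string could be written to a file, copied to or from the cluster, and turned back into a record by the receiving instance, which would then reopen its own connection.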
The quick deployment app allocates nodes and deploys a Jupyter Notebook from the command line:
idact-notebook short-cluster-name --nodes 3 --walltime 0:20:00
Tutorial: 08. Using the quick deployment app
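The `--walltime 0:20:00` argument is written as hours:minutes:seconds, one of the formats Slurm accepts for time limits; in the Python API, a limit of the same kind is expressed as `Walltime(hours=1, minutes=30)`. The hypothetical `parse_walltime` helper below (not part of idact) shows how such a string maps to a duration. It handles only the `H:MM:SS` form and rejects out-of-range minutes or seconds.

```python
from datetime import timedelta


def parse_walltime(text: str) -> timedelta:
    """Parse an 'H:MM:SS' walltime string such as '0:20:00' into a timedelta.

    Hypothetical helper: other Slurm formats (e.g. 'days-hours:minutes') are not handled.
    """
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected H:MM:SS, got {!r}".format(text))
    hours, minutes, seconds = (int(p) for p in parts)
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("out-of-range walltime: {!r}".format(text))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    print(parse_walltime("0:20:00"))  # 0:20:00
```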
The documentation contains a detailed API description, tutorial notebooks, and other helpful information.
The source code is available on GitHub.
MIT License.
This library was developed under the supervision of Leszek Grzanka, PhD as a final project of the BEng in Computer Science program at the Faculty of Computer Science, Electronics and Telecommunications at AGH University of Science and Technology, Krakow.