🚀 Syft Code Queue
A simple, lightweight system for executing code on remote SyftBox datasites with manual approval workflows.
Overview
Syft Code Queue provides a clean separation between data scientists, who submit code for execution, and data owners, who review and approve that code. All code execution requires explicit manual approval; the core system has no automatic approval.
Architecture
Data Scientist → Submit Code → Data Owner Reviews → Manual Approve → Execute → Results
Key Features
- 📦 Simple Code Submission: Package code as folders with run.sh scripts
- 🔒 Manual Approval Only: Data owners must explicitly approve every job
- 🛡️ Security: Safe execution with sandboxing and resource limits
- 🤖 External Automation: Automation systems call the manual approval API
- 📊 Job Management: Track job status and retrieve results
- ⚡ Lightweight: Much simpler than RDS while being fully functional
- 🎨 Interactive Jupyter UI: Beautiful HTML tables with clickable approve/reject buttons
- 🔍 Tab Completion: Full tab completion support for all properties and methods
Quick Start
Simple Unified API
import syft_code_queue as q

# Submit a folder of code (the folder must contain run.sh)
job = q.submit_job(
    target_email="data-owner@university.edu",
    code_folder="./my_analysis",
    name="Statistical Analysis",
    description="Aggregate statistics computation",
    tags=["statistics", "privacy-safe"]
)

# Or submit a short inline script
job = q.submit_script(
    target_email="data-owner@university.edu",
    script_content="print('Hello, world!')",
    name="Hello World Test"
)

# Jobs I have submitted
q.my_jobs()
# Jobs waiting for my approval
q.pending_for_me()
# Approve a job, giving a reason
q.approve("job-id", "Looks safe")
# Queue status
q.status()

print(f"Job submitted: {job.uid}")
print(f"Status: {job.status}")
Interactive Jupyter Interface
In Jupyter notebooks, jobs display as beautiful interactive cards:
job
q.jobs_for_me
q.jobs_for_others
q.pending_for_me
Interactive Features:
- 🔍 Real-time search - Filter jobs by name or email
- 📊 Status filtering - View pending, running, or completed jobs
- ✅ One-click actions - Approve/reject jobs directly from the interface
- 🎯 Batch operations - Approve or reject multiple jobs at once
- 📜 Inline code review - See job details and code files instantly
Managing Jobs (Python API)
import syft_code_queue as q
q.pending_for_me()
q.all_jobs_for_me()
q.review_job("job-id")
q.approve("job-id", "Looks safe")
q.reject("job-id", "Too broad")
q.status()
CLI Tools (Alternative)
scq pending
scq review a1b2c3d4
scq approve a1b2c3d4 -r "Looks safe"
scq reject a1b2c3d4 -r "Too broad"
scq list
scq status
Installation
pip install syft-code-queue
Tutorials
We provide role-specific tutorials for different users:
- 🔬 Data Scientists: examples/Part 1 — Data Scientist Tutorial.ipynb - Learn to submit and monitor jobs
- 🏛️ Data Owners: examples/Part 2 — Data Owner Tutorial.ipynb - Learn to review and approve jobs
Manual Approval Architecture
The core design principle is manual approval only:
✅ What's Included
- Job submission and queuing
- Manual approval/rejection API
- Safe code execution engine
- Job status tracking and results retrieval
❌ What's NOT Included
- Built-in auto-approval rules
- Automatic approval logic
- Built-in trust systems
🤖 External Automation
Any automation must be external and call the manual approval CLI or API:
import subprocess

def smart_approval_bot():
    # List pending jobs through the same CLI a human reviewer would use
    result = subprocess.run(['scq', 'pending'], capture_output=True, text=True)
    # get_job_ids_from_output and meets_my_criteria are placeholders you supply
    for job_id in get_job_ids_from_output(result.stdout):
        if meets_my_criteria(job_id):
            subprocess.run(['scq', 'approve', job_id, '-r', 'Auto-approved by bot'])
        else:
            subprocess.run(['scq', 'reject', job_id, '-r', 'Does not meet criteria'])
See the examples directory for more automation examples.
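The bot above leaves get_job_ids_from_output undefined. A minimal sketch of one way to write it follows. It assumes that job IDs appear in the scq pending output as 8-character lowercase hex tokens, like the a1b2c3d4 used in the CLI examples; the real output format is not documented here, so adjust the pattern to whatever your installation prints. Any other 8-character hex-looking token (for example an 8-digit date) would also match.

```python
import re

# Assumption: job IDs look like "a1b2c3d4" (8 lowercase hex characters).
JOB_ID_PATTERN = re.compile(r"\b[0-9a-f]{8}\b")


def get_job_ids_from_output(output: str) -> list[str]:
    """Extract unique job-ID-like tokens from CLI output, in order of appearance."""
    job_ids = []
    for token in JOB_ID_PATTERN.findall(output):
        if token not in job_ids:
            job_ids.append(token)
    return job_ids
```

meets_my_criteria stays a placeholder: it encodes your organization's approval policy and has no generic form.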
Code Package Structure
Every job submission must be a folder containing:
my_analysis/
├── run.sh # Main execution script (required)
├── analyze.py # Your analysis code
├── requirements.txt # Python dependencies (optional)
└── README.md # Documentation (optional)
Example run.sh:
#!/bin/bash
set -e
echo "Starting analysis..."
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi
python analyze.py
echo "Analysis complete!"
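Because run.sh is the only required file, a quick local check before submitting can catch an incomplete package. The helper below is a hypothetical sketch, not part of syft_code_queue; the name validate_code_folder and its return format are assumptions made for illustration.

```python
from pathlib import Path


def validate_code_folder(folder: str) -> list[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    root = Path(folder)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # run.sh is the only file the package structure marks as required.
    if not (root / "run.sh").is_file():
        problems.append("missing required run.sh")
    return problems
```

Calling validate_code_folder("./my_analysis") before q.submit_job gives an early error locally instead of a failed job on the datasite.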
Security Features
- Safe Execution: SafeCodeRunner with timeouts and resource limits
- Command Filtering: Block dangerous operations
- Sandboxing: Isolated execution environment
- Manual Review: Human oversight of all code execution
- Audit Trail: All approvals/rejections are logged
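To make the timeout part of this list concrete, here is a minimal sketch of running a command with a time limit using only the standard library. It is not SafeCodeRunner's implementation, which is not shown in this document; run_with_timeout and its result dictionary are hypothetical. A timeout alone provides no sandboxing or command filtering.

```python
import subprocess
import sys


def run_with_timeout(command: list[str], timeout_seconds: float) -> dict:
    """Run a command, capturing output; report 'failed' on timeout or non-zero exit."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed if it runs longer
        )
    except subprocess.TimeoutExpired:
        return {
            "status": "failed",
            "returncode": None,
            "output": "",
            "error": f"timed out after {timeout_seconds}s",
        }
    status = "completed" if completed.returncode == 0 else "failed"
    return {
        "status": status,
        "returncode": completed.returncode,
        "output": completed.stdout,
        "error": completed.stderr,
    }


if __name__ == "__main__":
    # Example: a trivial command that finishes well inside the limit.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 30))
```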
Job Lifecycle
📤 submit → ⏳ pending → ✅ approved → 🏃 running → 🎉 completed
                 ↘ 🚫 rejected              ↘ ❌ failed
Status Reference
- pending: Waiting for data owner approval
- approved: Approved by data owner, waiting to execute
- running: Currently executing on datasite
- completed: Finished successfully, results available
- failed: Execution failed (see error logs)
- rejected: Rejected by data owner
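The lifecycle can be read as a small state machine. The sketch below encodes one interpretation of it: rejection happens from pending, and failure happens from running. The JobStatus enum and helper names are hypothetical and are not taken from the syft_code_queue source.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    REJECTED = "rejected"


# Allowed transitions, as read from the lifecycle diagram (an interpretation).
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.APPROVED, JobStatus.REJECTED},
    JobStatus.APPROVED: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
    JobStatus.REJECTED: set(),
}


def can_transition(current: JobStatus, new: JobStatus) -> bool:
    """True if the lifecycle permits moving from current to new."""
    return new in ALLOWED_TRANSITIONS[current]


def is_terminal(status: JobStatus) -> bool:
    """True for statuses with no outgoing transitions."""
    return not ALLOWED_TRANSITIONS[status]
```

Note that no transition skips approved: a pending job cannot move straight to running, which is the manual-approval rule expressed as data.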
Best Practices
When Submitting Jobs
- Use clear, descriptive job names and descriptions
- Include privacy-safe tags like aggregate-analysis, statistics
- Only request aggregate computations, never individual records
- Test code locally before submission
- Be responsive to questions about your submissions
When Managing Jobs Submitted to You
- Review all submitted code thoroughly
- Check for privacy compliance and data safety
- Provide clear feedback when rejecting requests
- Set up regular monitoring of your pending jobs
- Maintain clear approval criteria for your organization
- Use q.review_job() to examine job details before approving
API Reference
Unified Python API
import syft_code_queue as q
job = q.submit_job(target_email, code_folder, name, description, tags)
job = q.submit_script(target_email, script_content, name, description, requirements, tags)
q.my_jobs()
q.get_job(job_uid)
q.get_job_output(job_uid)
q.get_job_logs(job_uid)
q.wait_for_completion(job_uid)
q.pending_for_me()
q.all_jobs_for_me()
q.review_job(job_uid)
q.approve(job_uid, reason)
q.reject(job_uid, reason)
q.status()
q.help()
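For readers wondering what waiting for completion involves, the following is a generic polling sketch. It does not show how q.wait_for_completion is implemented; wait_until_terminal, its parameters, and the choice of terminal statuses are assumptions for illustration. The caller supplies a get_status function that returns the job's current status string.

```python
import time
from typing import Callable

# Assumption: these statuses are final, per the status reference.
TERMINAL_STATUSES = {"completed", "failed", "rejected"}


def wait_until_terminal(
    get_status: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Poll get_status until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout}s")
        time.sleep(poll_interval)
```

A pending job may wait on a human reviewer for a long time, so a generous timeout and poll interval are usually appropriate.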
CLI API
scq pending
scq review <job_id>
scq approve <job_id> --reason "Approved because..."
scq reject <job_id> --reason "Rejected because..."
scq list
scq status
scq --help
SyftBox App Setup
To enable syft-code-queue on your datasite:
- Add to your SyftBox datasite: Copy the syft-code-queue folder to your datasite
- SyftBox auto-execution: SyftBox will periodically call run.sh to process jobs
- Use CLI tools: Data owners use scq commands to manage job approvals
Configuration
The app uses sensible defaults, but can be customized:
from syft_code_queue import QueueConfig
config = QueueConfig(
    queue_name="code-queue",
    max_concurrent_jobs=3,
    job_timeout=600,
    cleanup_completed_after=86400
)
SyftBox Integration
The run.sh script handles the entire queue processing cycle:
- ✅ Checks for pending jobs (logs info for data owners)
- 🚀 Executes approved jobs
- 🧹 Cleans up old completed jobs
- 🚪 Exits (no long-running processes)
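The cycle above can be sketched as a single pass over an in-memory job list. This is an illustration under stated assumptions, not the app's actual code: the Job dataclass, process_queue_once, and the execute callback are hypothetical. It also assumes that cleanup_completed_after is in seconds and that max_concurrent_jobs caps how many approved jobs one pass picks up.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    uid: str
    status: str  # one of the values in the status reference
    finished_at: Optional[float] = None  # Unix timestamp, set when the job ends


def process_queue_once(
    jobs: list[Job],
    execute: Callable[[Job], bool],
    max_concurrent_jobs: int = 3,
    cleanup_completed_after: float = 86400,
    now: Optional[float] = None,
) -> list[Job]:
    """Run one pass: report pending, execute approved, clean up, then return."""
    now = time.time() if now is None else now

    # 1. Pending jobs are only reported; approval stays manual.
    pending = [job.uid for job in jobs if job.status == "pending"]
    print(f"{len(pending)} job(s) awaiting approval")

    # 2. Execute approved jobs, up to the per-pass limit.
    approved = [job for job in jobs if job.status == "approved"]
    for job in approved[:max_concurrent_jobs]:
        job.status = "running"
        job.status = "completed" if execute(job) else "failed"
        job.finished_at = now

    # 3. Drop completed jobs older than the cleanup window.
    def is_stale(job: Job) -> bool:
        return (
            job.status == "completed"
            and job.finished_at is not None
            and now - job.finished_at > cleanup_completed_after
        )

    # 4. Return; no long-running process is left behind.
    return [job for job in jobs if not is_stale(job)]
```

Because each pass starts fresh and exits, a periodic scheduler such as SyftBox can call it repeatedly without the queue needing a daemon.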
Integration with Other Tools
- syft-nsai: Generate analysis code with AI, execute with queue
- SyftBox: Leverages existing datasite infrastructure
- Custom Apps: Easy integration with any Python application
Development
git clone <repository>
cd syft-code-queue
pip install -e .
pytest
python examples/example_cross_datasite_workflow.py
Contributing
See CONTRIBUTING.md for development guidelines.
License
Licensed under the Apache License 2.0. See the LICENSE file for details.
Simple. Secure. Manual. 🚀