Stacking Sats Pipeline
A data engineering pipeline for extracting, loading, and merging cryptocurrency and financial data from multiple sources.
Requirements
Python 3.11 or 3.12
Installation
pip install stacking-sats-pipeline
Quick Start
Extract all data sources to local files for offline analysis:
CLI Usage
stacking-sats --extract-data csv
stacking-sats --extract-data parquet
stacking-sats --extract-data csv --output-dir data/
stacking-sats --extract-data parquet -o exports/
Python API
from stacking_sats_pipeline import extract_all_data
# Extract every data source to CSV files
extract_all_data("csv")
# Extract to Parquet in a custom output directory
extract_all_data("parquet", "data/exports/")
Data Loading
from stacking_sats_pipeline import load_data
# Load price data as a DataFrame
df = load_data()
# Or load directly from a specific source
from stacking_sats_pipeline.data import CoinMetricsLoader
loader = CoinMetricsLoader()
btc_data = loader.load_from_web()
What gets extracted:
- 📈 Bitcoin Price Data (CoinMetrics) → btc_coinmetrics.csv/parquet
- 😨 Fear & Greed Index (Alternative.me) → fear_greed.csv/parquet
- 💵 U.S. Dollar Index (FRED) → dxy_fred.csv/parquet*
*Requires the FRED_API_KEY environment variable. Get a free key from the FRED API.
File Format Benefits:
- CSV: Human-readable, universally compatible
- Parquet: ~50% smaller files, faster loading, preserves data types
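Once extracted, the files can be read back with pandas for further offline analysis. A minimal sketch, assuming the default file names listed above and an output directory of data/:
import pandas as pd
# Assumes files were created with: stacking-sats --extract-data parquet --output-dir data/
btc = pd.read_parquet("data/btc_coinmetrics.parquet")
fear_greed = pd.read_parquet("data/fear_greed.parquet")
print(btc.tail())
print(fear_greed.tail())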
Multi-Source Data Loading
from stacking_sats_pipeline.data import MultiSourceDataLoader
loader = MultiSourceDataLoader()
# List the data sources the loader knows about
available_sources = loader.get_available_sources()
# Load every source and merge them on a shared timestamp index
merged_df = loader.load_and_merge(available_sources)
print(f"Available data sources: {available_sources}")
print(f"Merged data shape: {merged_df.shape}")
Data Sources
CoinMetrics (Bitcoin Price Data)
from stacking_sats_pipeline.data import CoinMetricsLoader
loader = CoinMetricsLoader(data_dir="data/")
# Fetch the latest Bitcoin price data from CoinMetrics
df = loader.load_from_web()
# Load from a previously saved local file
df = loader.load_from_file()
# Write the data to disk and return the path to the file
csv_path = loader.extract_to_csv()
parquet_path = loader.extract_to_parquet()
Fear & Greed Index
from stacking_sats_pipeline.data import FearGreedLoader
loader = FearGreedLoader(data_dir="data/")
df = loader.load_from_web()
FRED (Federal Reserve Economic Data)
import os
os.environ['FRED_API_KEY'] = 'your_api_key_here'
from stacking_sats_pipeline.data import FREDLoader
loader = FREDLoader(data_dir="data/")
df = loader.load_from_web()
Development
For development and testing:
Requirements: Python 3.11 or 3.12
git clone https://github.com/hypertrial/stacking_sats_pipeline.git
cd stacking_sats_pipeline
# Set up the development environment
make setup-dev
# Install in editable mode with development dependencies
pip install -e ".[dev]"
# Install the pre-commit hooks
pre-commit install
make test
make lint
make format
make check
# Unit tests only (skip integration tests)
pytest -m "not integration"
# Integration tests only
pytest -m integration
Code Quality Standards
⚠️ MANDATORY: All code must pass ruff linting and formatting checks.
- Linting/Formatting: We use ruff for both linting and code formatting
- Pre-commit hooks: Automatically run on every commit to catch issues early
- CI enforcement: Pull requests will fail if code doesn't meet standards
Quick commands:
make help
make lint
make autopep8
make format
make format-all
make check
For detailed testing documentation, see TESTS.md.
Contributing Data Sources
The data loading system is designed to be modular and extensible. To add new data sources (exchanges, APIs, etc.), see the Data Loader Contribution Guide which provides step-by-step instructions for implementing new data loaders.
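As a rough illustration only (the authoritative interface and registration steps are in the Data Loader Contribution Guide), a new loader typically mirrors the existing ones: it accepts a data_dir, exposes load_from_web(), and returns a DataFrame indexed by timestamps normalized to midnight UTC. Everything below (the class name, URL, and field names) is hypothetical:
import pandas as pd
import requests

class MyExchangeLoader:
    """Hypothetical loader for a new data source; see CONTRIBUTE.md for the real requirements."""

    def __init__(self, data_dir: str = "data/"):
        self.data_dir = data_dir

    def load_from_web(self) -> pd.DataFrame:
        # Fetch raw records from the (hypothetical) upstream API
        resp = requests.get("https://api.example.com/v1/prices", timeout=30)
        resp.raise_for_status()
        df = pd.DataFrame(resp.json())
        # Normalize timestamps to midnight UTC so the frame merges cleanly (see Timestamp Handling)
        df["time"] = pd.to_datetime(df["time"], utc=True).dt.normalize()
        return df.set_index("time").sort_index()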
Command Line Options
stacking-sats --extract-data csv --output-dir data/
stacking-sats --extract-data parquet -o exports/
stacking-sats --help
Project Structure
├── stacking_sats_pipeline/
│   ├── main.py                    # Pipeline orchestrator and CLI
│   ├── config.py                  # Configuration constants
│   ├── data/                      # Modular data loading system
│   │   ├── coinmetrics_loader.py  # CoinMetrics data source
│   │   ├── fear_greed_loader.py   # Fear & Greed Index data source
│   │   ├── fred_loader.py         # FRED economic data source
│   │   ├── data_loader.py         # Multi-source data loader
│   │   └── CONTRIBUTE.md          # Guide for adding data sources
│   └── __init__.py                # Package exports
├── tutorials/examples.py          # Interactive examples
└── tests/                         # Comprehensive test suite
API Reference
Core Functions
from stacking_sats_pipeline import (
    extract_all_data,             # extract every data source to CSV or Parquet
    load_data,                    # load price data as a DataFrame
    validate_price_data,          # sanity-check a price DataFrame
    extract_btc_data_to_csv,      # extract only the Bitcoin (CoinMetrics) data
    extract_btc_data_to_parquet,  # same, but to Parquet
)
Configuration Constants
from stacking_sats_pipeline import (
    BACKTEST_START,
    BACKTEST_END,
    CYCLE_YEARS,
    MIN_WEIGHT,
    PURCHASE_FREQ,
)
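For example, the backtest constants can be used to slice loaded data down to the configured window. A sketch, assuming BACKTEST_START and BACKTEST_END are date-like values that pandas accepts for label-based slicing on a DatetimeIndex:
from stacking_sats_pipeline import load_data, BACKTEST_START, BACKTEST_END
df = load_data()
# Keep only rows inside the configured backtest window
backtest_df = df.loc[BACKTEST_START:BACKTEST_END]
print(backtest_df.shape)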
Data Validation
All data sources include built-in validation:
from stacking_sats_pipeline import load_data, validate_price_data
df = load_data()
# Validate with the default checks
is_valid = validate_price_data(df)
# Or supply custom validation requirements
requirements = {
    'required_columns': ['PriceUSD', 'Volume'],
    'min_price': 100,
    'max_price': 1000000,
}
is_valid = validate_price_data(df, **requirements)
File Format Support
The pipeline supports both CSV and Parquet formats:
- CSV: Universal compatibility, human-readable
- Parquet: Better compression (~50% smaller), faster loading, preserves data types
extract_all_data("csv", "output_dir/")
extract_all_data("parquet", "output_dir/")
Timestamp Handling
All data sources normalize timestamps to midnight UTC for consistent merging:
from stacking_sats_pipeline.data import MultiSourceDataLoader
loader = MultiSourceDataLoader()
merged_df = loader.load_and_merge(['coinmetrics', 'fred'])
print(merged_df.index.tz)        # UTC
print(merged_df.index.time[0])   # 00:00:00
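If you join your own data against the merged output, it helps to normalize its index the same way. A small sketch (the external my_signal data below is made up for illustration):
import pandas as pd
from stacking_sats_pipeline.data import MultiSourceDataLoader

loader = MultiSourceDataLoader()
merged_df = loader.load_and_merge(['coinmetrics', 'fred'])
# External data with intraday timestamps
my_df = pd.DataFrame(
    {"my_signal": [1.0, 2.0]},
    index=pd.to_datetime(["2024-01-01 13:45", "2024-01-02 09:10"]),
)
# Normalize to midnight UTC so the index lines up with the pipeline's index
my_df.index = my_df.index.tz_localize("UTC").normalize()
combined = merged_df.join(my_df, how="left")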
Error Handling
The pipeline includes comprehensive error handling:
try:
df = extract_all_data("csv")
except Exception as e:
print(f"Data extraction failed: {e}")
Individual data sources fail gracefully: if one source is unavailable, the others will still be extracted.
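For example, when no FRED_API_KEY is configured, you can still merge whatever sources are available. A short sketch using the multi-source loader shown earlier (assuming the FRED source is registered under the name 'fred', as in the Timestamp Handling example):
import os
from stacking_sats_pipeline.data import MultiSourceDataLoader

loader = MultiSourceDataLoader()
sources = loader.get_available_sources()
# Skip FRED when no API key is set, and merge the remaining sources
if not os.environ.get("FRED_API_KEY"):
    sources = [s for s in sources if s != "fred"]
merged_df = loader.load_and_merge(sources)
print(f"Merged {len(sources)} sources: {merged_df.shape}")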