
# stgrid2area

Extract and aggregate spatio-temporal gridded data in netCDF or GRIB format to a specified area.

`stgrid2area` is a Python package that simplifies the workflow of extracting and aggregating spatio-temporal gridded data (e.g., climate data, weather forecasts) to geometrically defined areas such as catchments, administrative boundaries, or any other geographical region. The package handles clipping the gridded data to the specified area's boundaries and then calculating spatial statistics across the area.
Figure: Clip and aggregate spatio-temporal gridded data to areas. Image from https://doi.org/10.5194/essd-16-5625-2024
- Clip: returns an `xarray.Dataset` or saves to `.nc`
- Aggregate: returns a `pd.DataFrame` or saves to `.csv`
```bash
pip install stgrid2area
```
The workflow for clipping and aggregating spatio-temporal gridded data to specified areas usually involves the following steps:

1. Read the spatio-temporal gridded data (stgrid) with `xarray`.
2. Read the geometries with `geopandas`.
3. Make sure the stgrid data has a CRS defined (e.g., `stgrid.rio.write_crs("EPSG:4326")`) and reproject the geometries to match the CRS of the stgrid data using `geometries.to_crs(stgrid.rio.crs)`, or reproject / reformat the stgrid data if necessary.
4. Create an `Area` object for each geometry, which encapsulates the clipping and aggregation logic. You can use the helper function `geodataframe_to_areas` to create a list of `Area` objects from a GeoDataFrame.
5. Use `area.clip()` and `area.aggregate()` directly, or use a processor class for parallel processing.

```python
import xarray as xr
import geopandas as gpd
import rioxarray  # registers the .rio accessor on xarray objects

from stgrid2area import Area

# Read the spatio-temporal gridded data
stgrid = xr.open_dataset("path/to/climate_data.nc")
stgrid = stgrid.rio.write_crs("EPSG:4326")  # Set the CRS if not already set

# Read the geometry data (e.g., a catchment boundary)
geometry = gpd.read_file("path/to/catchments.gpkg")
geometry = geometry.to_crs(stgrid.rio.crs)  # Ensure the CRS matches the gridded data

# Create an Area object
area = Area(geometry=geometry, id="catchment1", output_dir="/path/to/output")

# Clip the gridded data to the area
clipped_data = area.clip(stgrid)

# Aggregate a single variable across the area
aggregated = area.aggregate(
    clipped_data,
    variables="temperature",
    method="exact_extract",
    operations=["mean", "min", "max"]
)

# Save the aggregated data
aggregated.to_csv("/path/to/output/temperature_stats.csv")

# Aggregate multiple variables at once
aggregated = area.aggregate(
    clipped_data,
    variables=["temperature", "precipitation", "wind_speed"],
    method="exact_extract",
    operations=["mean", "max"]
)
# The result will contain columns like: temperature_mean, temperature_max,
# precipitation_mean, precipitation_max, wind_speed_mean, wind_speed_max
```
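The resulting column names follow a `variable_operation` pattern, as noted in the comment above. That naming can be sketched as a simple cross product of variables and operations (a plain-Python illustration, not stgrid2area code):

```python
# Variables and operations as passed to area.aggregate()
variables = ["temperature", "precipitation", "wind_speed"]
operations = ["mean", "max"]

# One output column per (variable, operation) pair, named "<variable>_<operation>"
columns = [f"{var}_{op}" for var in variables for op in operations]
print(columns)
# ['temperature_mean', 'temperature_max', 'precipitation_mean',
#  'precipitation_max', 'wind_speed_mean', 'wind_speed_max']
```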
To process multiple areas in parallel, use a processor class:

```python
import xarray as xr
import geopandas as gpd

from stgrid2area import LocalDaskProcessor, geodataframe_to_areas

# Read the spatio-temporal gridded data
stgrid = xr.open_dataset("path/to/climate_data.nc")

# Read multiple geometries (e.g., multiple catchments)
geometries = gpd.read_file("path/to/catchments.gpkg")

# Create a list of Area objects
areas = geodataframe_to_areas(
    geometries,
    output_dir="/path/to/output",
    id_column="catchment_id"  # Column containing unique identifiers for each area
)

# Create a single-node processor with 4 workers
processor = LocalDaskProcessor(
    areas=areas,
    stgrid=stgrid,
    variables=["temperature", "precipitation"],
    method="exact_extract",
    operations=["mean", "max"],
    n_workers=4,
    save_nc=True,  # Save clipped NetCDF files
    save_csv=True  # Save aggregated CSV files
)

# Run the processing
processor.run()
```
## The `Area` Class

The `Area` class represents a geographical area (e.g., a catchment or administrative boundary) and provides methods to clip and aggregate gridded data to this area.
```python
from stgrid2area import Area

# Create an Area object
area = Area(
    geometry=geometry_gdf,        # geopandas GeoDataFrame containing the geometry
    id="unique_identifier",       # Unique identifier for this area
    output_dir="/path/to/output"  # Directory to save output files
)

# Clip the gridded data to the area's geometry
clipped_data = area.clip(
    stgrid,            # xarray.Dataset: the gridded data to clip
    all_touched=True,  # Include cells that are only partially within the area
    save_result=True,  # Save the clipped data to disk
    skip_exist=False,  # Skip clipping if the output file already exists
    filename=None      # Optional filename for the output file
)

# Aggregate the clipped data spatially
aggregated_data = area.aggregate(
    clipped_data,  # xarray.Dataset: the clipped gridded data to aggregate
    variables=["temperature", "precipitation"],  # Variable(s) to aggregate
    method="exact_extract",                      # "exact_extract" or "xarray"
    operations=["mean", "min", "max", "stdev"],  # Statistical operations
    save_result=True,  # Save the aggregated data to disk
    skip_exist=False,  # Skip aggregation if the output file already exists
    filename=None      # Optional filename for the output file
)
```
stgrid2area supports two aggregation methods:

- `exact_extract`: Uses the exactextract library to calculate weighted statistics based on cell coverage. This provides more accurate results, especially for areas with irregular shapes and cells that are only partially within the area.
- `xarray`: Uses xarray's built-in statistical functions to calculate unweighted statistics.
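The practical difference between the two methods is coverage weighting. A toy sketch with made-up values and coverage fractions (not exactextract's actual implementation) shows how a partially covered cell can skew an unweighted mean:

```python
# Three grid cells overlapping an area: the cell values and the
# (hypothetical) fraction of each cell lying inside the area's boundary
values = [10.0, 20.0, 30.0]
coverage = [1.0, 0.5, 0.1]

# Unweighted mean over all clipped cells (the idea behind the "xarray" method)
unweighted = sum(values) / len(values)

# Coverage-weighted mean (the idea behind "exact_extract"):
# each cell contributes proportionally to how much of it is inside the area
weighted = sum(v * c for v, c in zip(values, coverage)) / sum(coverage)

print(unweighted)  # 20.0
print(weighted)    # 14.375 -- the barely-covered cell barely counts
```

The barely-covered 30.0 cell pulls the unweighted mean up noticeably, which is why coverage weighting matters for irregular boundaries.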
For processing multiple areas efficiently, stgrid2area provides several processor classes:

### LocalDaskProcessor

For processing on a single machine or a single-node HPC batch job with multiple cores. This processor creates a local Dask cluster and distributes the workload across multiple workers.
```python
from stgrid2area import LocalDaskProcessor

processor = LocalDaskProcessor(
    areas=areas,                 # List of Area objects
    stgrid=stgrid,               # The gridded data
    variables="temperature",     # Variable(s) to aggregate
    method="exact_extract",      # Aggregation method
    operations=["mean", "max"],  # Statistical operations
    n_workers=4,      # Number of parallel workers
    skip_exist=True,  # Skip areas that already have output files
    batch_size=10,    # Process areas in batches of this size
    save_nc=True,     # Save clipped NetCDF files
    save_csv=True     # Save aggregated CSV files
)

processor.run()
```
### MPIDaskProcessor

For processing with MPI (Message Passing Interface) on HPC systems. This processor is designed to run on distributed systems with multiple nodes, allowing efficient parallel processing of large datasets.

To use this processor, you need to initialize a Dask MPI client yourself and pass it to `processor.run()`. On the HPC system, you would typically launch the Python processing script using `mpirun` inside a SLURM job script or similar.
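A hypothetical SLURM job script for such a launch might look like the following; the job name, resource requests, partition, and script name are all placeholders to adapt to your cluster:

```shell
#!/bin/bash
#SBATCH --job-name=stgrid2area
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=04:00:00
#SBATCH --partition=compute   # placeholder partition name

# mpirun starts one Python process per MPI rank; dask_mpi.initialize()
# inside the script assigns the scheduler, client, and worker roles
# to the ranks.
mpirun -n "$SLURM_NTASKS" python process_catchments.py
```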
```python
import dask_mpi
from dask.distributed import Client

from stgrid2area import MPIDaskProcessor

# Initialize dask-mpi: the MPI ranks are split into a scheduler,
# a client, and workers
dask_mpi.initialize()
client = Client()

processor = MPIDaskProcessor(
    areas=areas,    # List of Area objects
    stgrid=stgrid,  # The gridded data
    variables=["temperature", "precipitation"],  # Variable(s) to aggregate
    method="exact_extract",      # Aggregation method
    operations=["mean", "max"],  # Statistical operations
    skip_exist=True,  # Skip areas that already have output files
    batch_size=100    # Process areas in batches of this size
)

processor.run(client=client)
```
### Processing Data in Chunks

If your stgrid data is too large to fit in memory, you can process it in chunks. The `LocalDaskProcessor` and `MPIDaskProcessor` classes accept a list of xarray datasets; the clipped and aggregated results are saved separately for each input dataset, so you are responsible for post-processing the results if you want to combine them later.
```python
import xarray as xr

from stgrid2area import LocalDaskProcessor

# Read multiple files, each containing data for a different time period
stgrid_files = [
    "path/to/data_2020.nc",
    "path/to/data_2021.nc",
    "path/to/data_2022.nc"
]

# Create a list of xarray datasets
stgrids = [xr.open_dataset(file) for file in stgrid_files]

# Create a processor with multiple stgrids
processor = LocalDaskProcessor(
    areas=areas,
    stgrid=stgrids,  # List of xarray datasets
    variables="temperature",
    method="exact_extract",
    operations=["mean", "max"],
    n_workers=4
)

# Run the processing
processor.run()
```
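Because the results are saved per input dataset, combining them afterwards is left to you. One straightforward approach is a pandas concatenation over the per-period CSV files; a sketch with hypothetical file names and layout:

```python
import glob
import os
import tempfile

import pandas as pd

# Create a few per-period CSVs standing in for the processor's output
# (in practice these might be catchment1_2020.csv, catchment1_2021.csv, ...)
outdir = tempfile.mkdtemp()
for year in (2020, 2021):
    pd.DataFrame(
        {"time": [f"{year}-01-01", f"{year}-01-02"],
         "temperature_mean": [1.0, 2.0]}
    ).to_csv(os.path.join(outdir, f"catchment1_{year}.csv"), index=False)

# Concatenate all per-period files into one continuous time series
files = sorted(glob.glob(os.path.join(outdir, "catchment1_*.csv")))
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
combined = combined.sort_values("time").reset_index(drop=True)

print(len(combined))  # 4 rows spanning both periods
```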
The stgrid2area-workflows repository is a collection of data processing workflows implemented with `stgrid2area`. It is mainly used to process meteorological input data for the CAMELS-DE dataset and includes Python scripts and SLURM job scripts for running the workflows on HPC systems. It can therefore serve as a reference for how to use `stgrid2area` in practice, although the code is still under development and not always fully documented.
## FAQs
Extract and aggregate spatio-temporal data to a specified area.
We found that stgrid2area demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.