
The file format behind TACO. 🫓
GitHub: https://github.com/tacofoundation/tortilla-python 🌐
PyPI: https://pypi.org/project/pytortilla/ 🛠️
Hello! I'm a Tortilla, a format to serialize your EO data 🤗.
pytortilla is a Python package that simplifies the creation and management of .tortilla files. These files are designed to encapsulate metadata, dataset information, and links to relevant files in remote sensing or AI workflows.
This package is "re-exported" within tacotoolbox, specifically under tacotoolbox.tortilla. Therefore, by installing pytortilla, you can also use it from tacotoolbox.tortilla.
Features
- Define your dataset with the data model classes (Sample, Samples) to describe and structure your data's information.
- Create .tortilla files.
- Load and manipulate these datasets with tacoreader and other helper functions from tacotoolbox.
Installation
Install from PyPI:
pip install pytortilla
or from source:
git clone https://github.com/tacofoundation/tortilla-python.git
cd tortilla-python
pip install .
Note: You may also install it as part of tacotoolbox, where pytortilla is included as a dependency.
In this guide, we delve deeper into the step-by-step creation of .tortilla
files, providing tips and best practices.
import pathlib
import rasterio
import pandas as pd
from sklearn.model_selection import train_test_split
import pytortilla
If you need Earth Engine:
import ee
ee.Initialize() # Requires prior authentication if not done already
Move the Files from Hugging Face to Your Local Machine
import os
# URL path to the Hugging Face repository
path = "https://huggingface.co/datasets/tacofoundation/tortilla_demo/resolve/main/"
# List of demo files to download
files = [
"demo/high__test__ROI_0010__20190125T112341_20190125T112624_T28QFG.tif",
"demo/high__test__ROI_0011__20190130T103251_20190130T104108_T31REP.tif",
"demo/high__test__ROI_0011__20190830T102029_20190830T102552_T31REP.tif",
"demo/high__test__ROI_0064__20190317T015619_20190317T020354_T51JVH.tif",
"demo/high__test__ROI_0120__20191219T045209_20191219T045214_T45TXE.tif",
"demo/high__test__ROI_0141__20190316T141049_20190316T142437_T19FDE.tif",
"demo/high__test__ROI_0159__20200403T143721_20200403T144642_T19HBV.tif",
"demo/high__test__ROI_0235__20200402T053639_20200402T053638_T44UNV.tif"
]
# Create a local folder called 'demo' (if not already existing)
os.makedirs("demo", exist_ok=True)

# Download each file to the 'demo' folder
for file in files:
    os.system(f"wget {path}{file} -O {file}")
Note: Depending on your environment, you might prefer using requests or urllib instead of os.system for downloading files.
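As that note suggests, a pure standard-library alternative avoids shelling out entirely. A minimal sketch, assuming the same base URL and file names used above (the helper name `download` is ours):

```python
import pathlib
import urllib.request

BASE_URL = "https://huggingface.co/datasets/tacofoundation/tortilla_demo/resolve/main/"

def download(name: str, base_url: str = BASE_URL) -> pathlib.Path:
    """Download one repository file, creating parent folders as needed."""
    dest = pathlib.Path(name)
    dest.parent.mkdir(parents=True, exist_ok=True)  # replaces 'mkdir -p demo'
    urllib.request.urlretrieve(base_url + name, dest)
    return dest

# Example call (fetches over the network):
# download("demo/high__test__ROI_0010__20190125T112341_20190125T112624_T28QFG.tif")
```

This also keeps the download working on systems where wget is not installed.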
At this point, you should have a demo/ folder populated with several .tif files.
Now, we will create samples using pytortilla:
import pathlib
import pandas as pd
from sklearn.model_selection import train_test_split
import rasterio
from pytortilla.datamodel import Sample, Samples
# Define the local path containing the TIFF files
demo_path = pathlib.Path("./demo")
# Collect all .tif files in the demo folder
all_files = list(demo_path.glob("*.tif"))
# Split into train, val, and test
train_files, test_files = train_test_split(all_files, test_size=0.2, random_state=42)
train_files, val_files = train_test_split(train_files, test_size=0.2, random_state=42)
train_df = pd.DataFrame({"path": train_files, "split": "train"})
val_df = pd.DataFrame({"path": val_files, "split": "validation"})
test_df = pd.DataFrame({"path": test_files, "split": "test"})
dataset_full = pd.concat([train_df, val_df, test_df], ignore_index=True)
# Build a list of Sample objects
samples_list = []
for _, row in dataset_full.iterrows():
    with rasterio.open(row.path) as src:
        metadata = src.profile
    sample_obj = Sample(
        id=row.path.stem,
        path=str(row.path),
        file_format="GTiff",
        data_split=row.split,
        stac_data={
            "crs": str(metadata["crs"]),
            "geotransform": metadata["transform"].to_gdal(),
            "raster_shape": (metadata["height"], metadata["width"]),
        },
    )
    samples_list.append(sample_obj)
samples_obj = Samples(samples=samples_list)
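The two-stage split used above first carves out 20% of the files for testing, then 20% of the remainder for validation, which yields roughly a 64/16/20 split. A quick sanity check with dummy file names in place of the real .tif paths:

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the .tif paths, to show the split proportions
dummy_files = [f"img_{i}.tif" for i in range(100)]

train, test = train_test_split(dummy_files, test_size=0.2, random_state=42)
train, val = train_test_split(train, test_size=0.2, random_state=42)
print(len(train), len(val), len(test))  # 64 16 20
```

Fixing random_state makes the split reproducible across runs.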
Validate each .tif file by trying to open it:
samples_obj.deep_validator(read_function=lambda x: rasterio.open(x))
If you need RAI metadata (or any other additional metadata) in your workflow, you can include it:
samples_obj = samples_obj.include_rai_metadata(
    sample_footprint=5120,  # Example footprint value
    cache=False,
    quiet=False
)
Create the .tortilla file
Use pytortilla.create.main.create() (or the equivalent tacotoolbox.tortilla.create if you have tacotoolbox installed):
from pytortilla.create.main import create
# Generate the .tortilla file
output_file = create(
    samples=samples_obj,
    output="demo_dataset.tortilla"
)
print(f"Tortilla file generated: {output_file}")
For large datasets, the output may be split into multiple files (demo_dataset.0000.part.tortilla, etc.).
Load the .tortilla file
Finally, load the .tortilla file (or its parts) with tacoreader:
import tacoreader
import pandas as pd
dataset_chunks = []

# Try loading .part.tortilla files (assuming a maximum of 4 parts for this example)
for i in range(4):
    part_file = f"demo_dataset.{i:04d}.part.tortilla"
    try:
        dataset_part = tacoreader.load(part_file)
        dataset_chunks.append(dataset_part)
    except FileNotFoundError:
        break  # Stop if no more parts

if dataset_chunks:
    dataset = pd.concat(dataset_chunks, ignore_index=True)
    print(dataset.head())
else:
    print("No tortilla parts found.")