
pybcsv
High-performance Python bindings for the BCSV (Binary CSV) library — fast, compact time-series storage with pandas integration.
Installation:

```shell
pip install pybcsv

# With pandas support
pip install pybcsv[pandas]
```
```python
import pybcsv

# Define schema
layout = pybcsv.Layout()
layout.add_column("id", pybcsv.INT32)
layout.add_column("name", pybcsv.STRING)
layout.add_column("value", pybcsv.DOUBLE)

# Write rows (context manager auto-closes)
with pybcsv.Writer(layout) as writer:
    writer.open("data.bcsv")
    writer.write_row([1, "Alice", 123.45])
    writer.write_row([2, "Bob", 678.90])

# Read all rows
with pybcsv.Reader() as reader:
    reader.open("data.bcsv")
    for row in reader:  # iterator protocol
        print(row)
    # or: all_rows = reader.read_all()
```
```python
import pybcsv
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'value': [123.45, 678.90, 111.22]
})

# Write DataFrame (columnar path, numpy zero-copy for numerics)
pybcsv.write_dataframe(df, "data.bcsv")

# Read back as DataFrame
df_read = pybcsv.read_dataframe("data.bcsv")
```
```python
import pybcsv

pybcsv.from_csv("input.csv", "output.bcsv")  # CSV → BCSV
pybcsv.to_csv("output.bcsv", "output.csv")   # BCSV → CSV
```
Zero-copy Polars DataFrame I/O via the Arrow C Data Interface:
```python
import pybcsv

# Read BCSV → Polars DataFrame (zero-copy via Arrow)
df = pybcsv.read_polars("data.bcsv")

# Write Polars DataFrame → BCSV
pybcsv.write_polars(df, "output.bcsv", row_codec="delta")
```
Install with the optional Polars dependency:
```shell
pip install pybcsv[polars]
```
```python
import pybcsv

with pybcsv.ReaderDirectAccess() as da:
    da.open("data.bcsv")
    print(f"Total rows: {len(da)}")
    row = da[42]         # read row 42 directly (O(1) seek)
    print(da.read(100))  # alternative syntax
```
| Constant | Description |
|---|---|
| pybcsv.BOOL | Boolean |
| pybcsv.INT8 / pybcsv.UINT8 | 8-bit integers |
| pybcsv.INT16 / pybcsv.UINT16 | 16-bit integers |
| pybcsv.INT32 / pybcsv.UINT32 | 32-bit integers |
| pybcsv.INT64 / pybcsv.UINT64 | 64-bit integers |
| pybcsv.FLOAT | 32-bit float |
| pybcsv.DOUBLE | 64-bit float |
| pybcsv.STRING | Variable-length string |
```python
layout = pybcsv.Layout()                                # empty layout
layout = pybcsv.Layout([ColumnDefinition("x", INT32)])  # from list

layout.add_column(name: str, type: ColumnType)
layout.add_column(col: ColumnDefinition)
layout.column_count() -> int
layout.column_name(index: int) -> str
layout.column_type(index: int) -> ColumnType
layout.has_column(name: str) -> bool
layout.column_index(name: str) -> int
layout.get_column_names() -> list[str]
layout.get_column_types() -> list[ColumnType]
layout.get_column(index: int) -> ColumnDefinition

len(layout)  # column count
layout[i]    # ColumnDefinition at index i
```
```python
writer = pybcsv.Writer(layout: Layout, row_codec: str = "delta")

writer.open(filename: str, overwrite: bool = False,
            compression_level: int = 1, block_size_kb: int = 64,
            flags: FileFlags = FileFlags.BATCH_COMPRESS)  # raises RuntimeError on failure
writer.write_row(values: list)
writer.write_rows(rows: list[list])  # batch write
writer.flush()
writer.close()
writer.is_open() -> bool
writer.row_count() -> int
writer.row_codec() -> str
writer.compression_level() -> int
writer.layout() -> Layout

# Context manager
with pybcsv.Writer(layout) as w:
    w.open("out.bcsv")
    w.write_row([...])
```

Row codec options: "flat", "zoh" (zero-order hold), "delta" (default).
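To build intuition for what the codecs trade off, here is a hypothetical pure-Python sketch of delta and zero-order-hold encoding applied to a numeric column. The real BCSV codecs operate on the binary row stream; the function names and representations below are illustrative only:

```python
def delta_encode(values):
    """Store the first value, then successive differences.
    Slowly-changing series become runs of small numbers,
    which compress well."""
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out

def zoh_encode(values):
    """Zero-order hold: only emit (index, value) when the
    value actually changes; constant runs cost nothing."""
    out = [(0, values[0])]
    for i, (prev, cur) in enumerate(zip(values, values[1:]), start=1):
        if cur != prev:
            out.append((i, cur))
    return out

series = [100, 100, 100, 101, 101, 103]
print(delta_encode(series))  # [100, 0, 0, 1, 0, 2]
print(zoh_encode(series))    # [(0, 100), (3, 101), (5, 103)]
```

Delta suits smoothly-varying signals; zero-order hold suits signals that stay constant for long stretches, which is why both appear as codec options for time-series data.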
```python
reader = pybcsv.Reader()

reader.open(filename: str)        # raises RuntimeError on failure
reader.read_next() -> bool        # advance to next row
reader.read_row() -> list | None  # read+advance, None at EOF
reader.read_all() -> list[list]   # read remaining rows
reader.close()
reader.is_open() -> bool
reader.layout() -> Layout
reader.row_pos() -> int               # current row index
reader.row_value(column: int) -> Any  # typed value from current row
reader.row_dict() -> dict             # current row as {name: value}
reader.file_flags() -> FileFlags
reader.compression_level() -> int
reader.version_string() -> str
reader.creation_time() -> str
reader.count_rows() -> int            # total row count

# Iterator protocol
for row in reader:
    print(row)

# Context manager
with pybcsv.Reader() as r:
    r.open("data.bcsv")
    for row in r:
        print(row)
```
Random-access reader — reads any row by index without scanning.
```python
da = pybcsv.ReaderDirectAccess()

da.open(filename: str, rebuild_footer: bool = False)
da.read(index: int) -> list  # read row at index
da.row_count() -> int
da.layout() -> Layout
da.close()
da.is_open() -> bool
da.file_flags() -> FileFlags
da.compression_level() -> int
da.version_string() -> str
da.creation_time() -> str

len(da)  # row count
da[i]    # read row at index i
```
Native CSV I/O with the same Layout-based schema.
```python
# Write CSV
csv_w = pybcsv.CsvWriter(layout, delimiter=',', decimal_sep='.')
csv_w.open(filename, overwrite=False, include_header=True)
csv_w.write_row(values)
csv_w.write_rows(rows)
csv_w.close()

# Read CSV
csv_r = pybcsv.CsvReader(layout, delimiter=',', decimal_sep='.')
csv_r.open(filename, has_header=True)
for row in csv_r:  # iterator support
    print(row)
csv_r.close()
```
Bytecode VM for filtering and projecting rows from an open Reader.
```python
reader = pybcsv.Reader()
reader.open("data.bcsv")

sampler = pybcsv.Sampler(reader)
sampler.set_conditional("col_a > 10")  # filter expression
sampler.set_selection("col_a, col_b")  # column projection

result = sampler.output_layout()  # SamplerCompileResult (bool-testable)
if result:
    for row in sampler:  # iterate matching rows
        print(row)
```
```python
pybcsv.FileFlags.NONE
pybcsv.FileFlags.ZERO_ORDER_HOLD
pybcsv.FileFlags.NO_FILE_INDEX
pybcsv.FileFlags.STREAM_MODE
pybcsv.FileFlags.BATCH_COMPRESS
pybcsv.FileFlags.DELTA_ENCODING

# Combinable with | and &
flags = pybcsv.FileFlags.BATCH_COMPRESS | pybcsv.FileFlags.NO_FILE_INDEX
```
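These flags behave like standard bit flags. A self-contained sketch using Python's enum.IntFlag shows the combine-and-test pattern; the numeric values below are made up for illustration, not pybcsv's actual bit values:

```python
from enum import IntFlag

class FileFlags(IntFlag):
    # Hypothetical bit values, for illustration only
    NONE = 0
    ZERO_ORDER_HOLD = 1
    NO_FILE_INDEX = 2
    STREAM_MODE = 4
    BATCH_COMPRESS = 8
    DELTA_ENCODING = 16

# Combine with |, test membership with &
flags = FileFlags.BATCH_COMPRESS | FileFlags.NO_FILE_INDEX
print(bool(flags & FileFlags.BATCH_COMPRESS))  # True
print(bool(flags & FileFlags.STREAM_MODE))     # False
```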
```python
# Pandas integration (requires pandas)
pybcsv.write_dataframe(df, filename,
                       compression_level=1,
                       row_codec="delta",
                       type_hints=None)  # dict[str, ColumnType]
pybcsv.read_dataframe(filename, columns=None)  # -> pd.DataFrame

# CSV conversion (requires pandas)
pybcsv.from_csv(csv_file, bcsv_file, compression_level=1, type_hints=None)
pybcsv.to_csv(bcsv_file, csv_file)

# Columnar I/O (numpy arrays)
pybcsv.read_columns(filename) -> dict[str, np.ndarray | list[str]]
pybcsv.write_columns(filename, columns, col_order, col_types,
                     row_codec="delta", compression_level=1)

# Type utilities
pybcsv.type_to_string(column_type) -> str
```
```shell
pip install pybcsv[test]
python -m pytest tests/ -v
```
```
python/
├── pybcsv/
│   ├── __init__.py             # Public API and exports
│   ├── __version__.py          # Version (setuptools-scm)
│   ├── bindings.cpp            # C++ nanobind bindings
│   └── pandas_utils.py         # Pandas/CSV integration
├── examples/
│   ├── basic_usage.py          # Core BCSV operations
│   ├── pandas_integration.py   # DataFrame examples
│   ├── advanced_usage.py       # DirectAccess, Sampler, CSV, columnar I/O
│   └── performance_benchmark.py
├── tests/                      # 17 test modules (pytest)
├── benchmarks/                 # Python benchmark runner
├── pyproject.toml
└── README.md
```
Arrow string columns: 2 GB per batch. The Arrow C Data Interface uses
utf8 format ("u") with int32 offsets, limiting the total byte size of any
single string column within one batch to ~2 GB. An OverflowError is raised
at runtime if this limit is exceeded. For most workloads this is not an issue.
If you hit this limit, consider splitting data into smaller batches.
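As a rough pre-flight check, you can estimate whether a string column would overflow the int32 offsets before exporting. This helper is illustrative and not part of the pybcsv API:

```python
INT32_MAX = 2**31 - 1  # Arrow utf8 ("u") offsets are signed 32-bit

def fits_arrow_utf8(strings):
    """True if the total UTF-8 byte size of the column stays
    within the ~2 GB limit of a single Arrow utf8 batch."""
    total = sum(len(s.encode("utf-8")) for s in strings)
    return total <= INT32_MAX

print(fits_arrow_utf8(["alpha", "beta"]))  # True
```

Columns that fail such a check would need to be written in smaller batches, or with a large-string representation on the Arrow side.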
No native null/missing value support. BCSV is a fixed-width binary format
without a null bitmap. When writing a pandas DataFrame with NaN/None values,
they are coerced to zero, False, or empty string by default (with a warning).
Use strict=True in write_dataframe() to reject NaN values instead.
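The default coercion can be sketched as follows. This is a conceptual stand-in for pybcsv's internal behavior, with hypothetical fill rules matching the description above:

```python
import math

# Illustrative per-kind fill values, per the description above
FILL = {"int": 0, "float": 0.0, "bool": False, "str": ""}

def coerce_missing(value, kind, strict=False):
    """Replace NaN/None with the column's fill value,
    or raise when strict mode is requested."""
    missing = value is None or (isinstance(value, float) and math.isnan(value))
    if not missing:
        return value
    if strict:
        raise ValueError("missing value encountered with strict=True")
    return FILL[kind]

print(coerce_missing(float("nan"), "float"))  # 0.0
print(coerce_missing(None, "str"))            # ''
print(coerce_missing(7, "int"))               # 7
```

If silent zero-filling would corrupt downstream analysis, prefer the strict path and clean the DataFrame explicitly before writing.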
MIT — see LICENSE for details.
Wheels are built automatically via GitHub Actions (cibuildwheel) and published using Trusted Publisher (OIDC) — no API tokens required.
Build workflows run on pushes to main/master and on version tags; releases publish on v* tags (e.g. git tag v1.4.0 && git push origin v1.4.0), from the release branch, or via a manual workflow_dispatch.

To smoke-test a published wheel:

```shell
# in a fresh virtualenv
python -m venv venv && source venv/bin/activate
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pybcsv
python -c "import pybcsv; print(pybcsv.__version__)"
```
If the import and version check succeed the wheel is good for release.