New Research: Supply Chain Attack on Axios Pulls Malicious Dependency from npm.Details →
Socket
Book a DemoSign in
Socket

pybcsv

Package Overview
Dependencies
Maintainers
1
Versions
11
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

pybcsv

High-performance Python bindings for the BCSV (Binary CSV) library with pandas integration

pipPyPI
Version
1.5.3
Maintainers
1

PyBCSV — Python Bindings for BCSV Library

High-performance Python bindings for the BCSV (Binary CSV) library — fast, compact time-series storage with pandas integration.

Features

  • High Performance: Binary format with optional LZ4 compression and delta encoding
  • Pandas Integration: Columnar DataFrame read/write via numpy zero-copy
  • Type Safety: Preserves column types and data integrity (10 numeric types + strings)
  • Cross-platform: Linux (x86_64, ARM64), macOS (x86_64, ARM64), Windows (AMD64)
  • Context Managers: All readers/writers support with statements
  • Streaming I/O: Row-by-row read/write, never loads entire file into memory
  • Direct Access: Random-access reads by row index via ReaderDirectAccess
  • Sampler: Bytecode VM for server-side row filtering and column projection
  • CSV Interop: Convert between CSV and BCSV via from_csv() / to_csv()

Installation

pip install pybcsv

# With pandas support
pip install pybcsv[pandas]

Quick Start

Write and Read

import pybcsv

# Define schema
layout = pybcsv.Layout()
layout.add_column("id", pybcsv.INT32)
layout.add_column("name", pybcsv.STRING)
layout.add_column("value", pybcsv.DOUBLE)

# Write rows (context manager auto-closes)
with pybcsv.Writer(layout) as writer:
    writer.open("data.bcsv")
    writer.write_row([1, "Alice", 123.45])
    writer.write_row([2, "Bob", 678.90])

# Read all rows
with pybcsv.Reader() as reader:
    reader.open("data.bcsv")
    for row in reader:          # iterator protocol
        print(row)
    # or: all_rows = reader.read_all()

Pandas Integration

import pybcsv
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'value': [123.45, 678.90, 111.22]
})

# Write DataFrame (columnar path, numpy zero-copy for numerics)
pybcsv.write_dataframe(df, "data.bcsv")

# Read back as DataFrame
df_read = pybcsv.read_dataframe("data.bcsv")

CSV Conversion

import pybcsv

pybcsv.from_csv("input.csv", "output.bcsv")   # CSV → BCSV
pybcsv.to_csv("output.bcsv", "output.csv")    # BCSV → CSV

Polars Integration

Zero-copy Polars DataFrame I/O via the Arrow C Data Interface:

import pybcsv

# Read BCSV → Polars DataFrame (zero-copy via Arrow)
df = pybcsv.read_polars("data.bcsv")

# Write Polars DataFrame → BCSV
pybcsv.write_polars(df, "output.bcsv", row_codec="delta")

Install with the optional Polars dependency:

pip install pybcsv[polars]

Random Access

import pybcsv

with pybcsv.ReaderDirectAccess() as da:
    da.open("data.bcsv")
    print(f"Total rows: {len(da)}")
    row = da[42]         # read row 42 directly (O(1) seek)
    print(da.read(100))  # alternative syntax

Available Types

ConstantDescription
pybcsv.BOOLBoolean
pybcsv.INT8 / pybcsv.UINT88-bit integers
pybcsv.INT16 / pybcsv.UINT1616-bit integers
pybcsv.INT32 / pybcsv.UINT3232-bit integers
pybcsv.INT64 / pybcsv.UINT6464-bit integers
pybcsv.FLOAT32-bit float
pybcsv.DOUBLE64-bit float
pybcsv.STRINGVariable-length string

API Reference

Layout

layout = pybcsv.Layout()                              # empty layout
layout = pybcsv.Layout([ColumnDefinition("x", INT32)]) # from list

layout.add_column(name: str, type: ColumnType)
layout.add_column(col: ColumnDefinition)
layout.column_count() -> int
layout.column_name(index: int) -> str
layout.column_type(index: int) -> ColumnType
layout.has_column(name: str) -> bool
layout.column_index(name: str) -> int
layout.get_column_names() -> list[str]
layout.get_column_types() -> list[ColumnType]
layout.get_column(index: int) -> ColumnDefinition
len(layout)           # column count
layout[i]             # ColumnDefinition at index i

Writer

writer = pybcsv.Writer(layout: Layout, row_codec: str = "delta")
writer.open(filename: str, overwrite: bool = False,
            compression_level: int = 1, block_size_kb: int = 64,
            flags: FileFlags = FileFlags.BATCH_COMPRESS)  # raises RuntimeError on failure
writer.write_row(values: list)
writer.write_rows(rows: list[list])     # batch write
writer.flush()
writer.close()
writer.is_open() -> bool
writer.row_count() -> int
writer.row_codec() -> str
writer.compression_level() -> int
writer.layout() -> Layout

# Context manager
with pybcsv.Writer(layout) as w:
    w.open("out.bcsv")
    w.write_row([...])

Row codec options: "flat", "zoh" (zero-order hold), "delta" (default).

Reader

reader = pybcsv.Reader()
reader.open(filename: str)              # raises RuntimeError on failure
reader.read_next() -> bool              # advance to next row
reader.read_row() -> list | None        # read+advance, None at EOF
reader.read_all() -> list[list]         # read remaining rows
reader.close()
reader.is_open() -> bool
reader.layout() -> Layout
reader.row_pos() -> int                 # current row index
reader.row_value(column: int) -> Any    # typed value from current row
reader.row_dict() -> dict               # current row as {name: value}
reader.file_flags() -> FileFlags
reader.compression_level() -> int
reader.version_string() -> str
reader.creation_time() -> str
reader.count_rows() -> int              # total row count

# Iterator protocol
for row in reader:
    print(row)

# Context manager
with pybcsv.Reader() as r:
    r.open("data.bcsv")
    for row in r:
        print(row)

ReaderDirectAccess

Random-access reader — reads any row by index without scanning.

da = pybcsv.ReaderDirectAccess()
da.open(filename: str, rebuild_footer: bool = False)
da.read(index: int) -> list             # read row at index
da.row_count() -> int
da.layout() -> Layout
da.close()
da.is_open() -> bool
da.file_flags() -> FileFlags
da.compression_level() -> int
da.version_string() -> str
da.creation_time() -> str

len(da)               # row count
da[i]                 # read row at index i

CsvWriter / CsvReader

Native CSV I/O with the same Layout-based schema.

# Write CSV
csv_w = pybcsv.CsvWriter(layout, delimiter=',', decimal_sep='.')
csv_w.open(filename, overwrite=False, include_header=True)
csv_w.write_row(values)
csv_w.write_rows(rows)
csv_w.close()

# Read CSV
csv_r = pybcsv.CsvReader(layout, delimiter=',', decimal_sep='.')
csv_r.open(filename, has_header=True)
for row in csv_r:       # iterator support
    print(row)
csv_r.close()

Sampler

Bytecode VM for filtering and projecting rows from an open Reader.

reader = pybcsv.Reader()
reader.open("data.bcsv")

sampler = pybcsv.Sampler(reader)
sampler.set_conditional("col_a > 10")    # filter expression
sampler.set_selection("col_a, col_b")    # column projection

result = sampler.output_layout()         # SamplerCompileResult (bool-testable)
if result:
    for row in sampler:                  # iterate matching rows
        print(row)

FileFlags

pybcsv.FileFlags.NONE
pybcsv.FileFlags.ZERO_ORDER_HOLD
pybcsv.FileFlags.NO_FILE_INDEX
pybcsv.FileFlags.STREAM_MODE
pybcsv.FileFlags.BATCH_COMPRESS
pybcsv.FileFlags.DELTA_ENCODING

# Combinable with | and &
flags = pybcsv.FileFlags.BATCH_COMPRESS | pybcsv.FileFlags.NO_FILE_INDEX

Utility Functions

# Pandas integration (requires pandas)
pybcsv.write_dataframe(df, filename,
                       compression_level=1,
                       row_codec="delta",
                       type_hints=None)  # dict[str, ColumnType]
pybcsv.read_dataframe(filename, columns=None)  # -> pd.DataFrame

# CSV conversion (requires pandas)
pybcsv.from_csv(csv_file, bcsv_file, compression_level=1, type_hints=None)
pybcsv.to_csv(bcsv_file, csv_file)

# Columnar I/O (numpy arrays)
pybcsv.read_columns(filename) -> dict[str, np.ndarray | list[str]]
pybcsv.write_columns(filename, columns, col_order, col_types,
                     row_codec="delta", compression_level=1)
# Type utilities
pybcsv.type_to_string(column_type) -> str

Testing

pip install pybcsv[test]
python -m pytest tests/ -v

File Structure

python/
├── pybcsv/
│   ├── __init__.py           # Public API and exports
│   ├── __version__.py        # Version (setuptools-scm)
│   ├── bindings.cpp          # C++ nanobind bindings
│   └── pandas_utils.py       # Pandas/CSV integration
├── examples/
│   ├── basic_usage.py        # Core BCSV operations
│   ├── pandas_integration.py # DataFrame examples
│   ├── advanced_usage.py     # DirectAccess, Sampler, CSV, columnar I/O
│   └── performance_benchmark.py
├── tests/                    # 17 test modules (pytest)
├── benchmarks/               # Python benchmark runner
├── pyproject.toml
└── README.md

Known Limitations

  • Arrow string columns: 2 GB per batch. The Arrow C Data Interface uses utf8 format ("u") with int32 offsets, limiting the total byte size of any single string column within one batch to ~2 GB. An OverflowError is raised at runtime if this limit is exceeded. For most workloads this is not an issue. If you hit this limit, consider splitting data into smaller batches.

  • No native null/missing value support. BCSV is a fixed-width binary format without a null bitmap. When writing a pandas DataFrame with NaN/None values, they are coerced to zero, False, or empty string by default (with a warning). Use strict=True in write_dataframe() to reject NaN values instead.

Compatibility

  • Python: 3.11, 3.12, 3.13
  • Platforms: Linux (x86_64, ARM64), macOS (x86_64, ARM64), Windows (AMD64)
  • Compilers: GCC 13+, Clang 16+, MSVC 2022 17.4+, Apple Clang (Xcode 15.4+)
  • C++ Standard: C++20
  • Dependencies:
    • numpy >= 1.19.0 (required)
    • pandas >= 1.0.0 (optional — pip install pybcsv[pandas])

License

MIT — see LICENSE for details.

Publishing

Wheels are built automatically via GitHub Actions (cibuildwheel) and published using Trusted Publisher (OIDC) — no API tokens required.

  • TestPyPI: every push to main/master or version tags
  • PyPI: only on v* tags (e.g. git tag v1.4.0 && git push origin v1.4.0)
  • Trigger the publish workflow:
  • The workflow triggers on pushes to the release branch or via manual workflow_dispatch.
  • Install from TestPyPI for verification:
# in a fresh virtualenv
python -m venv venv && source venv/bin/activate
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pybcsv
python -c "import pybcsv; print(pybcsv.__version__)"

If the import and version check succeed the wheel is good for release.

Keywords

csv

FAQs

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts