OCSV - Odin CSV Parser
A high-performance, RFC 4180 compliant CSV parser written in Odin with Bun FFI support.

Platform Support: macOS ARM64 (cross-platform support in progress)

Features
- ⚡ High Performance - Fast CSV parsing with SIMD optimizations
- 🦺 Memory Safe - Zero memory leaks, comprehensive testing
- ✅ RFC 4180 Compliant - Full CSV specification support
- 🌍 UTF-8 Support - Correct handling of international characters
- 🔧 Flexible Configuration - Custom delimiters, quotes, comments
- 📦 Bun Native - Direct FFI integration with Bun runtime
- 🛡️ Error Handling - Detailed error messages with line/column info
- 🎯 Schema Validation - Type checking, constraints, type conversion
- 🌊 Streaming API - Memory-efficient chunk-based processing
- 🔄 Transform System - Built-in transforms and pipelines
- 🔌 Plugin System - Extensible architecture for custom functionality
Why Odin + Bun?
Key Advantages:
- ✅ Simple build system (no node-gyp, no Python)
- ✅ Better memory safety (explicit memory management + defer)
- ✅ Better error handling (enums + multiple returns)
- ✅ No C++ wrapper needed (Bun FFI is direct)
Quick Start
npm Installation (Recommended)
Install OCSV as an npm package for easy integration with your Bun projects:
bun add ocsv
# or
npm install ocsv
Then use it in your project:
import { parseCSV } from 'ocsv';
const result = parseCSV('name,age\nJohn,30\nJane,25', { hasHeader: true });
console.log(result.headers);
console.log(result.rows);
import { parseCSVFile } from 'ocsv';
const data = await parseCSVFile('./data.csv', { hasHeader: true });
console.log(`Parsed ${data.rowCount} rows`);
Manual Installation (Development)
For building from source or contributing:
git clone https://github.com/dvrd/ocsv.git
cd ocsv
Build
Current Support: macOS ARM64 (cross-platform support in progress)
task build
task build-dev
task test
task info
odin build src -build-mode:shared -out:libocsv.dylib -o:speed
Basic Usage (Odin)
package main

import "core:fmt"
import ocsv "src"

main :: proc() {
    // Create parser
    parser := ocsv.parser_create()
    defer ocsv.parser_destroy(parser)

    // Parse CSV data
    csv_data := "name,age,city\nAlice,30,NYC\nBob,25,SF\n"
    ok := ocsv.parse_csv(parser, csv_data)
    if ok {
        // Access parsed data
        fmt.printfln("Parsed %d rows", len(parser.all_rows))
        for row in parser.all_rows {
            for field in row {
                fmt.printf("%s ", field)
            }
            fmt.printf("\n")
        }
    }
}
Bun API Examples
Basic Parsing
import { parseCSV } from 'ocsv';
const result = parseCSV('name,age,city\nAlice,30,NYC\nBob,25,SF', {
hasHeader: true
});
console.log(result.headers);
console.log(result.rows);
console.log(result.rowCount);
Parse from File
import { parseCSVFile } from 'ocsv';
const data = await parseCSVFile('./sales.csv', {
hasHeader: true,
delimiter: ',',
});
console.log(`Parsed ${data.rowCount} rows`);
console.log(`Columns: ${data.headers.join(', ')}`);
for (const row of data.rows) {
console.log(row);
}
Custom Configuration
import { parseCSV } from 'ocsv';
const tsvData = parseCSV('col1\tcol2\nval1\tval2', {
delimiter: '\t',
hasHeader: true,
});
const europeanData = parseCSV('name;age;city\nJohn;30;Paris', {
delimiter: ';',
hasHeader: true,
});
const relaxedData = parseCSV('messy,csv,"data', {
relaxed: true,
});
Manual Parser Management
For more control, use the Parser class directly:
import { Parser } from 'ocsv';
const parser = new Parser();
try {
  const result = parser.parse('a,b,c\n1,2,3');
  console.log(result.rows);
} finally {
  parser.destroy();
}
Performance Modes
OCSV offers two access modes to optimize for different use cases:
Mode Comparison
| Aspect | Eager Mode | Lazy Mode |
| --- | --- | --- |
| Performance | ~8 MB/s throughput | ~176 MB/s (22x faster) |
| Memory Usage | High (all data in JS) | Low (<200 MB for 10M rows) |
| Parse Time (10M rows) | ~150s | <7s (22x faster) |
| Access Pattern | Random access, arrays | Random access, on-demand |
| Memory Management | Automatic (GC) | Manual (destroy() required) |
| Best For | Small files, full iteration | Large files, selective access |
| TypeScript Support | Full | Full (discriminated unions) |
Eager Mode (Default)
Best for: Small to medium files (<100k rows), full dataset iteration, simple workflows
All rows are materialized into JavaScript arrays immediately. Easy to use, no cleanup required.
import { parseCSV } from 'ocsv';
const result = parseCSV(data, { hasHeader: true });
console.log(result.headers);
console.log(result.rows);
console.log(result.rowCount);
result.rows.forEach(row => console.log(row));
result.rows.map(row => row[0]);
result.rows.filter(row => row[1] > '25');
Pros:
- ✅ Simple API - standard JavaScript arrays
- ✅ No manual cleanup required
- ✅ Familiar array methods (map, filter, slice)
- ✅ Safe for GC-managed memory
Cons:
- ❌ Slower for large files (7.5x overhead)
- ❌ High memory usage (all rows in JS heap)
- ❌ Parse time proportional to data crossing FFI boundary
Lazy Mode (High Performance)
Best for: Large files (>1M rows), selective access, memory-constrained environments
Rows stay in native Odin memory and are accessed on-demand. Achieves near-FFI performance with minimal memory footprint.
import { parseCSV } from 'ocsv';
const result = parseCSV(data, {
mode: 'lazy',
hasHeader: true
});
try {
  console.log(result.headers);
  console.log(result.rowCount);

  // Random access to any row
  const row = result.getRow(5000000);
  console.log(row.get(0));
  console.log(row.get(1));

  // Iterate over a single row's fields
  for (const field of row) {
    console.log(field);
  }

  // Materialize one row as a plain array
  const arr = row.toArray();

  // Generator-based slicing
  for (const r of result.slice(1000, 2000)) {
    console.log(r.get(0));
  }

  // Full iteration
  for (const r of result) {
    console.log(r.get(0));
  }
} finally {
  result.destroy();
}
Pros:
- ✅ 22x faster parse time than eager mode
- ✅ Low memory footprint (<200 MB for 10M rows)
- ✅ LRU cache (1000 hot rows) for repeated access
- ✅ Generator-based slicing (memory efficient)
- ✅ Random access to any row (O(1) after cache)
Cons:
- ❌ Manual cleanup required (destroy() must be called)
- ❌ Not standard arrays (use .get(i) or .toArray())
- ❌ Use-after-destroy throws errors
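The hot-row cache mentioned above can be illustrated with a small index-keyed LRU. This is a simplified sketch of the general technique, not OCSV's actual implementation; `RowLRU` is a hypothetical name.

```typescript
// Minimal LRU sketch for hot-row caching (illustrative only).
// Map preserves insertion order, so the first key is always the
// least recently used entry.
class RowLRU {
  private cache = new Map<number, string[]>();
  constructor(private capacity: number = 1000) {}

  get(index: number): string[] | undefined {
    const row = this.cache.get(index);
    if (row !== undefined) {
      // Re-insert to mark this row as most recently used.
      this.cache.delete(index);
      this.cache.set(index, row);
    }
    return row;
  }

  set(index: number, row: string[]): void {
    if (this.cache.has(index)) this.cache.delete(index);
    else if (this.cache.size >= this.capacity) {
      // Evict the least recently used row (first key in insertion order).
      this.cache.delete(this.cache.keys().next().value!);
    }
    this.cache.set(index, row);
  }
}
```

Because a JavaScript Map iterates keys in insertion order, re-inserting on every hit keeps eviction O(1) without a separate linked list.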
When to Use Each Mode
Start
  │
  ▼
File > 100 MB or > 1M rows?
  ├─ No  → Eager Mode (simple, safe)
  └─ Yes → Need to access all rows?
            ├─ No  → Lazy Mode (fast, low memory)
            └─ Yes → Memory constrained?
                      ├─ Yes → Lazy Mode (streaming)
                      └─ No  → Try Eager first (measure, switch if slow)
Use Lazy Mode when:
- File size > 100 MB or > 1M rows
- You need selective row access (not full iteration)
- Memory is constrained (< 1 GB available)
- You're building streaming/ETL pipelines
- You need maximum parsing performance
Use Eager Mode when:
- File size < 100 MB or < 1M rows
- You need full dataset iteration
- You prefer simpler API (standard arrays)
- Memory cleanup must be automatic (GC)
- You're prototyping or writing quick scripts
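The decision tree above can be condensed into a small helper. This is a hypothetical sketch, not part of the OCSV API; the thresholds mirror the guidance in this section.

```typescript
// Hypothetical mode-selection helper mirroring the decision tree above.
// Not part of the OCSV API.
function chooseMode(opts: {
  sizeBytes: number;
  rowCount: number;
  needAllRows: boolean;
  memoryConstrained: boolean;
}): 'eager' | 'lazy' {
  const large = opts.sizeBytes > 100 * 1024 * 1024 || opts.rowCount > 1_000_000;
  if (!large) return 'eager';           // small file: simple and safe
  if (!opts.needAllRows) return 'lazy'; // selective access: fast, low memory
  // Full iteration over a large file: memory decides; otherwise try eager
  // first, measure, and switch to lazy if it is too slow.
  return opts.memoryConstrained ? 'lazy' : 'eager';
}
```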
Performance Benchmarks
Test Setup: 10M rows, 4 columns, 1.2 GB CSV file
Mode Parse Time Throughput Memory Usage
────────────────────────────────────────────────────────
FFI Direct 6.2s 193 MB/s 50 MB (baseline)
Lazy Mode 6.8s 176 MB/s <200 MB
Eager Mode 151.7s 7.9 MB/s ~8 GB
Key Metrics:
- Lazy mode is 22x faster than eager mode
- Lazy mode uses 40x less memory than eager mode
- Lazy mode is only 9% slower than raw FFI (acceptable overhead)
Advanced: High-Performance FFI Mode
For advanced users who need maximum FFI throughput, OCSV offers an optimized packed buffer mode that achieves 61.25 MB/s (56% of native Odin performance).
Performance Comparison (100K rows, 13.80 MB file):
Mode Throughput ns/row vs Native
──────────────────────────────────────────────────────
Native Odin 109.28 MB/s 915 100%
Packed Buffer 61.25 MB/s 2,253 56%
Bulk JSON 40.68 MB/s 2,878 37%
Field-by-Field 29.58 MB/s 3,957 27%
Optimizations:
- ⚡ 61.25 MB/s average throughput
- 🚀 Batched TextDecoder with reduced decoder overhead
- 💾 Pre-allocated arrays to reduce GC pressure
- 📊 SIMD-friendly memory access patterns
- 🔄 Adaptive processing for different row sizes
- 📦 Binary packed format with length-prefixed strings
- ✨ Single FFI call instead of multiple round-trips
Usage:
import { parseCSVPacked } from 'ocsv/bindings/simple';
const rows = parseCSVPacked(csvData);
When to use Packed Buffer:
- Need maximum FFI throughput (>40 MB/s)
- Willing to trade API simplicity for performance
- Working with medium-large files through Bun FFI
- Want to minimize cross-language boundary overhead
Note: The 44% overhead compared to native Odin is inherent to the FFI serialization boundary. This is the practical limit for JavaScript-based FFI approaches.
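The length-prefixed packing can be illustrated with a small decoder. This is a sketch of the general technique, assuming a little-endian u32 length before each UTF-8 field; OCSV's actual wire format is not documented here.

```typescript
// Decode a buffer of [u32 length][utf8 bytes] records into strings.
// Illustrates length-prefixed packing; not OCSV's exact format.
function decodePacked(buf: Uint8Array): string[] {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const decoder = new TextDecoder(); // reused across fields (batched decoding)
  const fields: string[] = [];
  let offset = 0;
  while (offset < buf.byteLength) {
    const len = view.getUint32(offset, true); // little-endian length prefix
    offset += 4;
    fields.push(decoder.decode(buf.subarray(offset, offset + len)));
    offset += len;
  }
  return fields;
}
```

Reusing one TextDecoder across all fields is what "batched TextDecoder" refers to: decoder construction is far more expensive than a single decode call.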
Memory Management
Eager Mode
const result = parseCSV(data); // GC-managed; no destroy() needed
Lazy Mode
const result = parseCSV(data, { mode: 'lazy' });
try {
  // ... use result ...
} finally {
  result.destroy();
}
Common Pitfalls:
❌ Forgetting to destroy:
const result = parseCSV(data, { mode: 'lazy' });
console.log(result.getRow(0));
// Leak: destroy() is never called, so native memory is not freed
❌ Use after destroy:
const result = parseCSV(data, { mode: 'lazy' });
result.destroy();
result.getRow(0); // throws: result has already been destroyed
✅ Correct pattern:
const result = parseCSV(data, { mode: 'lazy' });
try {
  const row = result.getRow(0);
  console.log(row.toArray());
} finally {
  result.destroy();
}
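A tiny wrapper can enforce the try/finally pattern so destroy() is never forgotten. This is a hypothetical helper, not part of the OCSV API.

```typescript
// Hypothetical helper guaranteeing destroy() runs even if fn throws.
// Not part of the OCSV API.
function withLazy<R extends { destroy(): void }, T>(result: R, fn: (r: R) => T): T {
  try {
    return fn(result);
  } finally {
    result.destroy();
  }
}
```

For example, `withLazy(parseCSV(data, { mode: 'lazy' }), r => r.headers)` would return the headers and release native memory in one expression.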
TypeScript Support
OCSV provides discriminated union types for type-safe mode selection:
import { parseCSV } from 'ocsv';
const eager = parseCSV(data);
console.log(eager.rows[0]); // OK: eager results expose .rows

const lazy = parseCSV(data, { mode: 'lazy' });
console.log(lazy.getRow(0)); // OK: lazy results expose .getRow()

const wrong = parseCSV(data, { mode: 'lazy' });
console.log(wrong.rows); // TypeScript error: 'rows' does not exist on lazy results
Configuration
// Create parser with custom configuration
parser := ocsv.parser_create()
defer ocsv.parser_destroy(parser)
// TSV (Tab-Separated Values)
parser.config.delimiter = '\t'
// European CSV (semicolon)
parser.config.delimiter = ';'
// Comments (skip lines starting with #)
parser.config.comment = '#'
// Relaxed mode (handle malformed CSV)
parser.config.relaxed = true
// Custom quote character
parser.config.quote = '\''
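Conceptually, the comment option drops any line whose first character is the comment marker before fields are parsed. The effect can be sketched as a pre-filter (illustrative only; OCSV handles this inside the parser, and `stripComments` is a hypothetical name):

```typescript
// Naive sketch of what the comment config does conceptually: drop lines
// whose first character is the comment marker. Does not account for '#'
// inside quoted multiline fields, which the real parser must handle.
function stripComments(csv: string, comment = '#'): string {
  return csv
    .split('\n')
    .filter(line => !line.startsWith(comment))
    .join('\n');
}
```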
RFC 4180 Compliance
OCSV fully implements RFC 4180 with support for:
- ✅ Quoted fields with embedded delimiters ("field, with, commas")
- ✅ Nested quotes ("field with ""quotes""" → field with "quotes")
- ✅ Multiline fields (newlines inside quotes)
- ✅ CRLF and LF line endings (Windows/Unix)
- ✅ Empty fields (consecutive delimiters: a,,c)
- ✅ Trailing delimiters (a,b, → 3 fields, last is empty)
- ✅ Leading delimiters (,a,b → 3 fields, first is empty)
- ✅ Comments (extension: lines starting with #)
- ✅ Unicode/UTF-8 (CJK characters, emojis, etc.)
Example:
# Sales data for Q1 2024
product,price,description,quantity
"Widget A",19.99,"A great widget, now with more features!",100
"Gadget B",29.99,"Essential gadget
Multi-line description",50
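The quoting rules above follow the standard RFC 4180 state machine. Below is a minimal single-line sketch of the doubled-quote escape rule (illustrative only; OCSV's parser also handles multiline fields and CRLF, which this sketch does not):

```typescript
// Minimal RFC 4180 field splitter for a single line, illustrating the
// doubled-quote escape rule ("" inside quotes → literal ").
// Not OCSV's parser; no multiline-field or CRLF handling.
function splitLine(line: string, delimiter = ','): string[] {
  const fields: string[] = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                          // closing quote
      } else field += ch;
    } else if (ch === '"') inQuotes = true;             // opening quote
    else if (ch === delimiter) { fields.push(field); field = ''; }
    else field += ch;
  }
  fields.push(field); // final field (empty if the line ends with a delimiter)
  return fields;
}
```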
Testing
~201 tests, 100% pass rate, 0 memory leaks
odin test tests
odin test tests -debug
Test Suites
The project includes comprehensive test coverage across multiple suites:
- Basic functionality and core parsing operations
- RFC 4180 edge cases and compliance
- Integration tests for end-to-end workflows
- Schema validation and type checking
- Transform system and pipelines
- Plugin system functionality
- Streaming API with chunk boundaries
- Large file handling
- Performance regression monitoring
- Error handling and recovery strategies
- Property-based fuzzing tests
- Parallel processing capabilities
- SIMD optimization verification
Project Structure
ocsv/
├── src/
│   ├── ocsv.odin          # Main module
│   ├── parser.odin        # RFC 4180 state machine parser
│   ├── parser_simd.odin   # SIMD-optimized parser
│   ├── parser_error.odin  # Error-aware parser
│   ├── streaming.odin     # Streaming API
│   ├── parallel.odin      # Parallel processing
│   ├── transform.odin     # Transform system
│   ├── plugin.odin        # Plugin architecture
│   ├── simd.odin          # SIMD search functions
│   ├── error.odin         # Error handling system
│   ├── schema.odin        # Schema validation & type system
│   ├── config.odin        # Configuration types
│   └── ffi_bindings.odin  # Bun FFI exports
├── tests/                 # Comprehensive test suite
├── plugins/               # Example plugins
├── bindings/              # Bun/TypeScript bindings
├── benchmarks/            # Performance benchmarks
├── examples/              # Usage examples
└── README.md              # This file
Requirements
- Odin: Latest version (tested with Odin dev-2025-01)
- Bun: v1.0+ (for FFI integration, optional)
- Platform: macOS ARM64 (cross-platform support in development)
- Task: v3+ (optional, for automated builds)
Release Process
This project uses automated releases via semantic-release. Releases are triggered automatically when changes are pushed to the main branch.
Commit Message Format
All commits must follow Conventional Commits:
<type>(<scope>): <subject>
<body>
<footer>
Examples:
git commit -m "feat: add streaming parser API"
git commit -m "fix: handle empty fields correctly"
git commit -m "docs: update installation instructions"
git commit -m "feat!: remove deprecated parseFile method
BREAKING CHANGE: parseFile has been removed, use parseCSVFile instead"
Commit Types:
- feat: New feature (triggers minor version bump)
- fix: Bug fix (triggers patch version bump)
- perf: Performance improvement (triggers patch version bump)
- docs: Documentation changes (no release)
- chore: Maintenance tasks (no release)
- refactor: Code refactoring (no release)
- test: Test changes (no release)
- ci: CI/CD changes (no release)
Version Bumps
- Patch (1.1.0 → 1.1.1): fix:, perf:
- Minor (1.1.0 → 1.2.0): feat:
- Major (1.1.0 → 2.0.0): any commit with BREAKING CHANGE: in the footer or ! after the type
Release Workflow
- Developer pushes commits to main branch
- CI runs tests and builds
- semantic-release analyzes commits
- If releasable changes found:
- Determines new version number
- Updates CHANGELOG.md
- Updates package.json
- Creates git tag
- Publishes to npm with provenance
- Creates GitHub release with prebuilt binaries
Manual Release (Emergency Only):
npm run release:dry
git push origin main
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for detailed guidelines on commit messages and pull request process.
Development Workflow:
- Fork the repository
- Create a feature branch
- Make changes with tests (odin test tests)
- Ensure zero memory leaks
- Submit a pull request
License
MIT License - see LICENSE for details.
Acknowledgments
Related Projects
- d3-dsv - Pure JavaScript CSV/DSV parser
- papaparse - Popular JavaScript CSV parser
- xsv - Rust CLI tool for CSV processing
- csv-parser - Node.js streaming CSV parser
Contact
Built with ❤️ using Odin + Bun
Version: 1.3.0
Last Updated: 2025-11-09