
bankstatementparser
BankStatementParser is your essential tool for easy bank statement management. Designed with finance and treasury experts in mind, it offers a simple way to handle CAMT (ISO 20022) formats and more. Get quick, accurate insights from your financial data and spend less time on processing. It's the smart, hassle-free way to stay on top of your transactions.
Parse bank statements across six formats — CAMT.053, PAIN.001, CSV, OFX, QFX, and MT940 — into structured DataFrames. Process ZIP archives safely. Redact PII by default. Stream files of any size.
Built for finance teams, treasury analysts, and fintech developers who need reliable, auditable extraction from ISO 20022 and legacy banking formats without sending data to external services.
| Feature | Description |
|---|---|
| 6 formats | CAMT.053, PAIN.001, CSV, OFX, QFX, MT940 |
| Auto-detection | detect_statement_format() identifies the format; create_parser() returns the right parser |
| Deduplication | Deduplicator detects exact duplicates and suspected matches across sources with explainable confidence scores |
| PII redaction | Names, IBANs, and addresses masked by default — opt in with --show-pii |
| Streaming | parse_streaming() at 27,000+ tx/s (CAMT) and 52,000+ tx/s (PAIN.001) with bounded memory |
| Parallel | parse_files_parallel() for multi-file batch processing across CPU cores |
| Secure ZIP | iter_secure_xml_entries() rejects zip bombs, encrypted entries, and suspicious compression ratios |
| In-memory parsing | from_string() and from_bytes() parse XML without touching disk |
| Export | CSV, JSON, Excel (.xlsx), and optional Polars DataFrames |
| 100% coverage | 467 tests, 100% branch coverage, property-based fuzzing with Hypothesis |
```shell
pip install bankstatementparser
```
Clone and install on macOS, Linux, or WSL:
```shell
git clone https://github.com/sebastienrousseau/bankstatementparser.git
cd bankstatementparser
python3 -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install --with dev
```
```python
from bankstatementparser import CamtParser

parser = CamtParser("statement.xml")
transactions = parser.parse()
print(transactions)
```

```text
Amount Currency DrCr Debtor Creditor ValDt AccountId
105678.5 SEK CRDT MUELLER 2010-10-18 50000000054910
-200000.0 SEK DBIT 2010-10-18 50000000054910
30000.0 SEK CRDT 2010-10-18 50000000054910
```
```python
from bankstatementparser import Pain001Parser

parser = Pain001Parser("payment.xml")
payments = parser.parse()
print(payments)
```

```text
PmtInfId PmtMtd InstdAmt Currency CdtrNm EndToEndId
PMT-001 TRF 1500.00 EUR ACME Corp E2E-001
PMT-001 TRF 2300.50 EUR Global Ltd E2E-002
```
```python
from bankstatementparser import create_parser, detect_statement_format

fmt = detect_statement_format("transactions.ofx")
parser = create_parser("transactions.ofx", fmt)
records = parser.parse()
```
Works with .xml, .csv, .ofx, .qfx, and .mt940 files.
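detect_statement_format() handles detection for you; conceptually, the format can often be sniffed from a file's leading content alone. A minimal stand-alone sketch of that idea — the heuristics and the `sniff_format` name below are illustrative, not the library's implementation:

```python
def sniff_format(text: str) -> str:
    """Guess a statement format from its leading content (illustrative only)."""
    head = text.lstrip()[:200]
    if head.startswith("<?xml") or head.startswith("<Document"):
        # CAMT and PAIN.001 are both ISO 20022 XML; the namespace disambiguates
        return "pain001" if "pain.001" in head else "camt"
    if "OFXHEADER" in head or head.startswith("<OFX"):
        return "ofx"  # QFX shares the OFX structure
    if head.startswith(":20:") or "\n:20:" in text:
        return "mt940"  # SWIFT MT940 blocks open with a :20: reference tag
    return "csv"  # fall back to delimited text

print(sniff_format('<?xml version="1.0"?><Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">'))
# → camt
```

Real detection must also cope with encodings and malformed headers, which is why the library ships its own detector.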
```python
from bankstatementparser import CamtParser

xml_bytes = download_from_sftp()  # your own function
parser = CamtParser.from_bytes(xml_bytes, source_name="daily.xml")
transactions = parser.parse()
```
Pass only decompressed XML to from_string() or from_bytes(). For ZIP archives, use iter_secure_xml_entries().
```python
from bankstatementparser import CamtParser, iter_secure_xml_entries

for entry in iter_secure_xml_entries("statements.zip"):
    parser = CamtParser.from_bytes(entry.xml_bytes, source_name=entry.source_name)
    transactions = parser.parse()
    print(entry.source_name, len(transactions), "transactions")
```
The iterator enforces size limits, blocks encrypted entries, and rejects suspicious compression ratios before any XML parsing occurs.
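The compression-ratio defense can be illustrated with the standard library's `zipfile` module. The threshold and the `check_zip_entries` helper below are illustrative, not the library's internals:

```python
import io
import zipfile

MAX_RATIO = 100  # reject entries that expand more than 100x (illustrative threshold)

def check_zip_entries(data: bytes) -> None:
    """Raise if any entry is encrypted or looks like a zip bomb."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x1:
                raise ValueError(f"{info.filename}: encrypted entry rejected")
            compressed = max(info.compress_size, 1)
            if info.file_size / compressed > MAX_RATIO:
                raise ValueError(f"{info.filename}: suspicious compression ratio")

# A highly compressible payload (10 MB of zeros) trips the ratio check
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("bomb.xml", b"\x00" * 10_000_000)
try:
    check_zip_entries(buf.getvalue())
except ValueError as exc:
    print(exc)
```

The key point is that the metadata check happens before any entry is decompressed or parsed, so a malicious archive is rejected cheaply.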
PII (names, IBANs, addresses) is redacted by default in console output and streaming mode.
```python
# Redacted by default
for tx in parser.parse_streaming(redact_pii=True):
    print(tx)  # Names and addresses show as ***REDACTED***

# Opt in to see full data
for tx in parser.parse_streaming(redact_pii=False):
    print(tx)
```
File exports (CSV, JSON, Excel) always contain the full unredacted data.
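As a rough illustration of what IBAN masking involves, the snippet below keeps the country and check digits plus the last four characters and masks the rest. The regex and masking style are illustrative; the library's actual redaction rules may differ:

```python
import re

# Country code + check digits, a masked body, and a visible 4-character tail
IBAN_RE = re.compile(r"\b([A-Z]{2}\d{2})[A-Z0-9]{8,26}([A-Z0-9]{4})\b")

def mask_iban(text: str) -> str:
    """Mask the body of any IBAN found in the text (illustrative only)."""
    return IBAN_RE.sub(lambda m: f"{m.group(1)}****{m.group(2)}", text)

print(mask_iban("Debtor account SE4550000000058398257466"))
# → Debtor account SE45****7466
```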
Process large files incrementally. Memory stays bounded regardless of file size — tested at 50,000 transactions with sub-2x memory scaling.
```python
from bankstatementparser import CamtParser

parser = CamtParser("large_statement.xml")
for transaction in parser.parse_streaming():
    process(transaction)  # each transaction is a dict
Works with both CamtParser and Pain001Parser. PAIN.001 files over 50 MB use chunk-based namespace stripping via a temporary file — the full document is never loaded into memory.
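The bounded-memory approach can be sketched with the standard library's incremental XML parser: yield each entry as soon as its closing tag arrives, then free the subtree. The element names follow ISO 20022 CAMT, but this is an illustration, not the library's code:

```python
import io
import xml.etree.ElementTree as ET

NS = "{urn:iso:std:iso:20022:tech:xsd:camt.053.001.02}"

def stream_entries(source):
    """Yield one <Ntry> (entry) at a time, clearing each element after use."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{NS}Ntry":
            amt = elem.find(f"{NS}Amt")
            yield {"amount": amt.text, "currency": amt.get("Ccy")}
            elem.clear()  # drop the parsed subtree so memory stays bounded

xml_doc = (
    b'<?xml version="1.0"?>'
    b'<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">'
    b'<Ntry><Amt Ccy="SEK">105678.50</Amt></Ntry>'
    b'<Ntry><Amt Ccy="SEK">30000.00</Amt></Ntry>'
    b'</Document>'
)
for tx in stream_entries(io.BytesIO(xml_doc)):
    print(tx)
```

Because only one entry's subtree is alive at a time, memory usage is independent of the number of transactions in the file.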
| Metric | CAMT | PAIN.001 |
|---|---|---|
| Throughput | 27,000+ tx/s | 52,000+ tx/s |
| Per-transaction latency | 37 µs | 19 µs |
| Time to first result | < 1 ms | < 2 ms |
| Memory scaling | Constant (1K–50K) | Constant (1K–50K) |
Performance is flat from 1,000 to 50,000 transactions. CI enforces minimum TPS and latency thresholds.
Process multiple files simultaneously across CPU cores:
```python
from bankstatementparser import parse_files_parallel

results = parse_files_parallel([
    "statements/jan.xml",
    "statements/feb.xml",
    "statements/mar.xml",
])

for r in results:
    print(r.path, r.status, len(r.transactions), "rows")
```
Uses ProcessPoolExecutor to bypass the GIL. Each file is parsed in its own worker process. Auto-detects format per file, or force with format_name="camt".
```shell
# Parse and display
python -m bankstatementparser.cli --type camt --input statement.xml

# Export to CSV
python -m bankstatementparser.cli --type camt --input statement.xml --output transactions.csv

# Stream with PII visible
python -m bankstatementparser.cli --type camt --input statement.xml --streaming --show-pii
```
Supports --type camt and --type pain001.
Detect duplicate transactions across multiple sources:
```python
from bankstatementparser import CamtParser, Deduplicator

parser = CamtParser("statement.xml")
dedup = Deduplicator()
result = dedup.deduplicate(dedup.from_dataframe(parser.parse()))

print(f"Unique: {len(result.unique_transactions)}")
print(f"Exact duplicates: {len(result.exact_duplicates)}")
print(f"Suspected matches: {len(result.suspected_matches)}")
```
The Deduplicator uses deterministic hashing for exact matches and configurable similarity thresholds for suspected matches. Each match group includes a confidence score and reason for auditability.
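The exact-match side of this can be sketched in a few lines: hash a canonical string built from identifying fields, then group transactions that share a fingerprint. The field choice and helper names below are illustrative, not the Deduplicator's actual logic:

```python
import hashlib

def tx_fingerprint(tx: dict) -> str:
    """Deterministic hash over the fields that identify a transaction (illustrative)."""
    canonical = "|".join(str(tx.get(k, "")) for k in ("Amount", "Currency", "ValDt", "AccountId"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def exact_duplicates(transactions):
    """Return groups of transactions sharing the same fingerprint."""
    seen = {}
    for tx in transactions:
        seen.setdefault(tx_fingerprint(tx), []).append(tx)
    return [group for group in seen.values() if len(group) > 1]

txs = [
    {"Amount": 105678.5, "Currency": "SEK", "ValDt": "2010-10-18", "AccountId": "50000000054910"},
    {"Amount": 30000.0, "Currency": "SEK", "ValDt": "2010-10-18", "AccountId": "50000000054910"},
    {"Amount": 105678.5, "Currency": "SEK", "ValDt": "2010-10-18", "AccountId": "50000000054910"},
]
print(len(exact_duplicates(txs)))  # → 1 duplicate group
```

Hashing a canonical string keeps the check deterministic, which matters when the same report must reproduce byte-for-byte for auditors.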
```python
parser = CamtParser("statement.xml")
parser.parse()

# CSV
parser.export_csv("output.csv")

# JSON (includes summary + transactions)
parser.export_json("output.json")

# Excel
parser.camt_to_excel("output.xlsx")
```
Convert any parser output to a Polars DataFrame:
```python
polars_df = parser.to_polars()
lazy_df = parser.to_polars_lazy()
```
Install with `pip install "bankstatementparser[polars]"` (quoted so the brackets survive shell globbing).
See examples/ for 14 runnable scripts:
| Example | What it demonstrates |
|---|---|
| parse_camt_basic.py | Load a CAMT.053 file and print transactions |
| parse_camt_from_string.py | Parse CAMT from an in-memory XML string |
| inspect_camt.py | Extract balances, stats, and summaries |
| export_camt.py | Export to CSV and JSON |
| export_camt_excel.py | Export to Excel workbook |
| stream_camt.py | Stream transactions incrementally |
| parse_camt_zip.py | Secure ZIP archive processing |
| parse_detected_formats.py | Auto-detect CSV, OFX, MT940, and XML formats |
| parse_pain001_basic.py | Parse a PAIN.001 payment file |
| export_pain001.py | Export PAIN.001 to CSV and JSON |
| stream_pain001.py | Stream payments incrementally |
| validate_input.py | Validate file paths with InputValidator |
| compatibility_wrappers.py | Legacy API wrappers |
| cli_examples.sh | CLI commands for CAMT and PAIN.001 |
See docs/MAPPING.md for a complete reference of ISO 20022 XML tags to DataFrame columns across all six formats. Use this when integrating with ERP systems or building reconciliation pipelines.
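In an integration pipeline, such a mapping typically becomes a dict of XML paths to column names. The excerpt below is hypothetical — docs/MAPPING.md is the authoritative reference — and uses the standard library parser purely to show the shape of the lookup:

```python
import xml.etree.ElementTree as ET

NS = {"c": "urn:iso:std:iso:20022:tech:xsd:camt.053.001.02"}

# Hypothetical excerpt of a tag-to-column mapping; see docs/MAPPING.md for the real one
TAG_TO_COLUMN = {
    ".//c:Ntry/c:Amt": "Amount",
    ".//c:Ntry/c:CdtDbtInd": "DrCr",
    ".//c:Ntry/c:ValDt/c:Dt": "ValDt",
}

doc = ET.fromstring(
    '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">'
    "<Ntry><Amt Ccy='SEK'>30000.00</Amt><CdtDbtInd>CRDT</CdtDbtInd>"
    "<ValDt><Dt>2010-10-18</Dt></ValDt></Ntry></Document>"
)
row = {col: doc.find(path, NS).text for path, col in TAG_TO_COLUMN.items()}
print(row)
# → {'Amount': '30000.00', 'DrCr': 'CRDT', 'ValDt': '2010-10-18'}
```

Keeping the mapping in one place makes it easy to reconcile DataFrame columns against ERP field names during integration.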
```text
bankstatementparser/   Source code (13 modules, 100% branch coverage)
docs/compliance/       ISO 13485 validation, risk register, traceability
examples/              14 runnable example scripts
scripts/               SBOM generation, checksums, signature verification
tests/                 467 tests (unit, integration, property-based, security)
```
Bank statement files contain sensitive financial and personal data. This library is designed with security as a primary constraint:
XML parsing is hardened with `resolve_entities=False`, `no_network=True`, and `load_dtd=False`. For vulnerability reports, see SECURITY.md.
For the full compliance suite, see docs/compliance/.
Run the full validation suite locally:
```shell
ruff check bankstatementparser tests examples scripts
python -m mypy bankstatementparser
python -m pytest
bandit -r bankstatementparser examples scripts -q
```
Signed commits required. See CONTRIBUTING.md.
Apache License 2.0. See LICENSE.
What formats are supported? CAMT.053, PAIN.001, CSV, OFX, QFX, and MT940.
Does any data leave my infrastructure?
No. Zero network calls. XML parsers enforce no_network=True. No cloud, no telemetry.
Is PII redacted automatically? Yes. Names, IBANs, and addresses are masked by default in console output and streaming. File exports retain full data.
Is the extraction deterministic? Yes. Same input produces byte-identical output. Critical for financial auditing.
Can it handle large files?
Yes. parse_streaming() is tested at 50,000 transactions (~25 MB) with bounded memory. Files over 50 MB use chunk-based streaming.
See FAQ.md for the complete FAQ covering data privacy, technical specs, and treasury workflows.
THE ARCHITECT ᛫ Sebastien Rousseau ᛫ https://sebastienrousseau.com
THE ENGINE ᛞ EUXIS ᛫ Enterprise Unified Execution Intelligence System ᛫ https://euxis.co