json-parquet-merger

TypeScript CLI tool for merging multiple JSON files of the same format into a single Parquet file

Version: 0.3.5 (latest), published on npm
Weekly downloads: 21

Features

  • 🚀 Fast processing with batch support
  • 📊 Automatic schema inference from all input files
  • ✅ Schema validation option
  • 🔍 File pattern filtering with regex support
  • 📝 Detailed progress display with real-time feedback
  • 🛠️ Customizable batch size
  • 🗜️ Multiple compression options (uncompressed, gzip, snappy, brotli)
  • 📁 Support for both single files and directory processing
  • 🔄 Robust error handling and file validation

Installation

# Install globally
npm install -g json-parquet-merger

# Or add it to a project with pnpm (recommended for development)
pnpm add json-parquet-merger

# Or using npm
npm install json-parquet-merger

Usage

Basic usage

# Merge all JSON files in the directory
json-parquet-merger -i ./data -o output.parquet

# Convert a single JSON file
json-parquet-merger -i data.json -o output.parquet

Options

json-parquet-merger [options]

Options:
  -i, --input <path>           Input directory or file path (required)
  -o, --output <path>          Output Parquet file path (required)
  -p, --pattern <regex>        Regular expression for filtering JSON files
  --validate                   Verify that all records have the same schema
  -b, --batch-size <number>    Batch size for processing records (default: 1000)
  -c, --compression <type>     Compression type: uncompressed, gzip, snappy, brotli (default: uncompressed)
  -V, --version                Display version number
  -h, --help                   Display help information
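
When the tool is driven from a Node.js script rather than a terminal, the flags above map naturally onto an argument array. The following is a minimal sketch, not part of the package: `MergeOptions`, `buildArgs`, and `runMerger` are hypothetical names, and it assumes the `json-parquet-merger` binary is on `PATH` in a POSIX environment.

```typescript
import { execFile } from "node:child_process";

// Hypothetical option bag mirroring the documented flags.
interface MergeOptions {
  input: string;
  output: string;
  pattern?: string;
  validate?: boolean;
  batchSize?: number;
  compression?: "uncompressed" | "gzip" | "snappy" | "brotli";
}

// Translate the option bag into the documented command-line arguments.
function buildArgs(opts: MergeOptions): string[] {
  const args = ["-i", opts.input, "-o", opts.output];
  if (opts.pattern !== undefined) args.push("-p", opts.pattern);
  if (opts.validate) args.push("--validate");
  if (opts.batchSize !== undefined) args.push("-b", String(opts.batchSize));
  if (opts.compression !== undefined) args.push("-c", opts.compression);
  return args;
}

// Run the CLI; rejects if the process cannot be started or exits non-zero.
function runMerger(opts: MergeOptions): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile("json-parquet-merger", buildArgs(opts), (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}
```

Because `execFile` passes arguments without a shell, a pattern such as `user_.*\.json` needs no extra shell escaping here.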

Usage examples

# Pattern filtering
json-parquet-merger -i ./data -o users.parquet -p "user_.*\\.json"

# Run with schema validation enabled
json-parquet-merger -i ./data -o output.parquet --validate

# Custom batch size
json-parquet-merger -i ./data -o output.parquet -b 5000

# With compression
json-parquet-merger -i ./data -o output.parquet -c gzip

# Combined options
json-parquet-merger -i ./data -o output.parquet -p "user_.*\\.json" --validate -b 2000 -c snappy

# Display help
json-parquet-merger --help

# Show version
json-parquet-merger --version
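
The `-p` examples above double the backslash because, inside double quotes in a POSIX shell, `\\` reaches the program as a single `\`. The sketch below shows how such a pattern might select file names. `filterByPattern` is a hypothetical helper, and matching the unanchored pattern against the base file name is an assumption, not documented behavior of the tool.

```typescript
// Keep only the file names matched by the user-supplied regular expression.
// Assumption: the pattern is tested against the base file name, unanchored.
function filterByPattern(fileNames: string[], pattern: string): string[] {
  const re = new RegExp(pattern);
  return fileNames.filter((name) => re.test(name));
}

const names = ["user_1.json", "user_2.json", "orders.json", "user_notes.txt"];
console.log(filterByPattern(names, "user_.*\\.json"));
// → [ 'user_1.json', 'user_2.json' ]
```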

Input JSON file requirements

  • JSON files can contain data of any schema
  • The schema is inferred automatically from all input files, not just the first one
  • Each file may hold either an array of objects or a single object
  • Fields missing from some files are marked as optional
  • Field types are detected automatically: string, number (int64 or double), boolean, and timestamp
  • Nested objects and arrays are serialized to JSON strings for storage
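
To make the "inferred from all input files" and "marked as optional" rules concrete, here is one way such inference could work. This is an illustrative sketch only. `inferFields` and `FieldInfo` are hypothetical names, and the package's actual implementation may differ.

```typescript
type JsonRecord = Record<string, unknown>;

interface FieldInfo {
  optional: boolean; // true when at least one record lacks the field
}

// Union the field names of all records; a field absent from any record
// is marked optional.
function inferFields(records: JsonRecord[]): Map<string, FieldInfo> {
  const counts = new Map<string, number>();
  for (const rec of records) {
    for (const key of Object.keys(rec)) {
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  const fields = new Map<string, FieldInfo>();
  counts.forEach((seen, key) => {
    fields.set(key, { optional: seen < records.length });
  });
  return fields;
}

const fields = inferFields([
  { id: 1, name: "Alice", age: 30 },
  { id: 2, name: "Bob" }, // no "age" here, so "age" becomes optional
]);
console.log(fields.get("age")); // → { optional: true }
```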

Sample JSON File

user1.json:

[
  {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com",
    "age": 30,
    "active": true,
    "created_at": "2024-01-15T10:00:00Z"
  },
  {
    "id": 2,
    "name": "Bob",
    "email": "bob@example.com",
    "age": 25,
    "active": false,
    "created_at": "2024-01-16T10:00:00Z"
  }
]

user2.json:

[
  {
    "id": 3,
    "name": "Charlie",
    "email": "charlie@example.com",
    "age": 35,
    "active": true,
    "created_at": "2024-01-17T10:00:00Z"
  }
]
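
Since input files may hold either an array or a single object, a merger has to normalize both shapes into one list of records before writing. A minimal sketch of that step follows, assuming a hypothetical `toRecords` helper. The inline strings stand in for file contents and are shortened versions of the samples above.

```typescript
// Accept either a top-level array of objects or a single top-level object.
function toRecords(jsonText: string): Record<string, unknown>[] {
  const parsed: unknown = JSON.parse(jsonText);
  const items = Array.isArray(parsed) ? parsed : [parsed];
  for (const item of items) {
    if (item === null || typeof item !== "object" || Array.isArray(item)) {
      throw new Error("Expected a JSON object or an array of JSON objects");
    }
  }
  return items as Record<string, unknown>[];
}

// One file in array format, one in single-object format.
const arrayFile = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]';
const objectFile = '{"id": 3, "name": "Charlie"}';
const merged = [...toRecords(arrayFile), ...toRecords(objectFile)];
console.log(merged.length); // → 3
```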

Schema Inference

Supported Data Types

JSON Type      Parquet Type        Notes
string         UTF8                Text data
integer        INT64               Whole numbers
float          DOUBLE              Decimal numbers
boolean        BOOLEAN             true/false values
Date           TIMESTAMP_MILLIS    Date objects
object/array   UTF8                Serialized as JSON strings
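
The mapping in the table can be written as a small function. The sketch below follows the table literally and is not taken from the package: `parquetTypeFor` is a hypothetical name, only `Date` instances map to `TIMESTAMP_MILLIS`, and `null` or other values the table does not cover are rejected as an assumption.

```typescript
type ParquetType = "UTF8" | "INT64" | "DOUBLE" | "BOOLEAN" | "TIMESTAMP_MILLIS";

// Map a single JavaScript value to a Parquet type, following the table.
function parquetTypeFor(value: unknown): ParquetType {
  if (typeof value === "string") return "UTF8";
  if (typeof value === "boolean") return "BOOLEAN";
  if (typeof value === "number") {
    return Number.isInteger(value) ? "INT64" : "DOUBLE";
  }
  if (value instanceof Date) return "TIMESTAMP_MILLIS";
  if (typeof value === "object" && value !== null) {
    return "UTF8"; // objects and arrays would be stored via JSON.stringify
  }
  // Assumption: the table does not cover null/undefined, so reject them here.
  throw new Error("No Parquet type mapping for value: " + String(value));
}

console.log(parquetTypeFor(30)); // → INT64
console.log(parquetTypeFor(1.5)); // → DOUBLE
console.log(parquetTypeFor({ tags: ["a", "b"] })); // → UTF8
```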

Compression Options

The tool supports multiple compression algorithms:

  • uncompressed (default): No compression, fastest processing
  • gzip: Good compression ratio, moderate speed
  • snappy: Fast compression with reasonable ratio
  • brotli: Best compression ratio, slower processing
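
For a rough feel of the ratio trade-off, the sketch below compresses a synthetic JSON payload with Node's built-in `zlib`. It is indicative only: Parquet compresses column pages rather than raw JSON text, so the tool's real output sizes will differ. Snappy is left out because Node's standard library does not include it. The `rows` data is made up for illustration.

```typescript
import { gzipSync, brotliCompressSync } from "node:zlib";

// Synthetic, repetitive rows standing in for merged JSON records.
const rows = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  name: `user_${i}`,
  active: i % 2 === 0,
}));
const raw = Buffer.from(JSON.stringify(rows));

// Byte sizes of the same payload under each codec.
const sizes = {
  uncompressed: raw.length,
  gzip: gzipSync(raw).length,
  brotli: brotliCompressSync(raw).length,
};
console.log(sizes);
```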

Package last updated on 23 Jul 2025
