JSONLParse

A high-performance, memory-safe TypeScript/JavaScript streaming parser for JSONL (JSON Lines) files, with extensive configuration options inspired by csv-parse. It also includes a JSONL validator and converters to and from JSON and CSV.

Features

  • 🚀 High Performance: Native Node.js streams with minimal overhead
  • 🛡️ Memory Safe: Built-in protection against memory exhaustion
  • 📝 TypeScript Support: Full type definitions and interfaces
  • 🔧 Highly Configurable: Extensive options for data transformation and filtering
  • 🌍 Cross-Platform: Handles both Unix (\n) and Windows (\r\n) line endings
  • 🌊 Streaming: Process large files without loading everything into memory
  • 🎯 Robust Error Handling: Multiple error handling strategies
  • 📊 Data Processing: Built-in casting, trimming, and transformation capabilities
  • 🔍 Flexible Filtering: Record and line-based filtering options
  • 🔄 Format Converters: Built-in converters between JSONL, JSON, and CSV formats

Installation

npm install jsonl-parse
# or
yarn add jsonl-parse

Quick Start

import { createReadStream } from 'node:fs'
import { JSONLParse } from 'jsonl-parse'

const parser = new JSONLParse()

createReadStream('data.jsonl')
  .pipe(parser)
  .on('data', (obj) => {
    console.log('Parsed object:', obj)
  })
  .on('error', (err) => {
    console.error('Parse error:', err.message)
  })
  .on('end', () => {
    console.log('Parsing complete!')
  })

API Reference

Constructor

new JSONLParse(options?: JSONLParseOptions)

JSONLParseOptions

Basic Options

  • strict (boolean, default: true): If true, stops on the first invalid JSON line. If false, skips invalid lines
  • reviver (function, default: null): Optional reviver function passed to JSON.parse
  • skipEmptyLines (boolean, default: true): If true, trims whitespace and skips empty lines
  • maxLineLength (number, default: Infinity): Maximum line length, to prevent memory issues
  • encoding (BufferEncoding, default: 'utf8'): Encoding for chunk conversion
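
For example, reviver is handed straight to JSON.parse, so it can rewrite values as each line is parsed. A minimal sketch (the id normalization is purely illustrative):

const parser = new JSONLParse({
  maxLineLength: 64 * 1024, // cap line length to guard against memory exhaustion
  reviver: (key, value) => (key === 'id' ? String(value) : value) // e.g. normalize ids to strings
})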

Column/Header Options

  • columns (string[] | boolean | function, default: null): Convert arrays to objects. true uses the first line as headers, an array provides column names, a function generates names

Record Filtering Options

  • from (number, default: null): Start processing from this record number (1-based)
  • to (number, default: null): Stop processing at this record number (1-based)
  • from_line (number, default: null): Start processing from this line number (1-based)
  • to_line (number, default: null): Stop processing at this line number (1-based)

Data Transformation Options

  • cast (boolean | function, default: null): Auto-convert strings to native types, or supply a custom function
  • cast_date (boolean | function, default: null): Convert date strings to Date objects
  • ltrim (boolean, default: false): Left-trim whitespace from lines
  • rtrim (boolean, default: false): Right-trim whitespace from lines
  • trim (boolean, default: false): Trim whitespace from both ends of lines

Callback Options

  • on_record (function, default: null): Transform or filter each record. Return null to skip it
  • on_skip (function, default: null): Called when records are skipped due to errors

Output Enhancement Options

  • info (boolean, default: false): Include parsing metadata (line/record counts)
  • raw (boolean, default: false): Include the original line text
  • objname (string, default: null): Create nested objects keyed by a field value

Skip Options

  • skip_records_with_empty_values (boolean, default: false): Skip records where all values are empty
  • skip_records_with_error (boolean, default: false): Continue processing when encountering invalid records

Usage Examples

Basic Usage

import { createReadStream } from 'node:fs'
import { JSONLParse } from 'jsonl-parse'

const parser = new JSONLParse()
createReadStream('data.jsonl').pipe(parser)

Array to Object Conversion with Headers

// Input: ["name","age","email"]
//        ["Alice",30,"alice@test.com"]
//        ["Bob",25,"bob@test.com"]

const parser = new JSONLParse({ columns: true })

// Output: {name: "Alice", age: 30, email: "alice@test.com"}
//         {name: "Bob", age: 25, email: "bob@test.com"}

Custom Column Names

const parser = new JSONLParse({
  columns: ['id', 'name', 'email']
})

// Converts arrays to objects with specified keys
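
columns can also be a function that generates the names. Its exact signature isn't documented here; assuming it receives the first line's values (mirroring csv-parse), a sketch:

// Hypothetical signature: assumes the function gets the header line's values
// (as in csv-parse) and returns the column names to use.
const parser = new JSONLParse({
  columns: header => header.map(name => String(name).toLowerCase())
})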

Data Type Casting

const parser = new JSONLParse({
  cast: true, // Auto-convert strings to numbers, booleans, null
  cast_date: true, // Convert date strings to Date objects
})

// Input: {"age": "30", "active": "true", "created": "2023-01-01"}
// Output: {age: 30, active: true, created: Date object}
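
cast also accepts a function in place of true. The callback's signature isn't documented here; assuming it follows csv-parse's (value, context) convention, a sketch:

// Hypothetical signature, mirroring csv-parse's cast callback.
const parser = new JSONLParse({
  cast: (value, context) =>
    typeof value === 'string' && /^-?\d+$/.test(value)
      ? Number.parseInt(value, 10) // numeric strings become numbers
      : value // everything else passes through unchanged
})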

Record Filtering

const parser = new JSONLParse({
  from: 10, // Start from 10th record
  to: 100, // Stop at 100th record
  from_line: 5, // Start from 5th line
  to_line: 200 // Stop at 200th line
})

Custom Record Processing

const parser = new JSONLParse({
  on_record: (record, context) => {
    // Transform each record
    return {
      ...record,
      processed_at: new Date(),
      line_number: context.lines
    }
  }
})

Error Handling with Callbacks

const parser = new JSONLParse({
  strict: false,
  on_skip: (error, line) => {
    console.warn(`Skipped invalid line: ${line.slice(0, 50)}...`)
    console.warn(`Error: ${error.message}`)
  }
})

Enhanced Output with Metadata

const parser = new JSONLParse({
  info: true, // Include parsing metadata
  raw: true // Include original line text
})

// Output: {
//   info: { lines: 1, records: 1, invalid_field_length: 0 },
//   raw: '{"name": "Alice"}',
//   record: { name: "Alice" }
// }

Whitespace Handling

const parser = new JSONLParse({
  trim: true, // Trim both ends
  // or
  ltrim: true, // Left trim only
  rtrim: true, // Right trim only
})

Skip Empty Records

const parser = new JSONLParse({
  skip_records_with_empty_values: true // Skip records with all empty/null values
})

Nested Object Creation

const parser = new JSONLParse({
  objname: 'id' // Use 'id' field as object key
})

// Input: {"id": "user1", "name": "Alice"}
// Output: { user1: {"id": "user1", "name": "Alice"} }

Memory-Safe Processing

const safeParser = new JSONLParse({
  maxLineLength: 1024 * 1024, // 1MB per line maximum
  strict: false, // Skip overly long lines instead of erroring
  skip_records_with_error: true // Continue on any parsing errors
})

Complex Data Pipeline

import { createReadStream, createWriteStream } from 'node:fs'
import { Transform } from 'node:stream'
import { pipeline } from 'node:stream/promises'
import { JSONLParse } from 'jsonl-parse'

const parser = new JSONLParse({
  columns: true, // First line as headers
  cast: true, // Auto-convert types
  cast_date: true, // Convert dates
  trim: true, // Trim whitespace
  from: 2, // Skip first data record
  skip_records_with_empty_values: true,
  on_record: (record) => {
    // Filter and transform
    if (record.status !== 'active')
      return null
    return { ...record, processed: true }
  },
  info: true // Include metadata
})

const processor = new Transform({
  objectMode: true,
  transform(data, encoding, callback) {
    // Access both metadata and record
    const { info, record } = data
    const output = {
      ...record,
      metadata: info,
      processed_at: new Date().toISOString()
    }
    callback(null, `${JSON.stringify(output)}\n`)
  }
})

await pipeline(
  createReadStream('input.jsonl'),
  parser,
  processor,
  createWriteStream('output.jsonl')
)

Async Iterator Usage

import { createReadStream } from 'node:fs'
import { Readable } from 'node:stream'
import { JSONLParse } from 'jsonl-parse'

const parser = new JSONLParse({
  cast: true,
  on_record: record => record.priority === 'high' ? record : null
})

const readable = Readable.from(createReadStream('data.jsonl').pipe(parser))

for await (const obj of readable) {
  console.log('High priority object:', obj)
  await processHighPriorityObject(obj) // your own async handler
}
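
Since Transform streams in modern Node.js (v10+) are themselves async iterable, the Readable.from wrapper is optional; the parser stream can be iterated directly:

for await (const obj of createReadStream('data.jsonl').pipe(parser)) {
  console.log('High priority object:', obj)
}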

JSONL Validator

JSONLParse includes a comprehensive validator for ensuring JSONL file integrity and schema compliance.

JSONLValidator - Validate JSONL Files

Validate JSONL files with comprehensive error reporting and optional schema validation.

import { createReadStream } from 'node:fs'
import { JSONLValidator } from 'jsonl-parse'

const validator = new JSONLValidator({
  strictMode: true,
  schema: {
    type: 'object',
    required: ['id', 'name'],
    properties: {
      id: { type: 'number', minimum: 1 },
      name: { type: 'string', minLength: 1 },
      email: { type: 'string', pattern: /^[^\s@]+@[^\s@][^\s.@]*\.[^\s@]+$/ }
    }
  }
})

createReadStream('data.jsonl')
  .pipe(validator)
  .on('data', (result) => {
    const validation = JSON.parse(result.toString())
    console.log(`Valid: ${validation.valid}`)
    console.log(`Total lines: ${validation.totalLines}`)
    console.log(`Valid lines: ${validation.validLines}`)
    console.log(`Invalid lines: ${validation.invalidLines}`)

    if (validation.errors.length > 0) {
      console.log('Errors:', validation.errors)
    }
  })

JSONLValidatorOptions

  • encoding (BufferEncoding, default: 'utf8'): Text encoding
  • maxLineLength (number, default: 1048576): Maximum line length (1MB)
  • maxObjects (number, default: Infinity): Maximum number of objects to validate
  • strictMode (boolean, default: false): Strict validation (rejects extra whitespace and expects compact, perfectly formatted lines)
  • allowEmptyLines (boolean, default: true): Allow empty lines
  • schema (JSONLSchema, default: null): JSON schema for validation

JSONLSchema Interface

interface JSONLSchema {
  type?: 'object' | 'array' | 'string' | 'number' | 'boolean' | 'null'
  required?: string[] // Required object properties
  properties?: Record<string, JSONLSchema> // Object property schemas
  items?: JSONLSchema // Array item schema
  minLength?: number // Minimum string/array length
  maxLength?: number // Maximum string/array length
  pattern?: RegExp // String pattern matching
  minimum?: number // Minimum numeric value
  maximum?: number // Maximum numeric value
  enum?: any[] // Allowed values
}

ValidationResult Interface

interface ValidationResult {
  valid: boolean // Overall validation result
  errors: ValidationError[] // List of validation errors
  totalLines: number // Total lines processed
  validLines: number // Number of valid lines
  invalidLines: number // Number of invalid lines
}

interface ValidationError {
  line: number // Line number (1-based)
  column?: number // Column position for JSON errors
  message: string // Error description
  value?: any // Invalid value
  schema?: JSONLSchema // Schema that failed
}

Validator Usage Examples

Basic Validation

import { validateJSONL } from 'jsonl-parse'

const jsonlData = `
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
invalid json line
{"id": 3, "name": "Charlie"}
`

const result = validateJSONL(jsonlData, {
  strictMode: false,
  allowEmptyLines: true
})

console.log(`${result.validLines}/${result.totalLines} lines valid`)
// Output: 3/4 lines valid

result.errors.forEach((error) => {
  console.log(`Line ${error.line}: ${error.message}`)
})
// Output: Line 3: Invalid JSON: Unexpected token i in JSON at position 0

Schema Validation

const userSchema = {
  type: 'object',
  required: ['id', 'name', 'email'],
  properties: {
    id: {
      type: 'number',
      minimum: 1
    },
    name: {
      type: 'string',
      minLength: 2,
      maxLength: 50
    },
    email: {
      type: 'string',
      pattern: /^[^\s@]+@[^\s@][^\s.@]*\.[^\s@]+$/
    },
    age: {
      type: 'number',
      minimum: 0,
      maximum: 150
    },
    status: {
      type: 'string',
      enum: ['active', 'inactive', 'pending']
    }
  }
}

const validator = new JSONLValidator({
  schema: userSchema,
  strictMode: true
})

// Will validate each line against the schema

Streaming Validation

import { createReadStream, createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { JSONLValidator } from 'jsonl-parse'

const validator = new JSONLValidator({
  maxLineLength: 1024 * 10, // 10KB per line
  maxObjects: 10000, // Limit validation to 10k objects
  schema: {
    type: 'object',
    required: ['timestamp', 'level', 'message'],
    properties: {
      timestamp: { type: 'string', pattern: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/ },
      level: { type: 'string', enum: ['debug', 'info', 'warn', 'error'] },
      message: { type: 'string', minLength: 1 },
      metadata: { type: 'object' }
    }
  }
})

await pipeline(
  createReadStream('logs.jsonl'),
  validator,
  createWriteStream('validation-report.json')
)

Strict Mode Validation

const strictValidator = new JSONLValidator({
  strictMode: true, // No whitespace, perfect formatting
  allowEmptyLines: false, // No empty lines allowed
  maxLineLength: 1000 // Reasonable line length limit
})

const result = validateJSONL('  {"valid": true}  \n', {
  strictMode: true
})

// Will report: "Line has leading or trailing whitespace"

Complex Schema Example

const apiResponseSchema = {
  type: 'object',
  required: ['status', 'data'],
  properties: {
    status: {
      type: 'string',
      enum: ['success', 'error']
    },
    data: {
      type: 'array',
      items: {
        type: 'object',
        required: ['id', 'attributes'],
        properties: {
          id: { type: 'string', pattern: /^[a-f0-9-]{36}$/ }, // UUID
          attributes: {
            type: 'object',
            properties: {
              name: { type: 'string', minLength: 1, maxLength: 100 },
              score: { type: 'number', minimum: 0, maximum: 100 },
              tags: {
                type: 'array',
                items: { type: 'string' }
              }
            }
          }
        }
      }
    },
    pagination: {
      type: 'object',
      properties: {
        page: { type: 'number', minimum: 1 },
        limit: { type: 'number', minimum: 1, maximum: 1000 },
        total: { type: 'number', minimum: 0 }
      }
    }
  }
}

const validator = new JSONLValidator({ schema: apiResponseSchema })

Error Analysis

const validator = new JSONLValidator({
  schema: {
    type: 'object',
    required: ['id'],
    properties: {
      id: { type: 'number' },
      email: { type: 'string', pattern: /^[^\s@]+@[^\s@][^\s.@]*\.[^\s@]+$/ }
    }
  }
})

createReadStream('users.jsonl')
  .pipe(validator)
  .on('data', (result) => {
    const validation = JSON.parse(result.toString())

    // Categorize errors
    const errorsByType = validation.errors.reduce((acc, error) => {
      const type = error.message.includes('JSON') ? 'syntax' : 'schema'
      acc[type] = (acc[type] || 0) + 1
      return acc
    }, {})

    console.log('Error breakdown:', errorsByType)

    // Find most common errors
    const errorCounts = {}
    validation.errors.forEach((error) => {
      errorCounts[error.message] = (errorCounts[error.message] || 0) + 1
    })

    const sortedErrors = Object.entries(errorCounts)
      .sort(([,a], [,b]) => b - a)
      .slice(0, 5)

    console.log('Top 5 errors:', sortedErrors)
  })

Validation with Processing Pipeline

import { createReadStream } from 'node:fs'
import { JSONLParse, JSONLValidator } from 'jsonl-parse'

// Validate then process valid records
const validator = new JSONLValidator({
  schema: {
    type: 'object',
    required: ['id', 'email'],
    properties: {
      id: { type: 'number' },
      email: { type: 'string', pattern: /^[^\s@]+@[^\s@][^\s.@]*\.[^\s@]+$/ }
    }
  }
})

const processor = new JSONLParse({
  strict: false,
  on_record: (record, context) => {
    // Only process records that passed validation
    return {
      ...record,
      processed_at: new Date().toISOString(),
      line_number: context.lines
    }
  }
})

// First validate, then process if valid
validator.on('data', (validationResult) => {
  const result = JSON.parse(validationResult.toString())

  if (result.valid) {
    console.log('✅ Validation passed, processing records...')
    createReadStream('input.jsonl').pipe(processor)
  }
  else {
    console.error('❌ Validation failed:')
    result.errors.forEach((error) => {
      console.error(`  Line ${error.line}: ${error.message}`)
    })
  }
})

createReadStream('input.jsonl').pipe(validator)

Format Converters

JSONLParse includes several built-in converters for transforming between different data formats:

JSONToJSONL - Convert JSON to JSONL

Convert JSON files (arrays or objects) to JSONL format.

import { createReadStream, createWriteStream } from 'node:fs'
import { JSONToJSONL } from 'jsonl-parse'

const converter = new JSONToJSONL({
  arrayPath: 'data', // Extract array from nested path
  flatten: true, // Flatten nested objects
  maxObjectSize: 1024 * 1024 // 1MB per object limit
})

createReadStream('data.json')
  .pipe(converter)
  .pipe(createWriteStream('data.jsonl'))

JSONToJSONLOptions

  • arrayPath (string, default: null): Extract an array from a nested object path (e.g., "data.items")
  • replacer (function, default: null): JSON.stringify replacer function
  • encoding (BufferEncoding, default: 'utf8'): Text encoding
  • maxObjectSize (number, default: Infinity): Maximum size per JSON object
  • flatten (boolean, default: false): Flatten nested objects to dot notation
  • rootKey (string, default: null): Wrap the first object in the specified key
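
For instance, arrayPath takes a dotted path, so a payload shaped like {"data": {"items": [...]}} can be unwrapped directly, and replacer is the standard JSON.stringify replacer. A small sketch (the password field is purely illustrative):

const converter = new JSONToJSONL({
  arrayPath: 'data.items', // pull the array out of a nested envelope
  replacer: (key, value) => (key === 'password' ? undefined : value) // drop a sensitive field
})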

JSONLToJSON - Convert JSONL to JSON

Convert JSONL files to JSON arrays or objects.

import { createReadStream, createWriteStream } from 'node:fs'
import { JSONLToJSON } from 'jsonl-parse'

const converter = new JSONLToJSON({
  arrayWrapper: true, // Wrap in array
  arrayName: 'results', // Use custom array name
  pretty: true, // Pretty print output
  space: 2 // Indentation spaces
})

createReadStream('data.jsonl')
  .pipe(converter)
  .pipe(createWriteStream('data.json'))

JSONLToJSONOptions

  • arrayWrapper (boolean, default: true): Wrap objects in an array
  • arrayName (string, default: null): Name for the root array property
  • pretty (boolean, default: false): Pretty-print the JSON output
  • space (string | number, default: 2): Indentation for pretty printing
  • encoding (BufferEncoding, default: 'utf8'): Text encoding
  • maxObjects (number, default: Infinity): Maximum number of objects to process

JSONLToCSV - Convert JSONL to CSV

Convert JSONL files to CSV format with full customization.

import { createReadStream, createWriteStream } from 'node:fs'
import { JSONLToCSV } from 'jsonl-parse'

const converter = new JSONLToCSV({
  delimiter: ',',
  header: true,
  columns: ['id', 'name', 'email'], // Specific columns
  unflatten: true, // Reconstruct nested objects from flat keys
  cast: {
    boolean: value => value ? 'Yes' : 'No',
    date: value => value.toISOString(),
    number: value => value.toFixed(2)
  }
})

createReadStream('data.jsonl')
  .pipe(converter)
  .pipe(createWriteStream('data.csv'))

JSONLToCSVOptions

  • delimiter (string, default: ','): Field delimiter
  • quote (string, default: '"'): Quote character
  • quoted (boolean, default: false): Quote all fields
  • quotedEmpty (boolean, default: false): Quote empty fields
  • quotedString (boolean, default: false): Quote string fields
  • escape (string, default: '"'): Escape character
  • header (boolean, default: true): Include a header row
  • columns (string[] | function, default: null): Column selection and ordering
  • encoding (BufferEncoding, default: 'utf8'): Text encoding
  • cast (object, default: null): Custom type-casting functions
  • unflatten (boolean, default: false): Reconstruct nested objects
  • unflattenSeparator (string, default: '.'): Separator for nested keys
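
The quoting flags control when fields are wrapped in the quote character. A minimal sketch emitting semicolon-delimited output with every string field quoted (useful in locales where comma is the decimal separator):

const converter = new JSONLToCSV({
  delimiter: ';',
  quotedString: true, // wrap every string field in quotes
  quotedEmpty: true // quote empty fields so they survive round-trips
})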

CSVToJSONL - Convert CSV to JSONL

Convert CSV files to JSONL format with robust parsing.

import { createReadStream, createWriteStream } from 'node:fs'
import { CSVToJSONL } from 'jsonl-parse'

const converter = new CSVToJSONL({
  headers: true, // Use first row as headers
  delimiter: ',',
  cast: true, // Auto-convert types
  trim: true, // Trim whitespace
  skipEmptyLines: true,
  flatten: true, // Flatten objects to dot notation
  maxObjectSize: 1024 * 1024 // 1MB limit per object
})

createReadStream('data.csv')
  .pipe(converter)
  .pipe(createWriteStream('data.jsonl'))

CSVToJSONLOptions

  • delimiter (string, default: ','): Field delimiter
  • quote (string, default: '"'): Quote character
  • escape (string, default: '"'): Escape character
  • headers (boolean | string[], default: true): Header handling. true uses the first row as headers; an array supplies explicit names
  • skipEmptyLines (boolean, default: true): Skip empty lines
  • skipRecordsWithEmptyValues (boolean, default: false): Skip records with empty values
  • skipRecordsWithError (boolean, default: false): Continue on parse errors
  • replacer (function, default: null): JSON.stringify replacer
  • encoding (BufferEncoding, default: 'utf8'): Text encoding
  • maxObjectSize (number, default: Infinity): Maximum object size
  • flatten (boolean, default: false): Flatten nested objects
  • rootKey (string, default: null): Wrap objects in a root key
  • trim (boolean, default: true): Trim field values
  • cast (boolean | function, default: false): Type casting
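
Because headers also accepts a string array, a headerless CSV can be given explicit field names. A sketch assuming a three-column file:

const converter = new CSVToJSONL({
  headers: ['id', 'name', 'email'], // name the columns of a headerless file
  cast: true // auto-convert numeric and boolean strings
})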

Converter Usage Examples

Batch Processing Pipeline

import { createReadStream, createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { CSVToJSONL, JSONLParse, JSONLToCSV, JSONLToJSON } from 'jsonl-parse'

// JSONL -> Process -> CSV
await pipeline(
  createReadStream('input.jsonl'),
  new JSONLParse({
    cast: true,
    on_record: record => ({
      ...record,
      processed: true,
      timestamp: new Date().toISOString()
    })
  }),
  new JSONLToCSV({ header: true }),
  createWriteStream('output.csv')
)

// CSV -> JSONL -> Process -> JSON
await pipeline(
  createReadStream('data.csv'),
  new CSVToJSONL({ cast: true }),
  new JSONLParse({
    on_record: record => record.active ? record : null
  }),
  new JSONLToJSON({ pretty: true }),
  createWriteStream('filtered.json')
)

Data Transformation Examples

import { createReadStream, createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { JSONLParse, JSONLToCSV, JSONToJSONL } from 'jsonl-parse'

// Convert nested JSON to flat JSONL
const jsonToFlat = new JSONToJSONL({
  arrayPath: 'users',
  flatten: true
})

// Convert flat JSONL back to nested CSV
const flatToNested = new JSONLToCSV({
  unflatten: true,
  unflattenSeparator: '.',
  columns: ['id', 'profile.name', 'profile.email', 'settings.theme']
})

// Round-trip conversion with processing
await pipeline(
  createReadStream('nested.json'),
  jsonToFlat,
  new JSONLParse({
    on_record: (record) => {
      // Process flat structure
      record['profile.verified'] = true
      return record
    }
  }),
  flatToNested,
  createWriteStream('processed.csv')
)

Memory-Safe Large File Processing

import { createReadStream, createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { JSONLParse, JSONLToCSV } from 'jsonl-parse'

// Process large files with memory constraints
const safeConverter = new JSONLToCSV({
  maxObjectSize: 512 * 1024, // 512KB per object
  cast: {
    // Truncate large nested objects
    object: obj => JSON.stringify(obj).slice(0, 1000)
  }
})

const safeParser = new JSONLParse({
  maxLineLength: 1024 * 1024, // 1MB per line
  strict: false,
  skip_records_with_error: true,
  on_skip: (error, line) => {
    console.warn(`Skipped problematic record: ${error.message}`)
  }
})

await pipeline(
  createReadStream('large-file.jsonl'),
  safeParser,
  safeConverter,
  createWriteStream('safe-output.csv')
)

Error Handling

Strict Mode Errors

const parser = new JSONLParse({ strict: true })

parser.on('error', (err) => {
  if (err.message.includes('Invalid JSON at line')) {
    console.error('JSON parsing failed:', err.message)
  }
  else if (err.message.includes('Line length')) {
    console.error('Line too long:', err.message)
  }
  else if (err.message.includes('Buffer size exceeded')) {
    console.error('Memory limit exceeded:', err.message)
  }
})

Lenient Mode with Error Tracking

let errorCount = 0

const parser = new JSONLParse({
  strict: false,
  skip_records_with_error: true,
  on_skip: (error, line) => {
    errorCount++
    console.warn(`Error ${errorCount}: ${error.message}`)
    console.warn(`Problem line: ${line.slice(0, 100)}...`)
  }
})

parser.on('end', () => {
  console.log(`Processing complete. ${errorCount} errors encountered.`)
})

Converter Error Handling

const converter = new JSONLToCSV({
  cast: {
    date: (value) => {
      try {
        return new Date(value).toISOString()
      }
      catch {
        return 'Invalid Date'
      }
    }
  }
})

converter.on('error', (err) => {
  console.error('Conversion error:', err.message)
  // Handle converter-specific errors
})

License

MIT License
