@gitnaseem745/xtract

Package was removed
Sorry, it seems this package was removed from the registry

Production-grade PDF-to-Excel extraction CLI and API for voter lists, family registers, and government documents.

Version: 1.0.0 (latest)
Source: npm
Maintainers: 1

Xtract 📄➡️📊

A powerful PDF-to-Excel extraction tool designed specifically for converting Kruti Dev Unicode PDF documents (voter lists, family registers, etc.) into structured Excel spreadsheets.

✨ Features

  • PDF to Excel Conversion: Extract tabular data from PDF files into clean Excel format
  • Two Extraction Modes: raw for direct extraction, final for formatted output
  • Post-Processing Options: Automatically populate birthplace, mobile numbers, and family data
  • CLI & Programmatic API: Use from the command line or integrate into your Node.js projects
  • Debug Mode: Troubleshoot extractions with detailed logging

🎯 Use Cases

Xtract is designed to handle a wide variety of PDF-to-Excel conversion scenarios, particularly for Indian government documents and regional language content.

🗳️ Electoral & Government Records

| Use Case | Description |
| --- | --- |
| Voter Lists | Convert PDF voter rolls to searchable Excel databases |
| Electoral Rolls | Digitize constituency-wise voter data for analysis |
| BPL/APL Lists | Process Below/Above Poverty Line beneficiary lists |
| Ration Card Data | Extract ration card holder information |
| Pension Records | Convert pension beneficiary PDFs to spreadsheets |

👨‍👩‍👧‍👦 Family & Census Data

| Use Case | Description |
| --- | --- |
| Family Registers | Extract family-wise demographic data |
| Parivar Registers | Digitize traditional family record books |
| Census Data | Process population census PDF reports |
| Household Surveys | Convert survey PDFs to analyzable data |

🏛️ Panchayat & Local Bodies

| Use Case | Description |
| --- | --- |
| Gram Panchayat Records | Village-level administrative data |
| Ward-wise Lists | Municipal ward population data |
| Property Tax Records | Extract property owner information |
| Birth/Death Registers | Vital statistics digitization |

🔬 Research & Analysis

| Use Case | Description |
| --- | --- |
| Demographic Studies | Population analysis from government PDFs |
| Migration Patterns | Track family movement across regions |
| Social Research | Extract data for academic studies |
| Policy Analysis | Government scheme beneficiary analysis |

🏢 Organizations & NGOs

| Use Case | Description |
| --- | --- |
| Beneficiary Tracking | Track scheme beneficiaries |
| Community Mapping | Map family relationships in communities |
| Outreach Planning | Plan health/education outreach programs |
| Donation Management | Process donor/recipient lists |

📊 Business Applications

| Use Case | Description |
| --- | --- |
| Customer Data Entry | Bulk convert customer PDFs to CRM-ready Excel |
| Survey Processing | Field survey data digitization |
| Report Conversion | Convert legacy PDF reports to editable format |
| Data Migration | Migrate PDF archives to database systems |

💡 Tip: Xtract works best with PDFs containing Kruti Dev Unicode text. For scanned documents, use OCR software first to convert images to text-based PDFs.

📦 Installation

Global Installation (For CLI use)

npm install -g @gitnaseem745/xtract

Local Installation (For programmatic use)

npm install @gitnaseem745/xtract

From Source

git clone <repository-url>
cd xtract
npm install
npm link  # Makes 'xtract' command available globally

🚀 Quick Start

Basic Usage

# Convert a PDF to Excel (final formatted output)
xtract -i families.pdf -o families.xlsx

# Extract raw data without formatting
xtract -i families.pdf -o raw-output.xlsx -m raw

📖 CLI Reference

xtract --input <pdf> [--output <xlsx>] [options]

Required Options

| Option | Short | Description |
| --- | --- | --- |
| --input | -i | Path to input PDF file |

Optional Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --output | -o | output.xlsx | Path to output Excel file |
| --mode | -m | final | Extraction mode: raw or final |
| --debug | | false | Enable debug output for troubleshooting |
| --birthplace | | | Value to populate in birthplace column |
| --mobile | | | Extract and populate mobile numbers |
| --population | | | Populate family data (relations, count) |
| --populate | | | Comma-separated list: birthplace,mobile,population |
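
The comma-separated --populate value has to be turned into individual on/off switches at some point. The sketch below shows one way that could be done. `parsePopulate` and `KNOWN_STEPS` are hypothetical names used for illustration, not part of xtract, and the real CLI may handle whitespace, casing, or unknown values differently.

```javascript
// Hypothetical helper: not part of the documented xtract API.
// The three step names are the ones listed for --populate.
const KNOWN_STEPS = ['birthplace', 'mobile', 'population'];

function parsePopulate(value) {
  // Split on commas, trim whitespace, and drop empty entries.
  const requested = String(value || '')
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter(Boolean);

  // Reject anything that is not a documented step name.
  const unknown = requested.filter((s) => !KNOWN_STEPS.includes(s));
  if (unknown.length > 0) {
    throw new Error(`Unknown --populate value(s): ${unknown.join(', ')}`);
  }

  // One boolean per post-processing step.
  return Object.fromEntries(KNOWN_STEPS.map((s) => [s, requested.includes(s)]));
}

console.log(parsePopulate('birthplace, mobile'));
// { birthplace: true, mobile: true, population: false }
```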

Extraction Modes Explained

| Mode | Description | Use Case |
| --- | --- | --- |
| raw | Direct PDF text extraction to Excel | When you need unprocessed data for custom processing |
| final | Formatted output with structured columns | Ready-to-use voter list / family register format |

📋 Examples

1. Basic Conversion

Convert a PDF to a formatted Excel file:

xtract -i voter-list.pdf -o voter-list.xlsx

2. Raw Extraction

Extract raw data without any formatting:

xtract -i document.pdf -o raw-data.xlsx --mode raw

3. With Birthplace Population

Populate a specific birthplace value in all records:

xtract -i families.pdf -o output.xlsx --birthplace "पीपली"

4. Full Processing Pipeline

Extract with all post-processing options:

xtract -i families.pdf -o complete.xlsx \
  --birthplace "मो हयातपुर" \
  --mobile \
  --population

5. Debug Mode

Troubleshoot extraction issues:

xtract -i problem-file.pdf -o output.xlsx --debug

6. Using the --populate Flag

Combine multiple post-processing options:

xtract -i families.pdf -o out.xlsx --populate birthplace,mobile --birthplace "गाँव"
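
When many PDFs need the same treatment, the CLI calls above can be generated from a Node.js script. The sketch below builds argument lists using only the flags documented in the CLI reference. `buildXtractArgs` is a hypothetical helper, not something xtract exports, and the file names are placeholders.

```javascript
// Hypothetical helper: builds an argv array for the xtract CLI
// from the flags documented in the CLI reference.
function buildXtractArgs({ input, output, mode, birthplace, mobile, population, debug }) {
  if (!input) throw new Error('input is required');
  const args = ['--input', input];
  if (output) args.push('--output', output);
  if (mode) args.push('--mode', mode);
  if (birthplace) args.push('--birthplace', birthplace);
  if (mobile) args.push('--mobile');
  if (population) args.push('--population');
  if (debug) args.push('--debug');
  return args;
}

// Placeholder file names: one output spreadsheet per input PDF.
const pdfs = ['ward-1.pdf', 'ward-2.pdf'];
const jobs = pdfs.map((pdf) =>
  buildXtractArgs({ input: pdf, output: pdf.replace(/\.pdf$/i, '.xlsx'), mobile: true })
);
console.log(jobs);
```

Each array can then be handed to `child_process.execFileSync('xtract', args, { stdio: 'inherit' })`. Passing arguments as an array rather than a single shell string avoids quoting problems with values such as Devanagari place names.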

💻 Programmatic API

Import the Module

import { runPipeline, convertPdfToExcel, convertPdfToFinal } from '@gitnaseem745/xtract';

High-Level Pipeline

The easiest way to convert PDFs:

await runPipeline({
  inputPdf: 'input.pdf',
  outputExcel: 'output.xlsx',
  mode: 'final',           // 'raw' | 'final'
  options: { debug: true }
});

Low-Level Functions

For more control over the conversion process:

// Step 1: Convert PDF to raw Excel
await convertPdfToExcel('input.pdf', 'raw.xlsx');

// Step 2: Convert raw Excel to final formatted output
await convertPdfToFinal('raw.xlsx', 'final.xlsx', {
  birthPlace: 'पीपली'
});
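
One plausible reading of how the high-level pipeline relates to these two steps is that raw mode stops after step 1, while final mode runs both steps through an intermediate file. The sketch below illustrates that control flow only; it is an assumption, not xtract's source. `runPipelineSketch`, `steps.toRaw`, `steps.toFinal`, and the `.raw.xlsx` naming are all made up for illustration, and the steps are injected so the example runs without the package installed.

```javascript
// Sketch only: the real runPipeline may be structured differently.
// steps.toRaw and steps.toFinal stand in for convertPdfToExcel and
// convertPdfToFinal; the intermediate file name is an invented convention.
async function runPipelineSketch(
  { inputPdf, outputExcel = 'output.xlsx', mode = 'final', options = {} },
  steps
) {
  if (mode === 'raw') {
    // Raw mode: a single extraction step writes the requested output.
    await steps.toRaw(inputPdf, outputExcel, options);
    return [outputExcel];
  }
  // Final mode: extract to an intermediate file, then format it.
  const rawExcel = outputExcel.replace(/(\.xlsx)?$/i, '.raw.xlsx');
  await steps.toRaw(inputPdf, rawExcel, options);
  await steps.toFinal(rawExcel, outputExcel, options);
  return [rawExcel, outputExcel];
}

// Stub steps that only log, so the control flow can be tried in isolation.
const stubSteps = {
  toRaw: async (src, dest) => console.log(`raw:   ${src} -> ${dest}`),
  toFinal: async (src, dest) => console.log(`final: ${src} -> ${dest}`),
};

runPipelineSketch({ inputPdf: 'input.pdf', outputExcel: 'final.xlsx' }, stubSteps)
  .then((files) => console.log('wrote', files));
```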

📚 API Reference

runPipeline(params)

Orchestrates the full PDF-to-Excel conversion pipeline.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| inputPdf | string | | | Path to input PDF file |
| outputExcel | string | | output.xlsx | Path to output Excel file |
| mode | string | | final | Extraction mode: raw or final |
| options | object | | {} | Additional options (e.g., { debug: true }) |

Returns: Promise<void>

Example:

await runPipeline({
  inputPdf: 'families.pdf',
  outputExcel: 'families.xlsx',
  mode: 'final',
  options: { debug: false }
});
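
Callers who build the params object dynamically may want to validate it before handing it to runPipeline. The sketch below applies the documented defaults (output.xlsx, final, {}) and rejects unsupported modes. `normalizeParams` and `VALID_MODES` are hypothetical names, and whether runPipeline itself performs similar checks is not stated in this README.

```javascript
const VALID_MODES = ['raw', 'final'];

// Hypothetical validation helper; the defaults mirror the parameter table.
function normalizeParams({ inputPdf, outputExcel = 'output.xlsx', mode = 'final', options = {} } = {}) {
  if (typeof inputPdf !== 'string' || inputPdf.length === 0) {
    throw new TypeError('inputPdf must be a non-empty string');
  }
  if (!VALID_MODES.includes(mode)) {
    throw new RangeError(`mode must be one of: ${VALID_MODES.join(', ')}`);
  }
  return { inputPdf, outputExcel, mode, options };
}

console.log(normalizeParams({ inputPdf: 'families.pdf' }));
```

The returned object has the same shape as the runPipeline argument, so it can be passed straight through.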

convertPdfToExcel(inputPdf, outputExcel, options)

Converts PDF directly to Excel format (raw extraction).

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| inputPdf | string | | Path to input PDF |
| outputExcel | string | | Path to output Excel |
| options | object | | Options like { debug: true } |

Returns: Promise<void>

convertPdfToFinal(rawExcel, outputExcel, options)

Converts raw Excel to final formatted output.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| rawExcel | string | | Path to raw Excel file |
| outputExcel | string | | Path to final output Excel |
| options | object | | Options like { birthPlace: 'value' } |

Returns: Promise<void>

🔧 Requirements

  • Node.js >= 18.0.0
  • npm >= 8.0.0
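
A script that wraps xtract can check the Node.js requirement up front instead of failing later with a less obvious error. The sketch below compares `process.versions.node` against the 18.0.0 minimum stated above. `meetsMinimum` is a hypothetical helper that handles plain dotted numeric versions only, not pre-release tags.

```javascript
// Compares dotted numeric version strings, e.g. '20.11.1' against '18.0.0'.
function meetsMinimum(current, minimum) {
  const a = current.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const diff = (a[i] || 0) - (b[i] || 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // equal versions satisfy the minimum
}

if (!meetsMinimum(process.versions.node, '18.0.0')) {
  console.warn(`Node.js >= 18.0.0 is required, found ${process.versions.node}`);
}
```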

📁 Project Structure

xtract/
├── bin/
│   └── xtract.js          # CLI entry point
├── src/
│   ├── index.js           # Public API exports
│   ├── pipeline/          # Core conversion logic
│   ├── extractors/        # PDF text extractors
│   ├── utils/             # Helper utilities
│   └── config/            # Configuration
├── package.json
└── README.md

🐛 Troubleshooting

Common Issues

1. "Cannot find module" error

npm install  # Reinstall dependencies

2. Empty output file

  • Ensure your PDF contains extractable text (not scanned images)
  • Try --debug flag to see extraction details

3. Garbled text in output

  • The tool is optimized for Kruti Dev Unicode. Other fonts may not work correctly.
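
A rough way to spot garbled output automatically is to measure how much of an extracted cell is actual Devanagari. This assumes garbling shows up as Latin-looking characters where Hindi text was expected, which is typical of unconverted legacy-font text but is not confirmed for xtract. `devanagariRatio` is a hypothetical helper; the only fixed fact it relies on is the Unicode Devanagari block, U+0900 to U+097F.

```javascript
// Fraction of letters that fall in the Unicode Devanagari block
// (U+0900 to U+097F). Digits, spaces, and punctuation are ignored.
function devanagariRatio(text) {
  let devanagari = 0;
  let letters = 0;
  for (const ch of text) {
    const code = ch.codePointAt(0);
    const isDevanagari = code >= 0x0900 && code <= 0x097f;
    const isAsciiLetter = /[A-Za-z]/.test(ch);
    if (isDevanagari || isAsciiLetter) letters += 1;
    if (isDevanagari) devanagari += 1;
  }
  return letters === 0 ? 0 : devanagari / letters;
}

console.log(devanagariRatio('पीपली'));
console.log(devanagariRatio('ihiyh'));
```

A value near 1 suggests the text was converted correctly, while a value near 0 in a column that should hold Hindi names points to a font or encoding problem in the source PDF.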

4. Permission denied

chmod +x bin/xtract.js  # Make CLI executable

📄 License

MIT © 2024

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Keywords

pdf

FAQs

Package last updated on 24 Jan 2026
