
@gitnaseem745/xtract
Production-grade PDF-to-Excel extraction CLI and API for voter lists, family registers, and government documents.
A powerful PDF-to-Excel extraction tool designed specifically for converting Kruti Dev Unicode PDF documents (voter lists, family registers, etc.) into structured Excel spreadsheets.
Two extraction modes are available: raw for direct extraction and final for formatted output.
Xtract is designed to handle a wide variety of PDF-to-Excel conversion scenarios, particularly for Indian government documents and regional language content.
| Use Case | Description |
|---|---|
| Voter Lists | Convert PDF voter rolls to searchable Excel databases |
| Electoral Rolls | Digitize constituency-wise voter data for analysis |
| BPL/APL Lists | Process Below/Above Poverty Line beneficiary lists |
| Ration Card Data | Extract ration card holder information |
| Pension Records | Convert pension beneficiary PDFs to spreadsheets |
| Use Case | Description |
|---|---|
| Family Registers | Extract family-wise demographic data |
| Parivar Registers | Digitize traditional family record books |
| Census Data | Process population census PDF reports |
| Household Surveys | Convert survey PDFs to analyzable data |
| Use Case | Description |
|---|---|
| Gram Panchayat Records | Village-level administrative data |
| Ward-wise Lists | Municipal ward population data |
| Property Tax Records | Extract property owner information |
| Birth/Death Registers | Vital statistics digitization |
| Use Case | Description |
|---|---|
| Demographic Studies | Population analysis from government PDFs |
| Migration Patterns | Track family movement across regions |
| Social Research | Extract data for academic studies |
| Policy Analysis | Government scheme beneficiary analysis |
| Use Case | Description |
|---|---|
| Beneficiary Tracking | Track scheme beneficiaries |
| Community Mapping | Map family relationships in communities |
| Outreach Planning | Plan health/education outreach programs |
| Donation Management | Process donor/recipient lists |
| Use Case | Description |
|---|---|
| Customer Data Entry | Bulk convert customer PDFs to CRM-ready Excel |
| Survey Processing | Field survey data digitization |
| Report Conversion | Convert legacy PDF reports to editable format |
| Data Migration | Migrate PDF archives to database systems |
💡 Tip: Xtract works best with PDFs containing Kruti Dev Unicode text. For scanned documents, use OCR software first to convert images to text-based PDFs.
npm install -g @gitnaseem745/xtract
npm install @gitnaseem745/xtract
git clone <repository-url>
cd xtract
npm install
npm link # Makes 'xtract' command available globally
# Convert a PDF to Excel (final formatted output)
xtract -i families.pdf -o families.xlsx
# Extract raw data without formatting
xtract -i families.pdf -o raw-output.xlsx -m raw
xtract --input <pdf> --output <xlsx> [options]
| Option | Short | Description |
|---|---|---|
| `--input` | `-i` | Path to input PDF file |
| Option | Short | Default | Description |
|---|---|---|---|
| `--output` | `-o` | `output.xlsx` | Path to output Excel file |
| `--mode` | `-m` | `final` | Extraction mode: `raw` or `final` |
| `--debug` | - | `false` | Enable debug output for troubleshooting |
| `--birthplace` | - | - | Value to populate in birthplace column |
| `--mobile` | - | - | Extract and populate mobile numbers |
| `--population` | - | - | Populate family data (relations, count) |
| `--populate` | - | - | Comma-separated list: `birthplace,mobile,population` |
| Mode | Description | Use Case |
|---|---|---|
| `raw` | Direct PDF text extraction to Excel | When you need unprocessed data for custom processing |
| `final` | Formatted output with structured columns | Ready-to-use voter list / family register format |
Convert a PDF to a formatted Excel file:
xtract -i voter-list.pdf -o voter-list.xlsx
Extract raw data without any formatting:
xtract -i document.pdf -o raw-data.xlsx --mode raw
Populate a specific birthplace value in all records:
xtract -i families.pdf -o output.xlsx --birthplace "पीपली"
Extract with all post-processing options:
xtract -i families.pdf -o complete.xlsx \
--birthplace "मो हयातपुर" \
--mobile \
--population
Troubleshoot extraction issues:
xtract -i problem-file.pdf -o output.xlsx --debug
Combine multiple post-processing options:
xtract -i families.pdf -o out.xlsx --populate birthplace,mobile --birthplace "गाँव"
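To make the comma-separated `--populate` value concrete, here is a minimal sketch of how such a list could be mapped onto the three documented steps. `parsePopulate` and `POPULATE_STEPS` are hypothetical names used only for illustration; they are not part of xtract, and the real CLI may parse the flag differently.

```javascript
// Hypothetical helper, not part of xtract: it only illustrates how a
// comma-separated --populate value maps onto the three documented steps.
const POPULATE_STEPS = ['birthplace', 'mobile', 'population'];

function parsePopulate(value) {
  // Split on commas, ignore surrounding whitespace and empty entries.
  const requested = String(value ?? '')
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean);

  // Reject anything that is not one of the documented step names.
  const unknown = requested.filter((name) => !POPULATE_STEPS.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown --populate value(s): ${unknown.join(', ')}`);
  }

  // One boolean per documented step, true when it was requested.
  return Object.fromEntries(
    POPULATE_STEPS.map((name) => [name, requested.includes(name)])
  );
}

console.log(parsePopulate('birthplace,mobile'));
// → { birthplace: true, mobile: true, population: false }
```

Rejecting unknown names early is a design choice of this sketch: a typo such as `birthplce` then fails loudly instead of silently skipping a step.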
import { runPipeline, convertPdfToExcel, convertPdfToFinal } from '@gitnaseem745/xtract';
The easiest way to convert PDFs:
await runPipeline({
inputPdf: 'input.pdf',
outputExcel: 'output.xlsx',
mode: 'final', // 'raw' | 'final'
options: { debug: true }
});
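When several PDFs need converting, one approach is to build a `runPipeline`-style params object per file and run them in turn. The sketch below only builds the params; `buildBatchParams` is a hypothetical helper, and writing each workbook next to its source PDF is an assumed naming convention, not something xtract prescribes.

```javascript
// Hypothetical helper, not part of xtract: builds one runPipeline-style
// params object per PDF, writing each workbook next to its source file.
function buildBatchParams(pdfPaths, mode = 'final') {
  return pdfPaths.map((inputPdf) => ({
    inputPdf,
    // Swap a trailing .pdf (any case) for .xlsx; append .xlsx otherwise.
    outputExcel: /\.pdf$/i.test(inputPdf)
      ? inputPdf.replace(/\.pdf$/i, '.xlsx')
      : `${inputPdf}.xlsx`,
    mode,
    options: { debug: false },
  }));
}

const batch = buildBatchParams(['ward-1.pdf', 'lists/ward-2.PDF'], 'raw');
console.log(batch.map((params) => params.outputExcel));

// Each entry could then be passed to runPipeline in turn, for example:
//   for (const params of batch) await runPipeline(params);
```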
For more control over the conversion process:
// Step 1: Convert PDF to raw Excel
await convertPdfToExcel('input.pdf', 'raw.xlsx');
// Step 2: Convert raw Excel to final formatted output
await convertPdfToFinal('raw.xlsx', 'final.xlsx', {
birthPlace: 'पीपली'
});
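The two steps above can be wrapped in a single function that also decides where the intermediate raw workbook goes. This is a sketch under stated assumptions: `convertInTwoSteps` is a hypothetical name, the `.raw.xlsx` naming is an assumed convention, and `toRaw` / `toFinal` are injected stand-ins for `convertPdfToExcel` and `convertPdfToFinal` so the flow can be tried without xtract installed.

```javascript
// Sketch only: `toRaw` and `toFinal` stand in for convertPdfToExcel and
// convertPdfToFinal and are injected so the flow can run without xtract.
async function convertInTwoSteps(inputPdf, finalXlsx, { toRaw, toFinal, finalOptions = {} }) {
  // Assumed naming convention: keep the raw workbook next to the final one.
  const rawXlsx = finalXlsx.replace(/\.xlsx$/i, '') + '.raw.xlsx';
  await toRaw(inputPdf, rawXlsx); // Step 1: PDF to raw Excel
  await toFinal(rawXlsx, finalXlsx, finalOptions); // Step 2: raw to final
  return { rawXlsx, finalXlsx };
}

// Stub converters that only record the calls they receive.
const calls = [];
const stubs = {
  toRaw: async (...args) => { calls.push(['raw', ...args]); },
  toFinal: async (...args) => { calls.push(['final', ...args]); },
  finalOptions: { birthPlace: 'पीपली' },
};

convertInTwoSteps('input.pdf', 'final.xlsx', stubs).then((paths) => {
  console.log(paths, calls);
});
```

With the real library, the stubs would be replaced by the two imported functions. Keeping the raw workbook on disk makes it possible to inspect the unformatted extraction when the final output looks wrong.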
`runPipeline(params)`
Orchestrates the full PDF-to-Excel conversion pipeline.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `inputPdf` | string | ✅ | - | Path to input PDF file |
| `outputExcel` | string | ❌ | `output.xlsx` | Path to output Excel file |
| `mode` | string | ❌ | `final` | Extraction mode: `raw` or `final` |
| `options` | object | ❌ | `{}` | Additional options (e.g., `{ debug: true }`) |
Returns: Promise<void>
Example:
await runPipeline({
inputPdf: 'families.pdf',
outputExcel: 'families.xlsx',
mode: 'final',
options: { debug: false }
});
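The defaults in the `runPipeline` parameter table can be mirrored in a small validation step that runs before the pipeline is called. `normalizePipelineParams` is a hypothetical helper, not exported by xtract, and whether the library performs similar checks internally is not stated here.

```javascript
// Hypothetical helper, not exported by xtract: applies the defaults from the
// runPipeline parameter table and rejects values the table does not allow.
function normalizePipelineParams(params = {}) {
  const {
    inputPdf,
    outputExcel = 'output.xlsx',
    mode = 'final',
    options = {},
  } = params;

  // inputPdf is the only required parameter.
  if (typeof inputPdf !== 'string' || inputPdf.length === 0) {
    throw new TypeError('inputPdf is required and must be a non-empty string');
  }
  // Only the two documented modes are accepted.
  if (mode !== 'raw' && mode !== 'final') {
    throw new RangeError(`mode must be 'raw' or 'final', got '${mode}'`);
  }
  return { inputPdf, outputExcel, mode, options };
}

console.log(normalizePipelineParams({ inputPdf: 'families.pdf' }));
```

Failing before any PDF is opened keeps mistakes such as a misspelled mode cheap to diagnose.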
`convertPdfToExcel(inputPdf, outputExcel, options)`
Converts PDF directly to Excel format (raw extraction).
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `inputPdf` | string | ✅ | Path to input PDF |
| `outputExcel` | string | ✅ | Path to output Excel |
| `options` | object | ❌ | Options like `{ debug: true }` |
Returns: Promise<void>
`convertPdfToFinal(rawExcel, outputExcel, options)`
Converts raw Excel to final formatted output.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `rawExcel` | string | ✅ | Path to raw Excel file |
| `outputExcel` | string | ✅ | Path to final output Excel |
| `options` | object | ❌ | Options like `{ birthPlace: 'value' }` |
Returns: Promise<void>
xtract/
├── bin/
│ └── xtract.js # CLI entry point
├── src/
│ ├── index.js # Public API exports
│ ├── pipeline/ # Core conversion logic
│ ├── extractors/ # PDF text extractors
│ ├── utils/ # Helper utilities
│ └── config/ # Configuration
├── package.json
└── README.md
1. "Cannot find module" error
npm install # Reinstall dependencies
2. Empty output file
Run with the --debug flag to see extraction details.
3. Garbled text in output
4. Permission denied
chmod +x bin/xtract.js # Make CLI executable
MIT © 2024
Contributions are welcome! Please feel free to submit a Pull Request.
We found that @gitnaseem745/xtract demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.