Extract Indonesian area data from raw sources into CSV files.
This package was developed to ease and speed up the data processing stage of idn-area-data.
Prerequisites
- Node.js 18 or later
- npm 9 or later
Installation
idn-area-extractor can be installed globally, which makes the idnxtr command available system-wide, or locally in a specific package, which is the better fit if you want to use it programmatically:
Install globally:
npm install -g idn-area-extractor
Install locally:
npm install idn-area-extractor
Usage
Access the manual with the idnxtr --help command:
USAGE
$ idnxtr [regencies|districts|islands|villages] </path/to/file.[pdf|txt]> [OPTIONS]
OPTIONS
-c, --compare Compare the extracted data with the latest data
-d, --destination=<path> Set the folder destination. Default: current working directory
-o, --output=<filename> Set a specific output file name without the file extension
-r, --range=<range> Extract specific PDF pages (e.g. 1-2,5,7-10)
-R, --save-raw Save the extracted raw data into .txt file (only works with PDF data)
--silent Disable all logs
EXAMPLE
$ idnxtr
$ idnxtr regencies ~/data/regencies.pdf
$ idnxtr regencies ~/data/regencies.pdf -r 1-2,5,7-10 -R
$ idnxtr regencies ~/data/regencies.pdf --range 1-2,5,7-10 --save-raw
$ idnxtr regencies ~/data/raw-regencies.txt
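The --range value combines single pages and page spans separated by commas. As a rough illustration of that format only, the sketch below expands such a string into page numbers. expandPageRange is a hypothetical helper, not part of idn-area-extractor, and it assumes pages are 1-based and spans are inclusive; the tool's own parsing may differ.

```javascript
// Hypothetical helper (not part of idn-area-extractor): expands a range
// string such as '1-2,5,7-10' into a sorted list of unique page numbers.
// Assumption: pages are 1-based and spans like '7-10' include both ends.
function expandPageRange(range) {
  const pages = new Set();

  for (const part of range.split(',')) {
    // Accept either a single page ("5") or a span ("7-10").
    const match = /^\s*(\d+)(?:\s*-\s*(\d+))?\s*$/.exec(part);
    if (!match) {
      throw new Error(`Invalid range segment: "${part}"`);
    }

    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) {
      throw new Error(`Invalid page bounds: "${part}"`);
    }

    for (let page = start; page <= end; page += 1) {
      pages.add(page);
    }
  }

  return [...pages].sort((a, b) => a - b);
}

const examplePages = expandPageRange('1-2,5,7-10');
// examplePages is [1, 2, 5, 7, 8, 9, 10]
```

Checking a range this way before calling idnxtr can catch typos such as a reversed span (10-7) early.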
Interactive UI
Run idnxtr without arguments to launch the interactive UI, which guides you through extracting the data.
API
idn-area-extractor can be used programmatically by using the API documented below:
idnxtr(options)
Extract the data from a PDF or TXT file.
options
Required:
- options.data: Which kind of data should be extracted, either 'regencies', 'districts', 'islands', or 'villages'.
- options.filePath: The path to the PDF or TXT file.
Optional:
- options.compare: Compare the extracted data with the latest data. Default: false.
- options.destination: The destination folder to save the CSV file. Default: process.cwd().
- options.output: The output file name without the file extension. Default: options.data.
- options.range: Extract specific PDF pages (e.g. 1-2,5,7-10). If not set, all pages will be extracted.
- options.saveRaw: Save the extracted raw data into a .txt file (only works with PDF data). Default: false.
- options.silent: Disable all logs. Default: false.
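To make the documented defaults concrete, the sketch below fills them in for a partial options object. resolveOptions is a hypothetical helper written for illustration; it is not exported by idn-area-extractor and only mirrors the defaults listed above, not the library's internal logic.

```javascript
// Hypothetical helper (not part of idn-area-extractor): applies the
// documented defaults to a partial options object.
const DATA_KINDS = ['regencies', 'districts', 'islands', 'villages'];

function resolveOptions(options) {
  if (!DATA_KINDS.includes(options.data)) {
    throw new Error(`options.data must be one of: ${DATA_KINDS.join(', ')}`);
  }
  if (typeof options.filePath !== 'string' || options.filePath === '') {
    throw new Error('options.filePath is required');
  }

  // Drop keys that are explicitly undefined so they cannot mask a default.
  const provided = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined),
  );

  return {
    compare: false,
    destination: process.cwd(),
    output: options.data, // the output name defaults to the data kind
    saveRaw: false,
    silent: false,
    // range has no default: leaving it unset means all pages are extracted.
    ...provided,
  };
}

const exampleOptions = resolveOptions({
  data: 'islands',
  filePath: '/path/to/islands.pdf',
});
// exampleOptions.output is 'islands'
// exampleOptions.destination is the current working directory
```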
Example
ESM
import idnxtr from 'idn-area-extractor';
(async () => {
  await idnxtr({
    data: 'regencies',
    filePath: '/path/to/regencies.pdf',
    compare: true,
    destination: '/path/to/destination',
    output: 'regencies',
    range: '1-2,5,7-10',
    saveRaw: true,
    silent: true,
  });
})();
CommonJS
CommonJS users need to load the package with a dynamic import, like this:
(async () => {
  const { default: idnxtr } = await import('idn-area-extractor');
  await idnxtr({
    data: 'regencies',
    filePath: '/path/to/regencies.pdf',
  });
})();
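When several files need to be processed in one script, the calls can be made one after another. The following is a minimal sketch of that idea: extractAll and the jobs array are hypothetical names, and the extractor is passed in as an argument so the helper works whether idnxtr was loaded with a static or a dynamic import. It also assumes that idnxtr returns a promise that rejects on failure, which this README does not confirm.

```javascript
// Hypothetical helper (not part of idn-area-extractor): runs one
// extraction per job, sequentially, and records which ones failed.
// `extract` is expected to be the idnxtr function (or a compatible stub).
async function extractAll(extract, jobs) {
  const results = [];

  for (const job of jobs) {
    try {
      await extract(job);
      results.push({ data: job.data, ok: true });
    } catch (error) {
      // Keep going so that one bad file does not stop the whole batch.
      results.push({ data: job.data, ok: false, error });
    }
  }

  return results;
}

// Usage from CommonJS, assuming the package is installed:
//
// (async () => {
//   const { default: idnxtr } = await import('idn-area-extractor');
//   const results = await extractAll(idnxtr, [
//     { data: 'regencies', filePath: '/path/to/regencies.pdf' },
//     { data: 'islands', filePath: '/path/to/islands.pdf' },
//   ]);
//   console.log(results.filter((result) => !result.ok));
// })();
```

Running the jobs sequentially rather than in parallel keeps the log output readable and avoids opening several large PDF files at once.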
Problem Reporting
We have a different channel for each kind of problem. Please use the one that matches your situation:
Reporting a Bug
To report a bug, please open a new issue following the guide.
Requesting a New Feature
If you have a new feature in mind, please open a new issue following the guide.
Asking a Question
If you have a question, you can search for answers in the GitHub Discussions Q&A category. If you don't find a relevant discussion already, you can open a new discussion.
Support This Project
Give a ⭐️ if this project helped you!
You can support this project by donating via GitHub Sponsors, Trakteer, or Saweria.