A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs
Easily create a microservice for generating PDFs using headless Chrome.
pdf-bot
is installed on a server and will receive URLs to turn into PDFs through its API or CLI. pdf-bot
will manage a queue of PDF jobs. Once a PDF job has run it will notify you using a webhook so you can fetch the PDF. pdf-bot
supports storing PDFs on S3 out of the box. Failed PDF generations and Webhook pings will be retried after a configurable decaying schedule.
pdf-bot
uses html-pdf-chrome
under the hood and supports all the settings that it supports. Major thanks to @westy92 for making this possible.
Imagine you have an app that creates invoices. You want to save those invoices as PDF. You install pdf-bot
on a server as an API. Your app server sends the URL of the invoice to the pdf-bot
server. A cronjob on the pdf-bot
server keeps checking for new jobs, generates a PDF using headless Chrome and sends the location back to the application server using a webhook.
$ npm install -g pdf-bot
$ pdf-bot install
Make sure the node path is in your $PATH
pdf-bot install
will prompt for some basic configurations and then create a storage folder where your database and pdf files will be saved.
pdf-bot
comes packaged with sensible defaults. At the very minimum you must have a config file in the same folder from which you are executing pdf-bot
with a storagePath
given. However, in reality what you probably want to do is use the pdf-bot install
command to generate a configuration file and then use an alias: alias pdf-bot="pdf-bot -c /home/pdf-bot.config.js"
pdf-bot.config.js
var htmlPdf = require('html-pdf-chrome')

module.exports = {
  api: {
    token: 'crazy-secret'
  },
  generator: {
    completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000) // 1 sec timeout
  },
  storagePath: 'storage'
}
$ pdf-bot -c ./pdf-bot.config.js push https://esbenp.github.io
See a full list of the available configuration options.
pdf-bot
is meant to be a microservice that runs a server to generate PDFs for you. That usually means you will send requests from your application server to the PDF server, asking for a URL to be rendered as a PDF. pdf-bot
will manage a queue and retry failed generations. Once a PDF has been generated successfully, its location is sent back to your application server.
Let us check out the flow for an app that generates PDF invoices.
1. (App server): An invoice is created ----> Send URL to invoice to pdf-bot server
2. (pdf-bot server): Put the URL in the queue
3. (pdf-bot server): PDF is generated using headless Chrome
4. (pdf-bot server): (if failed try again using 1 min, 3 min, 10 min, 30 min, 60 min delay)
5. (pdf-bot server): Upload PDF to storage (e.g. Amazon S3)
6. (pdf-bot server): Send S3 location of PDF back to the app server
7. (App server): Receive S3 location of PDF -> Check signature sum matches for security
8. (App server): Handle PDF however you see fit (move it, download it, save it etc.)
You can send meta data to the pdf-bot
server that will be sent back to the application. This can help you identify what PDF you are receiving.
On your pdf-bot
server start by creating a config file pdf-bot.config.js
. You can see an example file here
pdf-bot.config.js
var createS3Config = require('pdf-bot/src/storage/s3')

module.exports = {
  api: {
    port: 3000,
    token: 'api-token'
  },
  storage: {
    's3': createS3Config({
      bucket: '',
      accessKeyId: '',
      region: '',
      secretAccessKey: ''
    })
  },
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
As a minimum you should configure an access token for your API. This will be used to authenticate jobs sent to your pdf-bot
server. You also need to add a webhook
configuration to have pdf notifications sent back to your application server. You should add a secret
that will be used to generate a signature used to check that the request has not been tampered with during transfer.
Start your API using
pdf-bot -c ./pdf-bot.config.js api
This will start an express server that listens for new jobs on port 3000
.
pdf-bot
uses html-pdf-chrome, which in turn uses chrome-launcher to launch Chrome. You should check out those two resources on how to properly set up Chrome. However, with chrome-launcher
Chrome should be started automatically. Otherwise, html-pdf-chrome
has a small guide on how to have it running as a process using pm2
.
You can install Chromium on Ubuntu using
sudo apt-get update && sudo apt-get install chromium-browser
If you are testing things on OSX or similar, chrome-launcher
should be able to find and automatically start Chrome for you.
In the examples folder there is a small example of how the application API could look. Basically, you just have to define an endpoint that receives the webhook and checks that the signature matches.
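api.post('/hook', function (req, res) {
  // Signature header sent by pdf-bot (req.get only takes the header name)
  var signature = req.get('X-PDF-Signature')
  // Recompute the HMAC-SHA1 of the body.
  // The key must be the same value as webhook.secret in pdf-bot.config.js
  var bodyCrypted = require('crypto')
    .createHmac('sha1', '1234')
    .update(JSON.stringify(req.body))
    .digest('hex')
  // Reject the request if the signatures differ
  if (bodyCrypted !== signature) {
    res.status(401).send()
    return
  }
  console.log('PDF webhook received', JSON.stringify(req.body))
  res.status(204).send()
})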
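The handler above compares the two hex strings with `!==`. A slightly more defensive variant, sketched below, uses Node's `crypto.timingSafeEqual` so that the comparison takes constant time. It assumes, as in the example above, that the signature header carries the hex-encoded HMAC-SHA1 of `JSON.stringify(req.body)`. The helper names `signBody` and `isValidSignature` are illustrative and not part of pdf-bot.

```javascript
var crypto = require('crypto')

// Compute the hex HMAC-SHA1 of the JSON-encoded body (same scheme as the handler above).
function signBody (body, secret) {
  return crypto
    .createHmac('sha1', secret)
    .update(JSON.stringify(body))
    .digest('hex')
}

// Hypothetical helper: constant-time comparison of expected and received signature.
function isValidSignature (body, signature, secret) {
  if (typeof signature !== 'string') return false
  var expected = Buffer.from(signBody(body, secret), 'utf8')
  var received = Buffer.from(signature, 'utf8')
  // timingSafeEqual throws when the lengths differ, so check that first.
  if (expected.length !== received.length) return false
  return crypto.timingSafeEqual(expected, received)
}
```

Inside the webhook endpoint this could replace the manual comparison, for example `if (!isValidSignature(req.body, req.get('X-PDF-Signature'), secret)) { res.status(401).send(); return }`.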
Follow the guide under production/
to see how to set up pdf-bot
using pm2
and nginx.
We set up our crontab to continuously look for jobs that have not yet been completed.
* * * * * node $(npm bin -g)/pdf-bot -c ./pdf-bot.config.js shift:all >> /var/log/pdfbot.log 2>&1
* * * * * node $(npm bin -g)/pdf-bot -c ./pdf-bot.config.js ping:retry-failed >> /var/log/pdfbot.log 2>&1
Let us assume I want to generate a PDF for https://esbenp.github.io
. I can add the job using the pdf-bot
CLI.
$ pdf-bot -c ./pdf-bot.config.js push https://esbenp.github.io --meta '{"id":1}'
Next, if my crontab is not set up to run it automatically I can run it using the shift:all
command
$ pdf-bot -c ./pdf-bot.config.js shift:all
This will run all unfinished jobs in the queue. To run only the next job in the queue, use the shift command instead.
A common issue with PDF generation is that the PDF is rendered before the page's JavaScript has finished running. Luckily, html-pdf-chrome
has a really awesome API for dealing with JavaScript. You can specify a timeout in milliseconds, or wait for elements or custom events. To add a wait, simply configure the generator
key in your configuration. Below are a few examples.
Wait for 5 seconds
var htmlPdf = require('html-pdf-chrome')

module.exports = {
  api: {
    token: 'api-token'
  },
  // html-pdf-chrome options
  generator: {
    completionTrigger: new htmlPdf.CompletionTrigger.Timer(5000), // waits for 5 sec
  },
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
Wait for event
var htmlPdf = require('html-pdf-chrome')

module.exports = {
  api: {
    token: 'api-token'
  },
  // html-pdf-chrome options
  generator: {
    completionTrigger: new htmlPdf.CompletionTrigger.Event(
      'myEvent', // name of the event to listen for
      '#myElement', // optional DOM element CSS selector to listen on, defaults to body
      5000 // optional timeout (milliseconds)
    )
  },
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
In your JavaScript, trigger the event when rendering is complete
document.getElementById('myElement').dispatchEvent(new CustomEvent('myEvent'));
Wait for variable
var htmlPdf = require('html-pdf-chrome')

module.exports = {
  api: {
    token: 'api-token'
  },
  // html-pdf-chrome options
  generator: {
    completionTrigger: new htmlPdf.CompletionTrigger.Variable(
      'myVarName', // optional, name of the variable to wait for. Defaults to 'htmlPdfDone'
      5000 // optional, timeout (milliseconds)
    )
  },
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
In your JavaScript, set the variable when rendering is complete
window.myVarName = true;
You can find more completion triggers in html-pdf-chrome's documentation
Below are the endpoints exposed by pdf-bot's REST API
key | type | required | description |
---|---|---|---|
url | string | yes | The URL to generate a PDF from |
meta | object | no | Optional meta data object to send back to the webhook URL |
curl -X POST -H 'Authorization: Bearer api-token' -H 'Content-Type: application/json' http://pdf-bot.com/ -d '
{
"url":"https://esbenp.github.io",
"meta":{
"type":"invoice",
"id":1
}
}'
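The curl request above could be issued from a Node application roughly as sketched below, using only the built-in `http` module. The function names `buildPushRequest` and `pushJob` are hypothetical, and `localhost` with port 3000 is a placeholder for the address of your own pdf-bot server. The path, the Bearer token header and the JSON body mirror the curl example.

```javascript
var http = require('http')

// Build the request options and JSON payload for pushing a job.
// Host and port are placeholders; path '/' and the Bearer token mirror the curl example.
function buildPushRequest (apiToken, url, meta) {
  var payload = JSON.stringify({ url: url, meta: meta })
  return {
    options: {
      method: 'POST',
      hostname: 'localhost', // assumption: address of your pdf-bot server
      port: 3000,
      path: '/',
      headers: {
        'Authorization': 'Bearer ' + apiToken,
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload)
      }
    },
    payload: payload
  }
}

// Send the job and call back with the HTTP status code of the response.
function pushJob (apiToken, url, meta, callback) {
  var request = buildPushRequest(apiToken, url, meta)
  var req = http.request(request.options, function (res) {
    res.resume() // discard the response body
    callback(null, res.statusCode)
  })
  req.on('error', callback)
  req.end(request.payload)
}
```

A call such as `pushJob('api-token', 'https://esbenp.github.io', { type: 'invoice', id: 1 }, function (err, status) { ... })` would then queue the same job as the curl command.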
If you have low concurrency (run a job every now and then) you can use the default database driver that uses LowDB.
var LowDB = require('pdf-bot/src/db/lowdb')

module.exports = {
  api: {
    token: 'api-token'
  },
  db: LowDB({
    lowDbOptions: {},
    path: '' // defaults to $storagePath/db/db.json
  }),
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
var pgsql = require('pdf-bot/src/db/pgsql')

module.exports = {
  api: {
    token: 'api-token'
  },
  db: pgsql({
    database: 'pdfbot',
    username: 'pdfbot',
    password: 'pdfbot',
    port: 5432
  }),
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
Optionally, you can specify a database URL by specifying a connectionString
.
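As a rough illustration, the discrete settings from the example above map onto a connection URL as sketched below. This assumes the driver accepts a standard `postgres://` URI, which is not confirmed here. The helper `toConnectionString` and the `localhost` default are hypothetical.

```javascript
// Hypothetical helper: build a PostgreSQL connection URI from discrete settings.
function toConnectionString (opts) {
  var host = opts.host || 'localhost' // assumption: database runs on the same machine
  var port = opts.port || 5432
  // Percent-encode credentials so characters like '@' or ':' do not break the URI.
  return 'postgres://' +
    encodeURIComponent(opts.username) + ':' +
    encodeURIComponent(opts.password) + '@' +
    host + ':' + port + '/' +
    encodeURIComponent(opts.database)
}

var connectionString = toConnectionString({
  database: 'pdfbot',
  username: 'pdfbot',
  password: 'pdfbot',
  port: 5432
})
// connectionString is 'postgres://pdfbot:pdfbot@localhost:5432/pdfbot'
```

The resulting string would then be passed as `db: pgsql({ connectionString: connectionString })` in the configuration file.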
To install the necessary database tables, run db:migrate
. You can also destroy the database by running db:destroy
.
Currently pdf-bot
comes bundled with built-in support for storing PDFs on Amazon S3.
Feel free to contribute a PR if you want to see other storage plugins in pdf-bot
!
To install S3 storage add a key to the storage
configuration. Note that you can add as many different locations as you want by giving them different keys.
var createS3Config = require('pdf-bot/src/storage/s3')

module.exports = {
  api: {
    token: 'api-token'
  },
  storage: {
    'my_s3': createS3Config({
      bucket: '[YOUR BUCKET NAME]',
      accessKeyId: '[YOUR ACCESS KEY ID]',
      region: '[YOUR REGION]',
      secretAccessKey: '[YOUR SECRET ACCESS KEY]'
    })
  },
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
var htmlPdf = require('html-pdf-chrome')
var LowDB = require('pdf-bot/src/db/lowdb')
var createS3Config = require('pdf-bot/src/storage/s3')

var decaySchedule = [
  1000 * 60, // 1 minute
  1000 * 60 * 3, // 3 minutes
  1000 * 60 * 10, // 10 minutes
  1000 * 60 * 30, // 30 minutes
  1000 * 60 * 60 // 1 hour
];

module.exports = {
  // The settings of the API
  api: {
    // The port your express.js instance listens on for requests. (default: 3000)
    port: 3000,
    // Spawn command when a job has been pushed to the API
    postPushCommand: ['/home/user/.npm-global/bin/pdf-bot', ['-c', './pdf-bot.config.js', 'shift:all']],
    // The token used to validate requests to your API. Not required, but 100% recommended.
    token: 'api-token'
  },
  db: LowDB(), // see other drivers under Database
  // html-pdf-chrome
  generator: {
    // Triggers that specify when the PDF should be generated
    completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000), // waits for 1 sec
    // The port to listen for Chrome (default: 9222)
    port: 9222
  },
  queue: {
    // How frequently should pdf-bot retry failed generations?
    // (default: 1 min, 3 min, 10 min, 30 min, 60 min)
    generationRetryStrategy: function(job, retries) {
      return decaySchedule[retries - 1] ? decaySchedule[retries - 1] : 0
    },
    // How many times should pdf-bot try to generate a PDF?
    // (default: 5)
    generationMaxTries: 5,
    // How many generations to run at the same time when using shift:all
    parallelism: 4,
    // How frequently should pdf-bot retry failed webhook pings?
    // (default: 1 min, 3 min, 10 min, 30 min, 60 min)
    webhookRetryStrategy: function(job, retries) {
      return decaySchedule[retries - 1] ? decaySchedule[retries - 1] : 0
    },
    // How many times should pdf-bot try to ping a webhook?
    // (default: 5)
    webhookMaxTries: 5
  },
  storage: {
    's3': createS3Config({
      bucket: '',
      accessKeyId: '',
      region: '',
      secretAccessKey: ''
    })
  },
  webhook: {
    // The prefix to add to all pdf-bot headers on the webhook response.
    // I.e. X-PDF-Transaction and X-PDF-Signature. (default: X-PDF-)
    headerNamespace: 'X-PDF-',
    // Extra request options to add to the Webhook ping.
    requestOptions: {
    },
    // The secret used to generate the hmac-sha1 signature hash.
    // !Not required, but should definitely be included!
    secret: '1234',
    // The endpoint to send PDF messages to.
    url: 'http://localhost:3000/webhooks/pdf'
  }
}
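To make the retry strategies above concrete, the sketch below evaluates the same lookup for the first six retries. The function name `retryDelay` is illustrative. The logic is copied from `generationRetryStrategy` and `webhookRetryStrategy`, and treating `retries` as a 1-based count is an assumption drawn from the `retries - 1` index.

```javascript
var decaySchedule = [
  1000 * 60, // 1 minute
  1000 * 60 * 3, // 3 minutes
  1000 * 60 * 10, // 10 minutes
  1000 * 60 * 30, // 30 minutes
  1000 * 60 * 60 // 1 hour
]

// Same lookup as the retry strategies in the configuration above:
// return the delay in milliseconds, or 0 once the schedule is exhausted.
function retryDelay (job, retries) {
  return decaySchedule[retries - 1] ? decaySchedule[retries - 1] : 0
}

// Delay in minutes for retries 1 through 6 (the job argument is unused here).
var delays = [1, 2, 3, 4, 5, 6].map(function (retries) {
  return retryDelay(null, retries) / 60000
})

console.log(delays) // [ 1, 3, 10, 30, 60, 0 ]
```

A custom strategy only needs to keep the same `(job, retries)` signature and return a delay in milliseconds, so a fixed delay or an exponential backoff could be swapped in the same way.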
pdf-bot
comes with a full CLI included! Use -c
to pass a configuration to pdf-bot
. You can also use --help
to get a list of all commands. An example is given below.
$ pdf-bot.js --config ./examples/pdf-bot.config.js --help
Usage: pdf-bot [options] [command]
Options:
-V, --version output the version number
-c, --config <path> Path to configuration file
-h, --help output usage information
Commands:
api Start the API
db:migrate
db:destroy
install
generate [jobID] Generate PDF for job
jobs [options] List all completed jobs
ping [jobID] Attempt to ping webhook for job
ping:retry-failed
pings [jobId] List pings for a job
purge [options] Will remove all completed jobs
push [options] [url] Push new job to the queue
shift Run the next job in the queue
shift:all Run all unfinished jobs in the queue
pdf-bot
uses debug
for debug messages. You can turn on debugging by setting the environment variable DEBUG=pdf:*
like so
DEBUG=pdf:* pdf-bot jobs
$ npm run test
Please report issues to the issue tracker
The MIT License (MIT). Please see License File for more information.