local-assets
CLI tool to extract local* stylesheets, images, scripts, fonts and other subresources (assets) from an HTML document, and optionally copy them into a directory.
*local here means resources that are same-origin with the given HTML document and available in the same directory as it.
Motivation
While creating an auto-publish GitHub Action for W3C specifications (see spec-prod), I wanted to find the minimal set of files needed by a specification. This ensures we do not redeploy the specification when unrelated files (like metadata files, CI scripts, etc.) change. We want to deploy only the main HTML file (the specification) and its dependencies (generally CSS files and images) to GitHub Pages and/or https://w3.org.
This works outside the W3C use case as well, so I published the tool as a CLI in case other people find it useful.
Usage
This tool is meant to be used as a CLI, although you can also import it as a regular Node.js module.
You can install this tool as a CLI with either npm or yarn:
npm install --global local-assets
yarn global add local-assets
Then, you can extract all local resources from index.html and copy them to the ../all-the-files/ directory as:
local-assets index.html -o ../all-the-files/
If you do not wish to copy the assets and only want to list them (on stdout):
local-assets index.html
For more verbose output, set the VERBOSE environment variable. This will log additional information on stderr. The list of assets will still be written to stdout.
VERBOSE=1 local-assets index.html
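Because the asset list goes to stdout and the verbose log goes to stderr, a script can consume the two streams separately. The following is a minimal sketch of doing so from Node.js by spawning the CLI. The helper names `parseAssetList` and `listLocalAssets` are hypothetical, and the code assumes that the CLI prints one asset path per line, which is not stated above.

```javascript
const { execFile } = require("node:child_process");

// Assumption: the CLI prints one asset path per line on stdout.
function parseAssetList(stdout) {
  return stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical wrapper: runs `local-assets <htmlFile>` and keeps the asset
// list (stdout) apart from the verbose log (stderr).
function listLocalAssets(htmlFile, { verbose = false } = {}) {
  const env = { ...process.env };
  if (verbose) env.VERBOSE = "1";
  return new Promise((resolve, reject) => {
    execFile("local-assets", [htmlFile], { env }, (error, stdout, stderr) => {
      if (error) return reject(error);
      resolve({ assets: parseAssetList(stdout), log: stderr });
    });
  });
}
```

Calling `listLocalAssets("index.html", { verbose: true })` would resolve to an object whose `assets` array is unaffected by whatever the verbose log contains, provided the CLI is installed and on the PATH.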
Already got a Chromium-based browser installed?
If you already have a Chromium-based browser (Google Chrome, Microsoft Edge) installed, you can avoid re-downloading it by setting the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before installing. You will then need to specify the location of your browser binary (PUPPETEER_EXECUTABLE_PATH) when using the CLI:
export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1
npm install --global local-assets
export PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome
local-assets index.html
How does it work?
- Use puppeteer to open the HTML file/URL.
- Use the document.querySelectorAll API to find all subresources, and process them (this is done using the subresource package).
- Filter out cross-origin subresources from above.
- Filter out the resources not found in the current directory.
- Copy all remaining resources (when an output directory is given).
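The filtering and copying steps above can be sketched with Node.js built-ins alone. This is an illustration under stated assumptions, not the tool's actual implementation: `filterLocalAssets` and `copyAssets` are hypothetical names, the page is assumed to be opened from a `file:` URL, and the list of subresource URLs is assumed to have been collected already (the puppeteer and `document.querySelectorAll` steps are left out).

```javascript
const fs = require("node:fs");
const path = require("node:path");
const { fileURLToPath } = require("node:url");

// Hypothetical helper: keep only subresources that are same-origin with the
// page and exist on disk inside the page's own directory.
// Assumption: pageUrl is a file: URL.
function filterLocalAssets(pageUrl, resourceUrls) {
  const page = new URL(pageUrl);
  const rootDir = path.dirname(fileURLToPath(page));
  const assets = new Set();
  for (const raw of resourceUrls) {
    let filePath;
    try {
      const url = new URL(raw, page); // resolve relative URLs against the page
      // Drop cross-origin resources (different scheme or host).
      if (url.protocol !== page.protocol || url.host !== page.host) continue;
      filePath = fileURLToPath(url); // query string and fragment are ignored
    } catch {
      continue; // unparseable URL or invalid file path
    }
    // Drop anything that resolves outside the page's directory.
    const relative = path.relative(rootDir, filePath);
    const escapes = relative === ".." || relative.startsWith(".." + path.sep);
    if (relative === "" || escapes || path.isAbsolute(relative)) continue;
    // Drop resources that are not regular files on disk.
    if (!fs.existsSync(filePath) || !fs.statSync(filePath).isFile()) continue;
    assets.add(relative);
  }
  return [...assets].sort();
}

// Hypothetical helper: copy each asset into outDir, preserving sub-directories.
function copyAssets(rootDir, assets, outDir) {
  for (const relative of assets) {
    const target = path.join(outDir, relative);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.copyFileSync(path.join(rootDir, relative), target);
  }
}
```

Comparing scheme and host rather than `URL.origin` is a deliberate choice in this sketch: `file:` URLs have an opaque origin, so an origin comparison would not distinguish them from other opaque-origin URLs such as `data:`.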