
mirror-website
Mirrors websites in a useful way, with corrected links and local copies of assets.
Have you seen websites lately? Every single one is a special snowflake with unusual requirements if you are going to mirror it as a static site. mirror-website is designed from the beginning with the expectation that you'll need to write some code for those cases. And mirror-website makes it really easy to do that.
git init and npm init, in the usual way. Also npm install tools like lodash and cheerio that you may wish to use.
npm install mirror-website
Create app.js. You'll write preprocessors (which modify the markup first), discoverers (which seek out URLs to be mirrored), and rewriters (your last chance to modify the markup). Here are some examples.

Note the use of $append. Without this, you're replacing all of the standard preprocessors, discoverers and rewriters bundled with mirror-website. You probably don't want to do that.
```javascript
const _ = require('lodash');
const cheerio = require('cheerio');
const fs = require('fs');
const mirror = require('mirror-website');

mirror({
  // Define the sites to be mirrored. We can override properties for
  // individual sites in these objects too; the stuff in "defaults"
  // applies to all of them
  sites: [
    { url: "http://example1.com" },
    { url: "http://example2.com" },
    { url: "http://example3.com" },
    { url: "http://example4.com" }
  ],
  // Config settings for all sites
  defaults: {
    aliases: [
      'some-cdn-that-must-get-mirrored-too-for-every-site.com'
    ],
    // Some special attributes that also carry URLs for this set of sites
    urlAttrs: {
      $append: [ 'altsrc', 'data-original', 'data-original-alt' ]
    },
    preprocessors: {
      $append: [
        // A chance to modify the body before any URL discoverers run
        function(url, body) {
          // modify body, then...
          return body;
        }
      ]
    },
    discoverers: {
      $append: [
        {
          // Discover URLs in special HTML pages that just do browser side redirects.
          // Push them onto the urls array to make sure they get crawled
          type: 'text/html',
          function: function(urls, url, body) {
            const matches = body.match(/location='(.*?)'/);
            if (matches) {
              urls.push(matches[1]);
            }
          }
        }
      ]
    },
    rewriters: {
      $append: [
        {
          // Work around a problem with a redirect page
          type: 'text/html',
          function: function(url, body) {
            if (!url.match(/examples$/)) {
              return body;
            }
            return '<script>location = \'/example/example.htm\';</script>';
          }
        },
        {
          // Replace custom contact forms found on a particular
          // site with a mailto link and instructions to
          // provide the same fields. This logic is specific to
          // the sites I was mirroring that day. The point is
          // to give you the same flexibility
          type: 'text/html',
          function: function(url, body) {
            const $ = cheerio.load(body);
            const $old = $('form.fancy');
            if (!$old.length) {
              return body;
            }
            const mailto = $old.attr('emailto');
            const $link = $('<p style="margin-top: 32px"><a>Please reach out via email to ' + mailto + '.</a></p>');
            $link.find('a').attr('href', 'mailto:' + mailto);
            const $prompt = $('<p style="margin-top: 32px">Be sure to include the following information:</p>');
            const $list = $('<ul></ul>');
            const venue = $('title').text().replace(/^[\s\S]*\|\s*/, '');
            // Create the item before branching so it exists in both cases
            const $venueItem = $('<li></li>');
            $venueItem.text(venue ? 'Name of venue: ' + venue : 'Name of venue');
            $list.append($venueItem);
            // One list item per form field label
            $old.find('label').each(function() {
              const $item = $('<li></li>');
              $item.text($(this).text());
              $list.append($item);
            });
            const $new = $('<div></div>');
            $new.append($link);
            $new.append($prompt);
            $new.append($list);
            $old.replaceWith($new);
            return $.html();
          }
        }
      ]
    }
  },
  // Custom callback to modify the config for each site
  // before it actually gets mirrored
  init: function(config) {
    // Change foo.com into just foo, so we do not add domain names
    // when creating folders, and use a "sites/" parent folder
    config.folder = 'sites/' + config.folder.replace(/\.[^/]+$/, '');
  }
}).then(function() {
  console.log('Done!');
});
```
Note that the configuration object must have a sites property, which should be an array in which every entry has a url property and, optionally, aliases (an array of hostnames considered equivalent to the main one).
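The shape described above can be checked before a long crawl starts. The following is a minimal sketch only: `validateConfig` is a hypothetical helper written for this example and is not part of mirror-website, which may validate its input differently or not at all.

```javascript
// Hypothetical helper (not part of mirror-website): returns a list of
// problems with the config shape, or an empty array if none were found.
function validateConfig(config) {
  if (!config || !Array.isArray(config.sites)) {
    return ['"sites" must be an array'];
  }
  const errors = [];
  config.sites.forEach(function(site, i) {
    if (!site || typeof site.url !== 'string') {
      errors.push('sites[' + i + '].url must be a string');
      return;
    }
    try {
      // The WHATWG URL constructor throws on anything that is not an absolute URL
      new URL(site.url);
    } catch (e) {
      errors.push('sites[' + i + '].url is not a valid absolute URL');
    }
    if (site.aliases !== undefined && !Array.isArray(site.aliases)) {
      errors.push('sites[' + i + '].aliases must be an array of hostnames');
    }
  });
  return errors;
}
```

Calling `validateConfig` on the object you are about to pass to `mirror()` and aborting when the returned array is non-empty is one way to use it.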
You can also set properties like aliases under a defaults key, which applies to every site.
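To make the difference between replacing and appending concrete, here is a rough sketch of the semantics described in this README. It is an illustration under stated assumptions, not the library's actual code: `mergeOption` and `resolveSite` are hypothetical names, the built-in attribute list is invented for the example, and the shallow merge of site settings over defaults is an assumption.

```javascript
// Hypothetical illustration of the $append semantics described above.
// builtIn: the list bundled with the tool; custom: what you configured.
function mergeOption(builtIn, custom) {
  if (custom === undefined) {
    return builtIn;                        // nothing configured: keep the bundled list
  }
  if (custom && !Array.isArray(custom) && Array.isArray(custom.$append)) {
    return builtIn.concat(custom.$append); // { $append: [...] } extends the bundled list
  }
  return custom;                           // a plain value replaces the bundled list
}

// Assumption: per-site properties win over "defaults" via a shallow merge.
function resolveSite(defaults, site) {
  return Object.assign({}, defaults, site);
}

// Illustrative built-in list; the real bundled attributes may differ.
const bundledUrlAttrs = ['href', 'src'];

console.log(mergeOption(bundledUrlAttrs, { $append: ['altsrc'] })); // [ 'href', 'src', 'altsrc' ]
console.log(mergeOption(bundledUrlAttrs, ['altsrc']));              // [ 'altsrc' ]
```

The second call shows why omitting `$append` is usually a mistake: the bundled entries are gone, so ordinary `href` and `src` URLs would no longer be handled.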
Then run your application:
node app
This creates a subdirectory in the current directory named after the domain name in your URL.
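The folder naming, and the effect of the `init` callback from the example, can be tried in isolation. This is a sketch under an assumption: `defaultFolder` is a hypothetical stand-in that derives the folder from the URL's hostname, which matches the description above but is not confirmed to be how mirror-website computes it. The `init` function is copied from the example configuration.

```javascript
// Assumption: the default folder is the hostname of the site URL.
function defaultFolder(siteUrl) {
  return new URL(siteUrl).hostname;
}

// Same callback as in the example configuration
function init(config) {
  config.folder = 'sites/' + config.folder.replace(/\.[^/]+$/, '');
}

const config = { folder: defaultFolder('http://example1.com') };
init(config);
console.log(config.folder); // sites/example1

const withSubdomain = { folder: defaultFolder('http://www.example1.com') };
init(withSubdomain);
console.log(withSubdomain.folder); // sites/www
```

The second result is worth noting: the regular expression removes everything from the first dot onward, so hostnames with a subdomain collapse to their first label. If several of your sites start with `www`, adjust the pattern so their folders do not collide.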
2.0.0: abandoned global install in favor of a library for better maintainability and error messages.
1.0.1: added the request dependency required by request-promise.