mwn
Quick links: Getting Started — GitHub — NPM — API Documentation
Mwn is a modern and comprehensive MediaWiki bot framework for Node.js, originally adapted from mwbot.
Mwn works with both JavaScript and TypeScript. It is created with a design philosophy of allowing bot developers to easily and quickly write bot code, without having to deal with the MediaWiki API complications and idiosyncrasies such as logins, tokens, maxlag, query continuations and error handling. Making raw API calls is also supported for complete flexibility where needed. The axios library is used for HTTP requests.
Mwn uses JSON with formatversion 2 by default. Use of the legacy formatversion is not recommended. Note that Special:ApiSandbox uses formatversion=1 by default, so if you're testing API calls using ApiSandbox be sure to set the correct formatversion there, otherwise the output will be formatted differently.
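To illustrate the difference, here is a small sketch comparing the shape of the same query response under both format versions. The sample objects are hand-written and abridged for illustration (the page ID is made up), and firstPageTitle is a hypothetical helper, not part of mwn.

```javascript
// Abridged, hand-written sample responses for action=query&titles=Foo.
// With formatversion=1, query.pages is an object keyed by page ID.
const responseV1 = {
  query: {
    pages: {
      '12345': { pageid: 12345, ns: 0, title: 'Foo' }
    }
  }
};

// With formatversion=2, query.pages is a plain array.
const responseV2 = {
  query: {
    pages: [{ pageid: 12345, ns: 0, title: 'Foo' }]
  }
};

// Hypothetical helper (not part of mwn) that copes with either shape.
function firstPageTitle(response) {
  const pages = response.query.pages;
  const list = Array.isArray(pages) ? pages : Object.values(pages);
  return list.length > 0 ? list[0].title : undefined;
}

console.log(firstPageTitle(responseV1)); // Foo
console.log(firstPageTitle(responseV2)); // Foo
```

Code written against formatversion 2 can index the array directly (pages[0]), which is why the examples in this document assume it.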
Versioning: while mwn is in version 0, changes may be made to the public interface with a change in the minor version number.
Complete API documentation is available here (alternative link). In addition to the MediaWiki Action API, the library also provides methods to talk to the Wikimedia EventStreams API, the ORES API and WikiWho API.
Amongst the major highlights are batchOperation and seriesBatchOperation, which allow you to run a large number of tasks with control over concurrency and sleep time between tasks. Failing actions are automatically retried.
This library uses mocha for tests and has extensive test coverage of all commonly used code paths. Testing is automated using a CI workflow on GitHub Actions.
Setup
To install, run npm install mwn.
Or obtain the latest development copy:
git clone https://github.com/siddharthvp/mwn.git
cd mwn
npm install
npm run build
Node.js compatibility
Mwn is written in TypeScript v4. The repository contains JavaScript files compiled to CommonJS module system for ES2018 target, which corresponds to Node 10.x.
If your bot is hosted on Toolforge, note that the system version of node there is v8.11.1. You can install a more recent version of node to your home directory, using:
npm install npm@latest
npm install n
export N_PREFIX=~
./node_modules/n/bin/n lts
export PATH=~/bin:$PATH
Check that your .profile or .bashrc file includes the line PATH="$HOME/bin:$PATH", so that the path includes the bin directory in your home directory every time you open the shell.
If you're using mwn for a Toolforge webservice, use the Kubernetes backend which provides node v10. Mwn is not supported for the legacy Grid Engine backend since it uses node v8.11.1. The toolforge-node-app-base template repository can quickly get you started with a basic web tool boilerplate.
MediaWiki compatibility
Mwn is written for and tested on the latest version of MediaWiki used on WMF wikis. Support for MW versions going back to 1.34 is planned.
Set up a bot password or OAuth credentials
Mwn supports authentication via both BotPasswords and OAuth 1.0a. Use of OAuth is recommended as it does away with the need for separate API requests for logging in, and is also more secure.
Bot passwords, however, are a bit easier to set up. To generate one, go to the wiki's Special:BotPasswords page.
Features
Handling multiple users and wikis: Mwn can seamlessly work with multiple bot users signed into the same wiki, and multiple wikis at the same time. You just have to create multiple bot instances – each one representing a wiki + user. Each bot instance uses an isolated cookie jar; all settings are also isolated.
Maxlag: The default maxlag parameter used by mwn is 5 seconds. Requests failing due to maxlag will be automatically retried after pausing for a duration specified by maxlagPause (default 5 seconds). At most maxRetries retries will take place (default 3).
Token handling: Tokens are automatically fetched as part of mwn.init(), bot.login() or bot.getTokensAndSiteInfo(). Once retrieved, they are stored in the bot state and can be reused any number of times. If any API request fails due to an expired or missing token, the request is automatically retried after fetching a new token. bot.getTokens() can be used to refresh the token cache, though mwn manages this itself, so you should not need to call it explicitly.
Retries: Mwn automatically retries failing requests bot.options.maxRetries times (default: 3). This is useful in case of connectivity resets and the like. As for errors raised by the API itself, note that MediaWiki generally handles these at the response level rather than the protocol level (they still emit a 200 OK response). Mwn will attempt retries for these errors based on the error code. For instance, if the error is readonly or maxlag, the retry is done after a delay. If it's assertuserfailed or assertbotfailed (which indicate a session loss), mwn will try to log in again and then retry. If it's badtoken, the retry is done after fetching a fresh edit token.
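The retry policy described above can be sketched roughly as follows. This is a simplified illustration and not mwn's actual implementation: requestWithRetries, sendRequest and the hooks object are hypothetical names, and the thrown error is assumed to carry the API error code in a code property. Network-level failures, which mwn also retries, are left out for brevity.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// sendRequest: async function performing one API call; assumed to throw
// an error whose `code` property holds the API error code.
// hooks: hypothetical object { pause, login, refreshToken }, pause in ms.
async function requestWithRetries(sendRequest, hooks, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await sendRequest();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries
      switch (err.code) {
        case 'maxlag':
        case 'readonly':
          await sleep(hooks.pause); // wait, then retry
          break;
        case 'assertuserfailed':
        case 'assertbotfailed':
          await hooks.login(); // session lost: log in again
          break;
        case 'badtoken':
          await hooks.refreshToken(); // fetch a fresh token
          break;
        default:
          throw err; // not treated as retryable in this sketch
      }
    }
  }
}
```

The point of the switch is that the recovery step depends on the error code, while the retry budget is shared across all of them.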
Handling query continuation: Mwn uses asynchronous generators (consumed with for await...of loops) to provide an intuitive interface around the MediaWiki API's query continuation.
Use bot.continuedQueryGen every time you want to fetch more results than the API gives you in one go (usually 5000 results). continuedQueryGen automatically uses the continue parameters in the response to create and send new requests that retrieve data from where the previous response was cut off.
The following example generates a list of all active users on the wiki (which may be more than 5000).
var activeusers = [];
for await (let json of bot.continuedQueryGen({
  "action": "query",
  "list": "allusers",
  "auactiveusers": 1,
  "aulimit": "max"
})) {
  let users = json.query.allusers.map(user => user.name);
  activeusers = activeusers.concat(users);
}
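For intuition, query continuation can be sketched as an async generator like the one below. This is a simplified illustration, not mwn's actual code: continuedQuerySketch and fakeApi are hypothetical names, and fakeApi stands in for a real API call so the example runs offline (it uses a numeric offset as the continuation value, whereas the real API returns its own opaque values). It relies on the MediaWiki behaviour of returning a continue object whose members are merged into the next request.

```javascript
// Yields each API response, following `continue` until it is absent
// or until `limit` requests have been made.
async function* continuedQuerySketch(callApi, params, limit = 10) {
  let request = { ...params };
  for (let i = 0; i < limit; i++) {
    const response = await callApi(request);
    yield response;
    if (!response.continue) return; // no more results
    // Merge the continuation parameters into the next request.
    request = { ...params, ...response.continue };
  }
}

// Stand-in for the API: serves six user names, two per request.
const names = ['A', 'B', 'C', 'D', 'E', 'F'];
async function fakeApi(request) {
  const start = Number(request.aufrom || 0);
  const response = {
    query: { allusers: names.slice(start, start + 2).map((name) => ({ name })) }
  };
  if (start + 2 < names.length) {
    response.continue = { aufrom: String(start + 2), continue: '-||' };
  }
  return response;
}

async function collectUsers() {
  const users = [];
  const params = { action: 'query', list: 'allusers' };
  for await (const json of continuedQuerySketch(fakeApi, params)) {
    users.push(...json.query.allusers.map((user) => user.name));
  }
  return users;
}
```

Because the generator yields after every request, the caller can start processing the first batch before later batches have been fetched, and can stop early by breaking out of the loop.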
Specialised derivatives exist to fulfill common needs:
- new bot.page('Page name').historyGen() - fetch page history
- new bot.page('Page name').logsGen() - fetch page logs
- new bot.category('Page name').membersGen() - fetch category members
- new bot.user('User name').contribsGen() - fetch user contributions
- new bot.user('User name').logsGen() - fetch user logs
Every method whose name ends in Gen is an async generator.
Emergency shutoff: Mwn exploits Node's asynchronous event loop to efficiently implement emergency shutoff.
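bot.enableEmergencyShutoff({
  page: 'User:ExampleBot/shutoff', // name of the page to check
  intervalDuration: 5000, // check the shutoff page every 5 seconds
  condition: function (pagetext) {
    // Return true to keep running; any text other than "running" triggers shutoff
    return pagetext === 'running';
  },
  onShutoff: function (pagetext) {
    // Called when shutoff is triggered; here we simply exit
    process.exit();
  }
});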
The rate of shutoff checks is not impacted by your actual editing rate, as it takes place separately in a setInterval() loop. Caution: this implementation has not been stress-tested.
Once enabled, shutoff can be disabled at any time, for example when you have finished performing write operations and are only doing read operations.
bot.disableEmergencyShutoff();
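The idea of checking the shutoff page on a timer, independently of the bot's own work, can be sketched as below. This is an illustration under assumptions rather than mwn's implementation: startShutoffWatch and readPage are hypothetical names, with readPage standing in for whatever fetches the page text.

```javascript
// Polls a page on a timer, independently of whatever else the bot is doing.
// readPage is a hypothetical async function returning the page text.
// Returns a function that disables the watch.
function startShutoffWatch({ readPage, intervalDuration, condition, onShutoff }) {
  let stopped = false;
  const stop = () => {
    stopped = true;
    clearInterval(timer);
  };
  const timer = setInterval(async () => {
    try {
      const text = await readPage();
      // A falsy result from condition() means "stop the bot".
      if (!stopped && !condition(text)) {
        stop(); // trigger at most once
        onShutoff(text);
      }
    } catch (err) {
      // Errors during a check are ignored in this sketch; the next tick tries again.
    }
  }, intervalDuration);
  return stop;
}
```

Since the check lives in its own setInterval callback, a long queue of edits does not delay it, and a slow check does not block the edits.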
Bot exclusion compliance: Mwn's edit() method can be configured to respect {{nobots}} or equivalent. If the text of the page tests positive for the exclusionRegex you set in the bot options, edit will be aborted.
bot.setOptions({
  exclusionRegex: /\{\{nobots\}\}/i
});
It's also possible to do this on a per-edit basis:
bot.edit('Page name', (rev) => {
  return rev.content + 'lorem ipsum';
}, {
  exclusionRegex: /\{\{nobots\}\}/i
});
Exclusion compliance is not enabled by default.
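As an illustration, a broader exclusionRegex might also honour {{bots|deny=...}}. The pattern below is only a sketch under assumptions: ExampleBot is a placeholder user name, the deny= syntax follows the convention of Template:Bots on the English Wikipedia, and the regex does not cover every variant of that template (for instance allow= lists). isExcluded is a hypothetical helper used here to exercise the pattern.

```javascript
// Matches {{nobots}}, {{bots|deny=all}} and deny lists naming ExampleBot.
const exclusionRegex =
  /\{\{\s*(?:nobots\s*\}\}|bots\s*\|\s*deny\s*=\s*(?:all\b|[^}|]*\bExampleBot\b)[^}]*\}\})/i;

// Hypothetical helper: true if the page text asks this bot to stay away.
function isExcluded(pageText) {
  return exclusionRegex.test(pageText);
}

console.log(isExcluded('Some text {{nobots}}')); // true
console.log(isExcluded('{{bots|deny=OtherBot}}')); // false
```

A regex like this could then be passed as exclusionRegex in the bot options or in the per-edit options shown above.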
Getting started
Importing mwn:
In JavaScript:
const {mwn} = require('mwn');
Note: Prior to mwn v0.8.0, import was via const mwn = require('mwn');
In TypeScript:
import {mwn} from 'mwn';
If you're migrating from mwbot, note that:
- edit in mwbot is different from edit in mwn. You want to use save instead.
- If you were using the default formatversion=1 output format, set formatversion: 1 in the config options.
Create a new bot instance:
const bot = await mwn.init({
  apiUrl: 'https://en.wikipedia.org/w/api.php',
  // Bot password credentials
  username: 'YourBotUsername',
  password: 'YourBotPassword',
  // Alternatively, authenticate with OAuth 1.0a instead of username and password
  OAuthCredentials: {
    consumerToken: "16_DIGIT_ALPHANUMERIC_KEY",
    consumerSecret: "20_DIGIT_ALPHANUMERIC_KEY",
    accessToken: "16_DIGIT_ALPHANUMERIC_KEY",
    accessSecret: "20_DIGIT_ALPHANUMERIC_KEY"
  },
  // User agent identifying your bot or tool
  userAgent: 'myCoolToolName 1.0 ([[link to bot user page or tool documentation]])',
  // Parameters included in every API request
  defaultParams: {
    assert: 'user' // ensure we're logged in
  }
});
This creates a bot instance, signs in and fetches tokens needed for editing.
You can also create a bot instance synchronously (without using await):
const bot = new mwn({
...options
});
This creates a bot instance which is not signed in. Then to authenticate, use bot.login(), which returns a promise. If using OAuth, use bot.initOAuth() followed by bot.getTokensAndSiteInfo(). Note that bot.initOAuth() does not involve an API call. Any error in authentication will surface on running bot.getTokensAndSiteInfo().
The bot options can also be set using setOptions rather than through the constructor:
bot.setOptions({
  silent: false,
  retryPause: 5000,
  maxRetries: 3,
});
Direct API calls
The request method is for directly querying the API. See mw:API for options. You can create and test your queries in Special:ApiSandbox. Be sure to set formatversion: 2 in the options for format=json!
Example: get all images used on the article Foo
bot.request({
  "action": "query",
  "prop": "images",
  "titles": "Foo"
}).then(data => {
  return data.query.pages[0].images.map(im => im.title);
});
Mwn provides a great number of convenience methods so that you can avoid writing raw API calls; see the sections below.
Editing pages
Edit a page. On edit conflicts, a retry is automatically attempted once.
bot.edit('Page title', rev => {
  // rev.content holds the current text of the page
  var text = rev.content.replace(/foo/g, 'bar');
  // return an object with the edit parameters
  return {
    text: text,
    summary: 'replacing foo with bar',
    minor: true
  };
});
Some more functions associated with editing pages:
await bot.save('Page title', 'Page content', 'Edit summary');
await bot.create('Page title', 'Page content', 'Edit summary');
await bot.newSection('Page title', 'New section header', 'Section content', additionalOptions);
Other basic operations
Read the contents of a page:
await bot.read('Page title');
Read a page along with metadata:
await bot.read('Page title', {
  rvprop: ['content', 'timestamp', 'user', 'comment']
});
Read multiple pages using a single API call:
bot.read(['Page 1', 'Page 2', 'Page 3']).then(pages => {
  // pages is an array with one entry per page read
});
Delete a page:
await bot.delete('Page title', 'deletion log summary', additionalOptions);
Move a page along with its subpages:
await bot.move('Old page title', 'New page title', 'move summary', {
  movesubpages: true,
  movetalk: true
});
Parse wikitext (see API:Parse for additionalOptions):
await bot.parseWikitext('Input wikitext', additionalOptions);
Parse the contents of a given page:
await bot.parseTitle('Page name', additionalOptions);
Upload a file from your system to the wiki:
await bot.upload('File title', '/path/to/file', 'comment', customParams);
Download a file from the wiki:
await bot.download('File:File name.jpg', 'Downloaded file name.jpg');
Creating a page object opens up further possibilities:
let page = new bot.page('Main Page');
See list of methods available on page object.
Files and categories have their own subclasses that add a few additional methods.
Working with titles
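Titles can be represented as objects created using the title class constructor available on the bot instance (bot is the mwn object):
let title = new bot.title('Wikipedia:Articles for deletion');
// the static alternative, which parses the namespace from the text:
title = bot.title.newFromText('Wikipedia:Articles for deletion');
// the namespace can also be given by number (4 is the project namespace):
title = new bot.title('Articles for deletion', 4);
title.getMainText();
title.getNamespaceId();
// namespace names and letter case are normalised:
title = bot.title.newFromText('cAtEGoRy:living people');
title.toText();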
The API of this class is based on that of mw.Title in the on-site JS interface. See full list of methods.
Working with wikitext
Mwn can be used for parsing wikitext:
let wkt = new bot.wikitext('This is some wikitext with [[links]] and {{templates|with=params}}.');
wkt.parseTemplates();
wkt.parseLinks();
wkt.links;
In addition:
bot.wikitext.parseTable(wikitext) parses simple tables without fancy markup; it will throw on unparsable input.
Bulk processing methods
continuedQuery / continuedQueryGen: Handles query continuation. See "Handling query continuation" in the Features section above.
continuedQuery returns a promise resolved with the array of all individual API responses.
Use of continuedQueryGen is recommended since continuedQuery will fetch the results of all the API calls before it begins to do anything with the results. continuedQueryGen gets the result of each API call and processes them one at a time.
massQuery / massQueryGen: MediaWiki sets a limit of 500 (50 for non-bots) on the number of pages that can be queried in a single API call. To query more than that, massQuery or massQueryGen can be used. massQuery splits the page list into batches of 500, sends individual queries, and returns a promise resolved with the array of all individual API call responses.
Example: get the protection status of a large number of pages:
bot.massQuery({
  "action": "query",
  "format": "json",
  "prop": "info",
  "titles": ['Page1', 'Page2', 'Page1300'], // array of page names
  "inprop": "protection"
})
.then((jsons) => {
  // jsons is the array of individual API responses
});
Any errors in the individual API calls will not cause the entire massQuery to fail, but the data at the array index corresponding to that API call will be an error object.
massQueryGen is the generator equivalent that yields each API response as and when it is received.
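The batching behaviour described above can be sketched as follows. This is a simplified illustration and not mwn's implementation: chunk, massQuerySketch and callApi are hypothetical names, with callApi standing in for a single API request. Titles in each batch are joined with the pipe character, which is how the MediaWiki API accepts multiple values.

```javascript
// Splits `items` into consecutive arrays of at most `size` elements.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// callApi is a hypothetical async function performing one API request.
// Batches are sent one after another; a failed batch contributes its
// error object to the result instead of rejecting the whole run.
async function massQuerySketch(callApi, params, batchSize = 500) {
  const responses = [];
  for (const titles of chunk(params.titles, batchSize)) {
    try {
      responses.push(await callApi({ ...params, titles: titles.join('|') }));
    } catch (err) {
      responses.push(err);
    }
  }
  return responses;
}
```

With 1300 titles and the default batch size, this makes three requests (500, 500 and 300 titles) and returns an array of three entries in the same order.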
Batch operations
Perform asynchronous tasks (involving API usage) over a number of pages (or other arbitrary items). batchOperation uses a default concurrency of 5. Customise this according to how expensive the API operation is. Higher concurrency limits could lead to more frequent API errors.
batchOperation(pageList, workerFunction, concurrency, maxRetries): The workerFunction must return a promise.
bot.batchOperation(pageList, (page, idx) => {
  // do something with each page and return a promise;
  // idx is the index of page in pageList
}, 5, 2); // concurrency of 5, up to 2 retries
bot.seriesBatchOperation(pageList, workerFunction, sleepDuration, retries) can be used for serial operations, with a sleep duration between each task (default 5 seconds).
bot.seriesBatchOperation(pageList, (page, idx) => {
  // do something with each page and return a promise
}, 5000, 2); // sleep 5000 ms between tasks, up to 2 retries
Note that seriesBatchOperation with a sleep duration of 0 is the same as batchOperation with a concurrency of 1.
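To show what a concurrency limit means in practice, here is a minimal sketch of a concurrency-limited batch runner. It is an illustration under assumptions, not mwn's implementation: batchSketch and its return shape are hypothetical, and it retries a failing item immediately, which may differ from how mwn schedules its retries.

```javascript
// Runs worker(item, index) over all items with at most `concurrency`
// tasks in flight; a failing item is retried up to `maxRetries` times.
async function batchSketch(items, worker, concurrency = 5, maxRetries = 0) {
  const failures = [];
  let next = 0; // index of the next unclaimed item

  // Each lane repeatedly claims the next item until none are left.
  async function runLane() {
    while (next < items.length) {
      const idx = next++;
      for (let attempt = 0; ; attempt++) {
        try {
          await worker(items[idx], idx);
          break; // success
        } catch (err) {
          if (attempt >= maxRetries) {
            failures.push({ item: items[idx], error: err });
            break; // give up on this item
          }
        }
      }
    }
  }

  const laneCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return { successes: items.length - failures.length, failures };
}
```

Calling batchSketch(pageList, worker, 1) processes the items strictly one after another, which mirrors the equivalence between the serial and concurrent variants noted above.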
Licensing
Mwn is released under the GNU Lesser General Public License (LGPL) v3.0, since it borrows quite a bit of code from MediaWiki core (GPL v2). LGPL is a more permissive variant of the more popular GNU GPL. Unlike GPL, LGPL allows the work to be used as a library in software not released under GPL-compatible licenses, and even in proprietary software. However, any derivatives of this library should also be released under LGPL or another GPL-compatible license.