elasticsearch-watchdog
A watchdog for ElasticSearch: it monitors the statuses of cluster nodes, restarts them automatically, and keeps the PRIMARY node unique.
In my situation, millions of records are indexed into ElasticSearch every day, and our cluster has a large number of nodes. We spent a lot of time making it stable and reliable, but unfortunately nodes still crash every few months due to:
- Status changes to red or grey.
- Different primary nodes but not a unique one (like autocephaly).
- Unresponsive (HTTP timeout, handshake failed and all that stuff).
- Other issues.
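The "different primary nodes" case can be illustrated with a small check. This is only a sketch, not Watchdog's actual implementation: it assumes you have already asked each node for its own view of the cluster (for example the `master_node` field of its `_cluster/state` response) and collected the answers in a plain object. The names `findMasters` and `hasUniqueMaster` are hypothetical.

```javascript
// Hypothetical helper: `reports` maps each node address to the master node id
// that this node reports (assumed to come from its _cluster/state response).
function findMasters(reports) {
  var masters = {};
  Object.keys(reports).forEach(function (address) {
    var master = reports[address];
    if (!master) { return; } // node did not answer or reported no master
    (masters[master] = masters[master] || []).push(address);
  });
  return masters; // master id -> list of nodes that follow it
}

// True only when every answering node agrees on exactly one master.
function hasUniqueMaster(reports) {
  return Object.keys(findMasters(reports)).length === 1;
}
```

If `hasUniqueMaster` returns false, `findMasters` shows which nodes follow which master, which is the information needed to decide what to restart.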
What Can Watchdog Do
- Monitor statuses/healths/states of ElasticSearch cluster/node.
- Auto restart ElasticSearch through openSSH.
- Take a quick look at Watchdog statuses anywhere, especially on a mobile device.
- Make every day feel like Sunday.
Installation
$ npm install elasticsearch-watchdog -g
Usage
watchdog
Usage: watchdog [cmd] [file|name]
Commands:
pwd <password> encrypt the password
encrypt [options] <file> encrypt the configuration file and save it to disk
tmpl <name> render a configuration template
start [options] <file> start watching on an ElasticSearch cluster
stop <uid> stop watching by `uid`, all the watchdogs will be killed if `uid` is `all`
restart <uid> restart watching by `uid`, call the watchdogs back and then send them out for watching again if `uid` is `all`
ls [options] list all the watchdogs we have
web [port] launch a web GUI, port default by 8088
Options:
-h, --help output usage information
-v, --version output the version number
-r, --root the root location, you can find all logs here.
Basic Examples:
Start a watchdog, by file:
$ watchdog start watchdog.yml
Restart the alive watchdog, by uid:
$ watchdog restart 1001
Restart all watchdogs:
$ watchdog restart all
Stop the watchdog, by uid:
$ watchdog stop 1001
Stop all the watchdogs:
$ watchdog stop all
encrypt
Usage: encrypt [options] <file>
Options:
-h, --help output usage information
--no-blank remove the blank line if this option is provided
tmpl
$ watchdog tmpl <file>
<file> is the name of the configuration file, .yml is optional, i.e. $ watchdog tmpl es-server and $ watchdog tmpl es-server.yml are both fine.
start
Usage: start [options] <file>
Options:
-h, --help output usage information
--no-daemon running watchdog as a service, otherwise in the terminal
-m, --max <number> maximize retry count when dog has died
stop
$ watchdog stop <uid>
All the watchdogs will be killed if uid is all. Head over to Printf to get more information about uid.
restart
$ watchdog restart <uid>
All the watchdogs will be called back and then sent out for watching again if uid is all. Head over to Printf to get more information about uid.
ls
Usage: ls [options]
Options:
-h, --help output usage information
--no-format print list as JSON without formatting
web
$ watchdog web [port]
$ nohup watchdog web > /dev/null 2>&1 & echo $! > /path/to/watchdog.pid
$ kill -9 `cat /path/to/watchdog.pid`
The port of the web interface is optional (8088 by default). For the best viewport on a mobile device, use landscape mode rather than portrait.
GUI:
A RESTful interface is also provided, e.g. http://[domain|ip]:[port]/json.
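As a rough sketch of how the JSON endpoint might be consumed from Node.js: the helpers `statusUrl` and `fetchStatus` below are hypothetical, and the shape of the returned document is not described in this README, so the code only parses it and hands it to the caller.

```javascript
var http = require('http');

// Builds the URL of the JSON endpoint; 8088 is the default port of `watchdog web`.
function statusUrl(host, port) {
  return 'http://' + host + ':' + (port || 8088) + '/json';
}

// Fetches and parses the JSON document, then calls callback(err, data).
function fetchStatus(host, port, callback) {
  http.get(statusUrl(host, port), function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      var parsed;
      try { parsed = JSON.parse(body); } catch (err) { return callback(err); }
      callback(null, parsed);
    });
  }).on('error', callback);
}

// Usage (requires a running `watchdog web`):
// fetchStatus('127.0.0.1', 8088, function (err, data) {
//   if (err) { return console.error(err.message); }
//   console.log(data);
// });
```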
Printf
Take $ watchdog ls as an example; the output is formatted as follows.
- name: CLUSTER-SERVER and PERCOLATOR-SERVER are the names of the Watchdogs.
- uid: 7707 and 6384 are the uids of the Watchdogs; run $ watchdog stop 7707 or $ watchdog restart 7707 to do a stop/restart operation.
- colors: red, yellow, grey and green are the statuses of ElasticSearch.
- symbols: ★ means primary node, ✩ means leaves (not master nodes).
- dim style
  - UNKNOWN [missing status] / 192.168.100.112 [unknown]: the primary node is unknown, and the status cannot be fetched through the _cluster/health / _cluster/state API.
  - 192.168.100.166 [error]: the server cannot be reached through OpenSSH, and you'd better check the logs (~/.watchdog/logs/).
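The colours above can be related to the `status` field of a `_cluster/health` response, which ElasticSearch reports as green, yellow or red. The sketch below assumes that grey stands for "no usable status" (missing response or unrecognised value); that mapping and the name `healthColor` are assumptions for illustration, not taken from Watchdog's source.

```javascript
// Maps a parsed _cluster/health response to one of the colours listed above.
// Assumption: a missing or unrecognised status is shown as grey.
function healthColor(health) {
  var known = ['green', 'yellow', 'red'];
  var status = health && health.status;
  return known.indexOf(status) !== -1 ? status : 'grey';
}
```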
Programmatic
var Watchdog = require('watchdog');
var monit = Watchdog({
  conf: '/path/to/conf.yml',
  uid: false
});
monit.on('info', function(msg){
  console.log('[INFO]', msg.type, msg.message);
});
monit.watching();
Configuration
Execute $ watchdog tmpl my-es to render a copy of the configuration template, then edit it to meet your requirements.
BTW, it supports almost all YAML syntax.
To let ElasticSearch be restarted smoothly, stop any running ElasticSearch process and start it again as a daemon with a pid file:
$ elasticsearch -d -p /path/to/es.pid [options]
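A pid file makes it easy to tell whether the daemon is still alive. The sketch below is a local illustration only (Watchdog itself works through OpenSSH, and its internals are not shown in this README); `isRunning` is a hypothetical helper that reads the pid file and probes the process with signal 0.

```javascript
var fs = require('fs');

// Returns true when the process recorded in `pidFile` is still running.
function isRunning(pidFile) {
  var pid;
  try {
    pid = parseInt(fs.readFileSync(pidFile, 'utf8').trim(), 10);
  } catch (err) {
    return false; // pid file is missing or unreadable
  }
  if (!(pid > 0)) { return false; } // empty or malformed pid file
  try {
    process.kill(pid, 0); // signal 0 sends nothing, it only tests for existence
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // process exists but belongs to another user
  }
}
```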
Local environment
If you're running Watchdog and ElasticSearch on the same server, get the IP address by visiting:
http://localhost:9200/_cluster/state
The transport_address of the current server is the address ElasticSearch is bound to, and there is no need to provide nodes.ssh.password in the configuration for it.
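Reading the address by hand works, but it can also be pulled out of a parsed `_cluster/state` response. This is a minimal sketch: `transportIp` and `nodeAddresses` are hypothetical names, only IPv4 addresses are handled, and the two string formats in the comment are the ones I would expect from older and newer ElasticSearch releases.

```javascript
// Extracts the IPv4 address from a transport_address string.
// Handles "inet[/192.168.100.112:9300]" as well as "192.168.100.112:9300".
function transportIp(transportAddress) {
  var match = /(\d{1,3}(?:\.\d{1,3}){3}):\d+/.exec(transportAddress || '');
  return match ? match[1] : null;
}

// Lists id, name and IP for every node in a parsed _cluster/state response.
function nodeAddresses(state) {
  var nodes = (state && state.nodes) || {};
  return Object.keys(nodes).map(function (id) {
    return {
      id: id,
      name: nodes[id].name,
      ip: transportIp(nodes[id].transport_address)
    };
  });
}
```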
Examples
Head over to the example or test directories.
Test
$ npm test
License
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.