Overview
Kademlia.js is a JavaScript implementation of the Kademlia distributed hash table, originally designed in 2002 by Petar Maymounkov and David Mazières. A distributed hash table (DHT) is a key-value data store that is distributed across multiple nodes (or computers) on a network. The Kademlia DHT is a peer-to-peer network and is completely decentralized. A Kademlia network in which anyone can participate is a public network; one in which only certain people can participate is a private network.
Nodes on the network share the network-wide constants k, α and B. Each node also has an id of length B bits. k defines the number of nodes a Kademlia instance can keep in each bucket in its routing table, and the number of nodes each piece of data should be replicated across when it is set on the network. α defines the number of simultaneous queries a node performs in the lookup stage. B defines the length in bits of node ids and data keys. As requiring keys to be exactly B bits long is inconvenient, data keys are hashed with a hashing algorithm whose digest size is B bits. The original paper and many implementations of Kademlia use SHA-1 as the hashing algorithm, with a B value of 160. By default, however, this implementation uses SHA3-256 with a B value of 256, to increase the key space and address security concerns with SHA-1. The hash function must be the same across all nodes in the network.
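As a rough illustration of how an arbitrary key can be mapped to a fixed-length id, the sketch below hashes a string with Node's built-in crypto module. The hashKey helper is hypothetical and is not part of kademlia.js; the algorithm names are those used by Node's crypto module, which may differ from the option values this library accepts.

```javascript
const crypto = require("crypto");

// Hypothetical helper (not part of kademlia.js): hash an arbitrary string
// key down to a fixed-size digest that can serve as a B-bit data key.
function hashKey(key, algorithm = "sha3-256") {
  return crypto.createHash(algorithm).update(key, "utf8").digest();
}

const keyId = hashKey("test-key");
console.log(keyId.length * 8); // 256 bits for SHA3-256
console.log(hashKey("test-key", "sha1").length * 8); // 160 bits for SHA-1
```

Because the digest size fixes the key length, every node must agree on the algorithm; two nodes hashing the same key with different algorithms would look for it in different parts of the key space.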
Kademlia can efficiently set and fetch data on the network, with set and get operations scaling with O(log n), where n is the number of nodes connected to the network. Kademlia uses a recursive algorithm, with at most α concurrent queries, to traverse the network and find the nodes whose ids have the smallest distance to the key of the data (where the distance between two ids is defined as their XOR). It then uses those nodes to either get data from or set data onto the network. After fetching data from the network, by default we also store the data onto the closest node that we queried that didn't return a value (caching).
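The XOR distance metric can be sketched in a few lines. The helpers below are illustrative only and are not taken from kademlia.js: xorDistance and closestNodes are hypothetical names, and the one-byte ids are far shorter than real B-bit ids.

```javascript
// XOR distance between two equal-length ids, returned as a BigInt.
function xorDistance(idA, idB) {
  if (idA.length !== idB.length) {
    throw new Error("ids must have the same length");
  }
  let distance = 0n;
  for (let i = 0; i < idA.length; i++) {
    // Shift in one byte at a time, most significant byte first.
    distance = (distance << 8n) | BigInt(idA[i] ^ idB[i]);
  }
  return distance;
}

// Return the `count` node ids closest to `target` under the XOR metric.
function closestNodes(nodeIds, target, count) {
  return [...nodeIds]
    .sort((a, b) => {
      const da = xorDistance(a, target);
      const db = xorDistance(b, target);
      return da < db ? -1 : da > db ? 1 : 0;
    })
    .slice(0, count);
}

// Tiny 1-byte ids for illustration only.
const target = Buffer.from([0b00001111]);
const nodes = [
  Buffer.from([0b00001110]), // distance 1
  Buffer.from([0b10001111]), // distance 128
  Buffer.from([0b00001011]), // distance 4
];
console.log(closestNodes(nodes, target, 2).map((id) => id[0])); // [ 14, 11 ]
```

Ids that share a long common prefix with the key have a small XOR distance, which is why each lookup round can roughly halve the remaining search space.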
By default, every piece of data stored on a node has an expiry time, in milliseconds, equal to the smaller of two values: the ttl, which is 24 hours by default, and the output of a scaling function. The scaling function returns a value exponentially inversely proportional to the number of nodes in the storer's routing table that are closer to the key of the data than the storer itself. The number returned by the scaling function is rounded. By default the function is:
which can alternatively be written:
This behaviour can be disabled, so that keys simply expire after the ttl. In that case, however, if caching is enabled, this may cause over-caching.
Security Considerations
Kademlia is a good fit for storing data on decentralized networks, but users should take some security considerations into account. First, the integrity of data retrieved from a Kademlia network in which adversarial nodes could participate (a public network) is not guaranteed. Therefore all important data entered into the network should be authenticatable, for example with a cryptographic signature.
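One way to make stored data authenticatable is to sign each value before setting it and verify the signature after fetching it. The sketch below uses Ed25519 from Node's built-in crypto module; signValue, verifyValue and the record layout are assumptions made for illustration and are not part of kademlia.js.

```javascript
const crypto = require("crypto");

// Publisher side: generate a key pair. In practice the public key must reach
// readers through a channel they already trust.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

// Hypothetical helper: wrap a string value together with its signature.
function signValue(value, key) {
  const signature = crypto.sign(null, Buffer.from(value, "utf8"), key);
  return { value, signature: signature.toString("base64") };
}

// Hypothetical helper: check a fetched record against the publisher's key.
function verifyValue(record, key) {
  return crypto.verify(
    null,
    Buffer.from(record.value, "utf8"),
    key,
    Buffer.from(record.signature, "base64")
  );
}

const record = signValue("test-data", privateKey);
console.log(verifyValue(record, publicKey)); // true

// A record altered by an adversarial node fails verification.
const tampered = { ...record, value: "evil-data" };
console.log(verifyValue(tampered, publicKey)); // false
```

The signed record is a plain object, so it could be stored with node.set and checked after node.get, provided the configured serializer can handle it.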
By default Kademlia nodes communicate unencrypted over UDP. In private Kademlia networks, however, it may be desirable to encrypt communications between nodes. This implementation supports that by setting encrypted to true and passing custom encrypt and decrypt functions to encrypt and decrypt data.
Kademlia lookups are also vulnerable to manipulation by adversaries. An adversary encountered during a lookup can manipulate it, likely compromising it so that either the wrong data is returned or no data is set on the network. Adversaries could also attempt eclipse attacks or Sybil attacks to manipulate network operations.
Kademlia nodes also must be bootstrapped with a non-adversarial node, otherwise every node on the network could easily be controlled by an adversary.
These problems can largely be offset by using authenticatable data, bootstrapping to a non-adversarial node, and S/Kademlia.
Installation
You can install kademlia.js through npm with the command:
$ npm i kademlia.js
Example
Example: Create a Kademlia node, set data on the network, and then fetch it again.
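const assert = require("assert");
const Kademlia = require("kademlia.js");

// otherNodeIp and otherNodePort are the address of a node already on the network.
async function main() {
  let node = new Kademlia(5533);
  await node.bootstrap({
    ip: otherNodeIp,
    port: otherNodePort,
  });
  await node.set("test-key", "test-data");
  // get is asynchronous, like set, so its result must be awaited.
  let fetchedData = await node.get("test-key");
  assert(fetchedData === "test-data");
}

main();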
Documentation
You can import the library after installing it like so:
const Kademlia = require("kademlia.js");
To create a new Kademlia node, provide the port for the node and, optionally, a dictionary containing options for the node:
let node = new Kademlia(5533, {});
To bootstrap a node onto the network, provide the details of a node that is already on it:
await node.bootstrap({
  ip: otherNodeIp,
  port: otherNodePort,
});
You can then use the node you created to set data to the network:
await node.set("key", "value");
You can also retrieve data from the network for a specific key. If that key isn't on the network, null is returned.
await node.get("key");
If you want to use a different hash function, e.g. reverting to the original paper's SHA-1, you can do so when creating a node. Note that SHA-1 has a digest size of 160 bits, so B is also set to 160, which is okay as it is a multiple of 8.
let node = new Kademlia(5533, {
  hashFunction: "sha-1",
  B: 160,
});
The data transmitted over UDP by Kademlia must be in string form, but it may be useful to store more complex objects. For this there are two options, serializeData and deserializeData. serializeData is called whenever a Kademlia node needs to transmit data; it is passed the data to be transmitted and must return the serialized form. deserializeData is called whenever data is received by Kademlia; it is passed the received data and must return the deserialized data. By default serializeData is JSON.stringify and deserializeData is JSON.parse. These functions can be changed like so:
let node = new Kademlia(5533, {
  serializeData: (dataToSerialize) => {
    ...
    return serializedData;
  },
  deserializeData: (dataToDeserialize) => {
    ...
    return deserializedData;
  },
});
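As one possible use of these hooks, the sketch below keeps JSON as the wire format but restores Buffer values on the receiving side, which plain JSON.parse does not do. It assumes the two functions receive and return exactly what the description above states; the reviver logic is illustrative and not part of kademlia.js.

```javascript
// JSON.stringify already turns a Buffer into { type: "Buffer", data: [...] }
// through Buffer.prototype.toJSON, so serialization needs no extra work.
function serializeData(dataToSerialize) {
  return JSON.stringify(dataToSerialize);
}

// Rebuild Buffers from the shape produced above. Caveat: an ordinary object
// that happens to have this exact shape would also be turned into a Buffer.
function deserializeData(dataToDeserialize) {
  return JSON.parse(dataToDeserialize, (key, value) =>
    value && value.type === "Buffer" && Array.isArray(value.data)
      ? Buffer.from(value.data)
      : value
  );
}

const original = { name: "example", payload: Buffer.from("hi", "utf8") };
const roundTripped = deserializeData(serializeData(original));
console.log(Buffer.isBuffer(roundTripped.payload)); // true
console.log(roundTripped.payload.toString("utf8")); // hi
```

These two functions could then be passed as the serializeData and deserializeData options when creating a node.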
The way Kademlia stores values locally can also be changed. By default, when Kademlia receives a store request, the data sent is stored under the specified key in the database, replacing any data already under that key. If different behaviour is required, such as adding the new value to a list, that can be done with the storeFunction. When Kademlia receives a store request, the storeFunction is passed the new data to be stored under the key and the data currently stored in the database under the key, and must return the value that should be stored in the database.
The following example maintains a list under the key, and appends new values to the list:
let node = new Kademlia(5533, {
  storeFunction: function (newData, oldData) {
    if (oldData === undefined) {
      if (Array.isArray(newData)) {
        return newData;
      } else {
        return [newData];
      }
    } else {
      if (Array.isArray(newData)) {
        return newData;
      } else {
        oldData.push(newData);
        return oldData;
      }
    }
  },
});
Lookups are generally fast, but if the timeout value is large and many offline nodes are encountered during a lookup, the lookup may slow down greatly. To reduce the effect of offline nodes on lookup speed, the value of timeout can be lowered. If timeout is too low, however, queries may expire before online nodes get a chance to respond to them.
let node = new Kademlia(5533, {
  timeout: 1000,
});
It may be desirable for data to be flushed out of the system if it is not being republished by a user. To achieve this, disable caching and disable nodes' republishing of the values they store. In this scenario it may also be useful to disable ttl scaling.
let node = new Kademlia(5533, {
  cache: false,
  republish: false,
  scalettl: false,
});
On private networks you may want to encrypt communications between Kademlia nodes. You can achieve this like so:
let node = new Kademlia(5533, {
  encrypted: true,
  encrypt: (data) => encryptionFunction(data),
  decrypt: (data) => decryptionFunction(data),
});
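The encryptionFunction and decryptionFunction placeholders are left to the user. A minimal sketch of what they might look like, using AES-256-GCM from Node's built-in crypto module, is shown below. It assumes that both functions receive and return strings and that all nodes share one secret key distributed out of band; neither assumption is confirmed by this document.

```javascript
const crypto = require("crypto");

// Assumption: every node in the private network holds this same 32-byte key.
// It is generated randomly here only so the example runs on its own.
const sharedKey = crypto.randomBytes(32);

function encryptionFunction(data) {
  const iv = crypto.randomBytes(12); // fresh nonce for every message
  const cipher = crypto.createCipheriv("aes-256-gcm", sharedKey, iv);
  const ciphertext = Buffer.concat([cipher.update(data, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  // Pack nonce, tag and ciphertext into one base64 string.
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptionFunction(data) {
  const raw = Buffer.from(data, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", sharedKey, iv);
  decipher.setAuthTag(tag);
  // final() throws if the message was modified in transit.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const encrypted = encryptionFunction("test-data");
console.log(decryptionFunction(encrypted)); // test-data
```

An authenticated mode such as GCM is used here so that messages altered in transit are rejected rather than silently decrypted to garbage.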
Credit
Implementation Author: Tom.
Original Kademlia Designers: Petar Maymounkov and David Mazières.
License
MIT