lb
A small package for smart load balancing of simple, redirect-based
(i.e. not directly reverse proxied) services.
It allows nodes to report their presence and utilization, and can
perform multi-dimensional, hierarchical-region-aware (or really any
structured, hierarchical tagging structure you want) load balancing of
incoming requests.
The load balancing algorithm is completely trivial and contains no
damping factors, so it has the potential for all sorts of pathological
misbehavior - let's say it works best under the assumption of
statistically homogeneous traffic distribution.
The implementation is a package so you can build your own presence /
redirection protocol on top of it, extending the provided GRPC
interface and types. It is divided into a "server-side" component, and
an "agent" that is supposed to run on the nodes providing the service
and send back utilization metrics to the server.
Each node can report utilization along multiple dimensions (e.g. "cpu"
and "memory"), as metrics of either counter or gauge type. For
counters, the load balancer itself will compute rates. This keeps the
implementation of the agent simple as it doesn't have to worry about
timing or counter resets.
Limits on the utilization are then used to compute the fullness of
each node, which is used by the load balancer to pick an available
node for incoming requests to the service. The load balancer computes
the maximum utilization of a node across all known dimensions, then
picks the node with the lowest utilization. It also maintains an
estimate of the query cost, i.e. the variation in utilization
associated with every new request sent to a specific node. This
estimate is used to constantly update the expected node utilization,
to mitigate "thundering herd" effects when we receive a large number
of incoming requests before the server receives the updated
utilization metrics from the node.
Requests can have tags associated with them, in which case the load
balancer will try to find a non-full node that matches the provided
tags. When this is impossible, it will repeat the operation after
dropping the least-specific tag (tags are meant to be provided in
hierarchical order). If a node is found this way, the response will
have the overflow bit set to true. If the node that is ultimately
found has a fullness greater than 1, it will still be returned, but
the response will have the overload bit set. The caller can then
make an informed decision on whether to drop the incoming request or
not.