github.com/codahale/lunk

Package lunk provides a set of tools for structured logging in the style of Google's Dapper or Twitter's Zipkin. When we consider a complex event in a distributed system, we're actually considering a partially-ordered tree of events from various services, libraries, and modules. Consider a user-initiated web request. Their browser sends an HTTP request to an edge server, which extracts the credentials (e.g., OAuth token) and authenticates the request by communicating with an internal authentication service, which returns a signed set of internal credentials (e.g., signed user ID). The edge web server then proxies the request to a cluster of web servers, each running a PHP application. The PHP application loads some data from several databases, places the user in a number of treatment groups for running A/B experiments, writes some data to a Dynamo-style distributed database, and returns an HTML response. The edge server receives this response and proxies it to the user's browser. In this scenario we have a number of infrastructure-specific events: This scenario also involves a number of events which have little to do with the infrastructure, but are still critical information for the business the system supports: There are a number of different teams all trying to monitor and improve aspects of this system. Operational staff need to know if a particular host or service is experiencing a latency spike or drop in throughput. Development staff need to know if their application's response times have gone down as a result of a recent deploy. Customer support staff need to know if the system is operating nominally as a whole, and for customers in particular. Product designers and managers need to know the effect of an A/B test on user behavior. But the fact that these teams will be consuming the data in different ways for different purposes does mean that they are working on different systems. In order to instrument the various components of the system, we need a common data model. We adopt Dapper's notion of a tree to mean a partially-ordered tree of events from a distributed system. A tree in Lunk is identified by its root ID, which is the unique ID of its root event. All events in a common tree share a root ID. In our photo example, we would assign a unique root ID as soon as the edge server received the request. Events inside a tree are causally ordered: each event has a unique ID, and an optional parent ID. By passing the IDs across systems, we establish causal ordering between events. In our photo example, the two database queries from the app would share the same parent ID--the ID of the event corresponding to the app handling the request which caused those queries. Each event has a schema of properties, which allow us to record specific pieces of information about each event. For HTTP requests, we can record the method, the request URI, the elapsed time to handle the request, etc. Lunk is agnostic in terms of aggregation technologies, but two use cases seem clear: real-time process monitoring and offline causational analysis. For real-time process monitoring, events can be streamed to a aggregation service like Riemann (http://riemann.io) or Storm (http://storm.incubator.apache.org), which can calculate process statistics (e.g., the 95th percentile latency for the edge server responses) in real-time. This allows for adaptive monitoring of all services, with the option of including example root IDs in the alerts (e.g., 95th percentile latency is over 300ms, mostly as a result of requests like those in tree XXXXX). For offline causational analysis, events can be written in batches to batch processing systems like Hadoop or OLAP databases like Vertica. These aggregates can be queried to answer questions traditionally reserved for A/B testing systems. "Did users who were show the new navbar view more photos?" "Did the new image optimization algorithm we enabled for 1% of views run faster? Did it produce smaller images? Did it have any effect on user engagement?" "Did any services have increased exception rates after any recent deploys?" &tc &tc By capturing the root ID of a particular web request, we can assemble a partially-ordered tree of events which were involved in the handling of that request. All events with a common root ID are in a common tree, which allows for O(M) retrieval for a tree of M events. To send a request with a root ID and a parent ID, use the Event-ID HTTP header: The header value is simply the root ID and event ID, hex-encoded and separated with a slash. If the event has a parent ID, that may be included as an optional third parameter. A server that receives a request with this header can use this to properly parent its own events. Each event has a set of named properties, the keys and values of which are strings. This allows aggregation layers to take advantage of simplifying assumptions and either store events in normalized form (with event data separate from property data) or in denormalized form (essentially pre-materializing an outer join of the normalized relations). Durations are always recorded as fractional milliseconds. Lunk currently provides two formats for log entries: text and JSON. Text-based logs encode each entry as a single line of text, using key="value" formatting for all properties. Event property keys are scoped to avoid collisions. JSON logs encode each entry as a single JSON object.

v0.0.0-20141120054618-3bb9f3be3053
Source
Go

Version published: 10 years ago

The README file is missing

FAQs

What is github.com/codahale/lunk?

Package last updated on 20 Nov 2014

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

github.com/codahale/lunk

Related posts

Node.js Adds Experimental Support for TypeScript

What’s New at Socket: Introducing Our Product Changelog