edu-platform-observability
The edu-platform-observability library is intended to provide a "paved road" for all edu platform services that wish to capture telemetry data. This data includes logs, traces, and metrics. We are striving for the following:
- Minimize friction in setting up observability for a new service.
- Encourage standardization of a basic set of telemetry to report.
- Encourage standardization of telemetry metadata (ex: log attributes).
- Minimize the effort required to make strategic changes to how we sample or format our data, and to which downstream tools we use.
This library makes use of OpenTelemetry and Winston.
Installation
npm install @chanzuckerberg/edu-platform-observability
Note that this package is not currently published to npm. Whether to publish it to our public npm repo is TBD.
Usage
import {init} from '@chanzuckerberg/edu-platform-observability';
const telemetryConfig = {
serviceName: 'my-service',
};
const telemetry = init(telemetryConfig);
app.use(telemetry.createMiddleware());
const getLoadContext = (
req: Request,
res: Response,
): AppLoadContext => {
return telemetry.createExpressRemixContext(req, res);
};
app.all('*', createRequestHandler({
build: telemetry.instrumentRemixBuild(require(BUILD_DIR)),
getLoadContext,
}));
This setup alone provides the following basic functionality:
- All requests are logged (request received, request sent, and request error).
- Traces include spans for all outgoing HTTP requests, as well as Remix action and loader functions.
- Traces are sampled (100% sampling when running locally, and 10% when running on happy infra).
- All requests are metered with a histogram.
- On happy infra: logs, traces, and metrics will be properly captured and made available.
- Support for metrics and traces in the local development environment (see below).
All express handlers will have telemetry tools accessible through the res.locals object:
const {
logger,
tracer,
meter
} = res.locals as TelemetryContext;
All Remix loader and action functions will have telemetry tools available through context:
export async function loader({request, context, params}: LoaderArgs) {
const {tracer, meter, logger} = context as TelemetryContext;
}
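As an illustration of using these tools inside a loader, the sketch below wraps a unit of work in a span. It assumes that tracer follows the OpenTelemetry Tracer shape (startActiveSpan, and spans exposing recordException and end). SpanLike, TracerLike, and withSpan are hypothetical names defined here only so that the example is self-contained; they are not part of this library.

```typescript
// Minimal structural stand-ins (assumptions) for the subset of the
// OpenTelemetry Span and Tracer shapes that this sketch relies on.
interface SpanLike {
  recordException(error: Error): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Hypothetical helper: runs `work` inside a span, records a thrown Error
// on the span, and always ends the span once the work settles.
function withSpan<T>(
  tracer: TracerLike,
  name: string,
  work: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await work();
    } catch (error) {
      if (error instanceof Error) {
        span.recordException(error);
      }
      throw error;
    } finally {
      span.end();
    }
  });
}
```

Inside a loader, this could be called as withSpan(tracer, 'loader.fetchCourse', () => fetchCourse(request)), where fetchCourse is a placeholder for your own code.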
Local Development
To use the local telemetry tools, you must first set enableCollection to true. It defaults to false when running locally.
const telemetryConfig = {
enableCollection: true,
};
Then, a local telemetry stack can be spun up with the following command (executed at the root of your project):
npx -p @chanzuckerberg/edu-platform-observability telemetry-up
To shut down the stack:
npx -p @chanzuckerberg/edu-platform-observability telemetry-down
In your browser, you can view traces and metrics using:
Zipkin: http://localhost:9411
Prometheus: http://localhost:9090
Alternatively, you can enable console telemetry like this:
const telemetryConfig = {
enableConsoleTracingAndMetrics: true,
};
Time Measurement
Use the TimeMeasurement class to measure elapsed time for service-specific metrics. Example:
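import {TimeMeasurement} from '@chanzuckerberg/edu-platform-observability';
// Create the measurement before the work you want to time.
const measurement = new TimeMeasurement();
// ... do the work being measured ...
const elapsedTime = measurement.getElapsedMs();
// `histogram` and `histogramAttributes` are assumed to be defined elsewhere in your service.
histogram.record(elapsedTime, histogramAttributes);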
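To time a whole operation rather than placing the calls by hand, the pattern above can be wrapped in a small helper. The sketch below is self-contained and rests on several assumptions: ElapsedTimer is a hypothetical stand-in that mimics only the getElapsedMs method shown above (the real TimeMeasurement class may behave differently), HistogramLike mirrors the record(value, attributes) call used above, and recordDuration and the injectable now clock are names introduced purely for illustration.

```typescript
// Hypothetical stand-in for TimeMeasurement: captures a start time at
// construction and reports the milliseconds elapsed since then.
class ElapsedTimer {
  private readonly now: () => number;
  private readonly startedAt: number;

  constructor(now: () => number = Date.now) {
    this.now = now;
    this.startedAt = now();
  }

  getElapsedMs(): number {
    return this.now() - this.startedAt;
  }
}

// Minimal shape (assumption) of the histogram used in the example above.
interface HistogramLike {
  record(value: number, attributes?: Record<string, string>): void;
}

// Runs `work`, then records how long it took, even if `work` throws.
function recordDuration<T>(
  histogram: HistogramLike,
  attributes: Record<string, string>,
  work: () => T,
  now: () => number = Date.now,
): T {
  const timer = new ElapsedTimer(now);
  try {
    return work();
  } finally {
    histogram.record(timer.getElapsedMs(), attributes);
  }
}
```

Recording in a finally block means failed operations are measured too, which may or may not be what you want for a given metric.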
Configuration
The following configuration options are available in TelemetryConfig. Some have defaults, and some have alternative environment variables that can be used if the value is not provided in TelemetryConfig.
Option | Meaning | Environment Variable | Default Value |
---|---|---|---|
isDev | Indicates that the service is running in a local dev env. | | !process.env.DEPLOYMENT_STAGE |
enableConsoleTracingAndMetrics | If true, and isDev is true, metrics and traces are written to the console. Very noisy! | ENABLE_CONSOLE_TRACING_AND_METRICS | false |
serviceName | The name of the service, to be used in telemetry metadata. | | No default value |
serviceVersion | The version of the service. | | When isDev is false: TBD (auto-detect). When isDev is true: dev |
collectorHost | The hostname of the OpenTelemetry collector. | OTEL_COLLECTOR_HOST | When isDev is false: scraper-collector.opentelemetry-operator-system.svc.cluster.local. When isDev is true: localhost |
logLevel | The minimum log level to output. | LOG_LEVEL | When isDev is false: info. When isDev is true: debug |
enableCollection | When true, collectorHost is used to publish metrics and traces. | ENABLE_OTEL_COLLECTION | When isDev is false: true. When isDev is true: false |
ignoreOutgoingRequestHook | A function used to ignore certain outgoing requests for tracing. Signature is: (req: RequestOptions) => boolean | | No default implementation |
enableGraphQLTracing | When true, GraphQL instrumentation is enabled for tracing. | | false |
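The precedence described above (explicit TelemetryConfig value, then environment variable, then default) can be sketched as follows. This illustrates the documented behavior and is not the library's actual implementation: resolveTelemetryConfig, ConfigInput, ResolvedConfig, and parseBoolean are hypothetical names, only a few options are covered, and the way boolean environment variables are parsed is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Subset of the options from the table above (all optional on input).
interface ConfigInput {
  isDev?: boolean;
  collectorHost?: string;
  logLevel?: string;
  enableCollection?: boolean;
}

interface ResolvedConfig {
  isDev: boolean;
  collectorHost: string;
  logLevel: string;
  enableCollection: boolean;
}

const DEPLOYED_COLLECTOR_HOST =
  'scraper-collector.opentelemetry-operator-system.svc.cluster.local';

// Assumption: only the literal strings 'true' and 'false' are recognized.
function parseBoolean(value: string | undefined): boolean | undefined {
  if (value === 'true') return true;
  if (value === 'false') return false;
  return undefined;
}

// Explicit config wins, then the environment variable, then the default,
// which for several options depends on isDev.
function resolveTelemetryConfig(config: ConfigInput, env: Env): ResolvedConfig {
  const isDev = config.isDev ?? !env['DEPLOYMENT_STAGE'];
  return {
    isDev,
    collectorHost:
      config.collectorHost ??
      env['OTEL_COLLECTOR_HOST'] ??
      (isDev ? 'localhost' : DEPLOYED_COLLECTOR_HOST),
    logLevel: config.logLevel ?? env['LOG_LEVEL'] ?? (isDev ? 'debug' : 'info'),
    enableCollection:
      config.enableCollection ??
      parseBoolean(env['ENABLE_OTEL_COLLECTION']) ??
      !isDev,
  };
}
```

In a Node service, the second argument would typically be process.env. Passing it in explicitly keeps the function easy to exercise with different environments.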