# Probabilistic Sampling Processor

github.com/open-telemetry/opentelemetry-collector-contrib/processor/probabilisticsamplerprocessor
| Status | |
| ------------- | --------- |
| Stability | alpha: logs |
| | beta: traces |
| Distributions | core, contrib, k8s |
| Code Owners | @jpkrohling, @jmacd |
The probabilistic sampler processor supports several modes of sampling for spans and log records. Sampling is performed on a per-request basis, considering individual items statelessly. For whole trace sampling, see tailsamplingprocessor.
For trace spans, this sampler supports probabilistic sampling based on a configured sampling percentage applied to the TraceID. In addition, the sampler recognizes a `sampling.priority` annotation, which can force the sampler to apply 0% or 100% sampling.
For log records, this sampler can be configured to use the embedded TraceID and follow the same logic as applied to spans. When the TraceID is not defined, the sampler can be configured to apply hashing to a selected log record attribute. This sampler also supports sampling priority.
## Consistency guarantee

A consistent probability sampler is a Sampler that supports independent sampling decisions for each span or log record in a group (e.g., by TraceID), while maximizing the potential for completeness as follows.

Consistent probability sampling requires that for any span in a given trace, if a Sampler with lesser sampling probability selects the span for sampling, then the span would also be selected by a Sampler configured with greater sampling probability.

## Completeness property

A trace is complete when all of its members are sampled. A "sub-trace" is complete when all of its descendants are sampled.

Ordinarily, Trace and Logging SDKs configure parent-based samplers, which decide whether to sample based on the Context, because doing so leads to completeness.
When non-root spans or logs make independent sampling decisions instead of using the parent-based approach (e.g., using the TraceIDRatioBased sampler for a non-root span), incompleteness may result. When spans and log records are independently sampled in a processor, as by this component, the same potential for incompleteness arises. The consistency guarantee helps minimize this issue.
Consistent probability samplers can be safely used with a mixture of probabilities and preserve sub-trace completeness, provided that child spans and log records are sampled with probability greater than or equal to that of the parent context.
Using 1%, 10% and 50% probabilities for example, in a consistent probability scheme the 50% sampler must sample when the 10% sampler does, and the 10% sampler must sample when the 1% sampler does. A three-tier system could be configured with 1% sampling in the first tier, 10% sampling in the second tier, and 50% sampling in the bottom tier. In this configuration, 1% of traces will be complete, 10% of traces will be sub-trace complete at the second tier, and 50% of traces will be sub-trace complete at the third tier thanks to the consistency property.
These guidelines should be considered when deploying multiple collectors with different sampling probabilities in a system. For example, a collector serving frontend servers can be configured with smaller sampling probability than a collector serving backend servers, without breaking sub-trace completeness.
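As an illustration of this guideline, here is a hypothetical two-tier sketch (the file names and percentages are examples, not a prescribed topology): both collectors use a consistent mode, so any trace kept by the smaller-probability frontend tier is also kept by the larger-probability backend tier.

```yaml
# frontend-collector.yaml: smaller sampling probability
processors:
  probabilistic_sampler:
    mode: proportional
    sampling_percentage: 10
---
# backend-collector.yaml: larger sampling probability; every span the
# frontend tier keeps is also kept here, preserving sub-trace completeness
processors:
  probabilistic_sampler:
    mode: proportional
    sampling_percentage: 50
```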
## Sampling randomness

To achieve consistency, sampling randomness is taken from a deterministic aspect of the input data. For traces pipelines, the source of randomness is always the TraceID. For logs pipelines, the source of randomness can be the TraceID or another log record attribute, if configured.
For log records, the `attribute_source` and `from_attribute` fields determine the source of randomness used for log records. When `attribute_source` is set to `traceID`, the TraceID will be used. When `attribute_source` is set to `record`, or when the TraceID field is absent, the value of `from_attribute` is taken as the source of randomness (if configured).
## Sampling priority

The sampling priority mechanism is an override, which takes precedence over the probabilistic decision in all modes.
🛑 Compatibility note: Logs and Traces have different behavior.
In traces pipelines, when the priority attribute has value 0, the configured probability will be modified to 0% and the item will not pass the sampler. When the priority attribute is non-zero, the configured probability will be set to 100%. The sampling priority attribute is not configurable, and is called `sampling.priority`.
In logs pipelines, when the priority attribute has value 0, the configured probability will be modified to 0%, and the item will not pass the sampler. Otherwise, the logs sampling priority attribute is interpreted as a percentage, with values >= 100 equal to 100% sampling. The logs sampling priority attribute is configured via `sampling_priority`.
## Mode Selection

There are three sampling modes available. All modes are consistent.
### Hash seed

The hash seed method uses the FNV hash function applied to either a Trace ID (spans, log records), or to the value of a specified attribute (only logs). The hashed value, presumed to be random, is compared against a threshold value that corresponds with the sampling percentage.
This mode requires configuring the `hash_seed` field. This mode is enabled when the `hash_seed` field is not zero, or when log records are sampled with `attribute_source` set to `record`.
In order for hashing to be consistent, all collectors for a given tier (e.g., behind the same load balancer) must have the same `hash_seed`. It is also possible to leverage a different `hash_seed` at different collector tiers to support additional sampling requirements.
This mode uses 14 bits of information in its sampling decision; the default `sampling_precision`, which is 4 hexadecimal digits, exactly encodes this information.

This mode is selected by default when `hash_seed` is configured or when `attribute_source` is set to `record`.
The hash seed mode is most useful in logs sampling, because it can be applied to units of telemetry other than the TraceID. For example, a deployment consisting of 100 pods can be sampled according to the `service.instance.id` resource attribute. In this case, 10% sampling implies collecting log records from an expected value of 10 pods.
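A sketch of that pod-level scenario, assuming the `service.instance.id` value is available as a log record attribute that `from_attribute` can reference:

```yaml
processors:
  probabilistic_sampler:
    mode: hash_seed
    hash_seed: 22           # arbitrary value; must match across collectors in the same tier
    sampling_percentage: 10 # expected value: log records from 10 of 100 pods
    attribute_source: record
    from_attribute: service.instance.id
```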
### Proportional

OpenTelemetry specifies a consistent sampling mechanism using 56 bits of randomness, which may be obtained from the Trace ID according to the W3C Trace Context Level 2 specification. Randomness can also be explicitly encoded in the OpenTelemetry `tracestate` field, where it is known as the R-value.
This mode is named because it reduces the number of items transmitted proportionally, according to the sampling probability. In this mode, items are selected for sampling without considering how much they were already sampled by preceding samplers.
This mode uses 56 bits of information in its calculations. The default `sampling_precision` (4) will cause thresholds to be rounded in some cases when they contain more than 16 significant bits.
The proportional mode is generally applicable in trace sampling, because it is based on OpenTelemetry and W3C specifications. This mode is selected by default, because it enforces a predictable (probabilistic) ratio between incoming items and outgoing items of telemetry. No matter how SDKs and other sources of telemetry have been configured with respect to sampling, a collector configured with 25% proportional sampling will output (an expected value of) 1 item for every 4 items input.
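A minimal sketch of the 25% case described above:

```yaml
processors:
  probabilistic_sampler:
    mode: proportional
    sampling_percentage: 25 # outputs an expected 1 item for every 4 items input
```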
### Equalizing

This mode uses the same randomness mechanism as the proportional sampling mode, in this case considering how much each item was already sampled by preceding samplers. This mode can be used to lower sampling probability to a minimum value across a whole pipeline, making it possible to conditionally adjust sampling probabilities.
This mode compares a 56-bit threshold against the configured sampling probability and updates when the threshold is larger. The default `sampling_precision` (4) will cause updated thresholds to be rounded in some cases when they contain more than 16 significant bits.
The equalizing mode is useful in collector deployments where client SDKs have mixed sampling configurations and the user wants to apply a uniform sampling probability across the system. For example, suppose a user's system consists mostly of components developed in-house, but also includes some third-party software. Seeking to lower the overall cost of tracing, the user configures 10% sampling in the samplers for all of the in-house components. This leaves the third-party software components unsampled, making the savings less than desired. In this case, the user could configure a 10% equalizing probabilistic sampler in the collector. Already-sampled items of telemetry from the in-house components will pass through one-for-one in this scenario, while items of telemetry from the third-party software will be sampled by the intended amount.
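A sketch of the collector configuration for that scenario:

```yaml
processors:
  probabilistic_sampler:
    mode: equalizing
    sampling_percentage: 10 # items already sampled at 10% pass through one-for-one;
                            # unsampled (100%) items are reduced to 10%
```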
## Sampling threshold information

In all modes, information about the effective sampling probability is added into the item of telemetry. The random variable that was used may also be recorded, in case it was not derived from the TraceID using a standard algorithm.
For traces, threshold and optional randomness information are encoded in the W3C Trace Context `tracestate` fields. The tracestate is divided into sections according to a two-character vendor code; OpenTelemetry uses "ot" as its section designator. Within the OpenTelemetry section, the sampling threshold is encoded using "th" and the optional random variable is encoded using "rv".
For example, 25% sampling is encoded in a tracing Span as:

```
tracestate: ot=th:c
```

The threshold `c`, when zero-padded (i.e., `c0000000000000`), rejects the randomness values below it; the remaining 4/16 = 25% of values are sampled.
Users can supply randomness values in this way, independently, making it possible to apply consistent sampling across traces, for example. If the Trace was initialized with the pre-determined randomness value `9b8233f7e3a151` and 100% sampling, its tracestate would read:

```
tracestate: ot=th:0;rv:9b8233f7e3a151
```
This component, using either the proportional or equalizing mode, could apply 50% sampling to the Span. The span with randomness value `9b8233f7e3a151` is consistently sampled at 50% because the threshold, when zero-padded (i.e., `80000000000000`), is less than the randomness value. The resulting span will have the following tracestate:

```
tracestate: ot=th:8;rv:9b8233f7e3a151
```
For log records, threshold and randomness information are encoded in the log record itself, using attributes. For example, 25% sampling with an explicit randomness value is encoded as:

```
sampling.threshold: c
sampling.randomness: e05a99c8df8d32
```
## Sampling precision

When encoding sampling probability in the form of a threshold, variable precision is permitted, making it possible for the user to restrict sampling probabilities to rounded numbers of fixed width.
Because the threshold is encoded using hexadecimal digits, each digit contributes 4 bits of information. One digit of sampling precision can express exact sampling probabilities 1/16, 2/16, ... through 16/16. Two digits of sampling precision can express exact sampling probabilities 1/256, 2/256, ... through 256/256. With N digits of sampling precision, there are exactly 2^(4N) representable probabilities.
Depending on the mode, there are different maximum reasonable settings for this parameter:

- `hash_seed` mode uses a 14-bit hash function, therefore precision 4 completely captures the available information.
- `equalizing` mode configures a sampling probability after parsing a `float32` value, which contains 20 bits of precision, therefore precision 5 completely captures the available information.
- `proportional` mode configures its ratio using a `float32` value, however it carries out the arithmetic using 56 bits of precision. In this mode, increasing precision has the effect of preserving precision applied by preceding samplers (see the sketch after this list).

In cases where larger precision is configured than is actually available, the added precision has no effect because trailing zeros are eliminated by the encoding.
## Error handling

This processor considers it an error when the arriving data has no randomness. This includes conditions where the TraceID field is invalid (16 zero bytes) and where the log record attribute source has zero bytes of information.
By default, when there are errors determining sampling-related information from an item of telemetry, the data will be refused. This behavior can be changed by setting the `fail_closed` property to false, in which case erroneous data will pass through the processor.
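For example, a minimal sketch that lets items with sampling-related errors pass through instead of being refused:

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 25
    fail_closed: false # erroneous items pass through rather than being refused
```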
## Configuration

The following configuration options can be modified:
- `mode` (string, optional): One of "proportional", "equalizing", or "hash_seed"; the default is "proportional" unless either `hash_seed` is configured or `attribute_source` is set to `record`.
- `sampling_percentage` (32-bit floating point, required): Percentage at which items are sampled; >= 100 samples all items, 0 rejects all items.
- `hash_seed` (32-bit unsigned integer, optional, default = 0): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g., behind the same load balancer) should have the same `hash_seed`.
- `fail_closed` (boolean, optional, default = true): Whether to reject items with sampling-related errors.
- `sampling_precision` (integer, optional, default = 4): Determines the number of hexadecimal digits used to encode the sampling threshold. Permitted values are 1..14.
- `attribute_source` (string, optional, default = "traceID"): Defines where to look for the attribute in `from_attribute`. The allowed values are `traceID` or `record`.
- `from_attribute` (string, optional, default = ""): The name of a log record attribute used for sampling purposes, such as a unique log record ID. The value of the attribute is only used if the trace ID is absent or if `attribute_source` is set to `record`.
- `sampling_priority` (string, optional, default = ""): The name of a log record attribute used to set a different sampling priority from the `sampling_percentage` setting. 0 means to never sample the log record, and >= 100 means to always sample the log record.

Examples:
Sample 15% of log records according to trace ID using the OpenTelemetry specification.
```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 15
```
Sample logs according to their `logID` attribute:

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 15
    attribute_source: record # possible values: one of record or traceID
    from_attribute: logID    # value is required if the source is not traceID
```
Give sampling priority to log records according to the attribute named `priority`:
```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 15
    sampling_priority: priority
```
Refer to config.yaml for detailed examples on using the processor.