Google Cloud Pubsub Exporter
Status | |
---|
Stability | beta |
Supported pipeline types | traces, logs, metrics |
Distributions | contrib |
⚠️ This is a community-provided module. It has been developed and extensively tested at Collibra, but it is not officially supported by GCP.
This exporter sends OTLP messages to a Google Cloud Pubsub topic.
The following configuration options are supported:
- project (Optional): The Google Cloud Project of the topics.
- topic (Required): The topic name to receive OTLP data over. The topic name should be a fully qualified resource name (e.g. projects/otel-project/topics/otlp).
- compression (Optional): Set the payload compression; only gzip is supported. Default is no compression.
- watermark: Behaviour of how the ce-time attribute is set (see the Watermark section for more info).
  - behavior (Optional): current sets the ce-time attribute to the system clock; earliest sets the attribute to the smallest timestamp of all the messages.
  - allow_drift (Optional): The maximum difference the ce-time attribute can be set from the system clock. When the drift is set to 0, the maximum drift from the clock is allowed (only applicable to earliest).
exporters:
  googlecloudpubsub:
    project: my-project
    topic: otlp-traces
Pubsub topic
The Google Cloud Pubsub exporter doesn't automatically create topics; it expects the topic
to be created upfront. Security-wise, it's best to give the collector its own service account and grant that
account the Pub/Sub Publisher permission on the topic.
Messages
The messages published on the topic are CloudEvent compliant and use the binary content mode
defined in the Google Cloud Pub/Sub Protocol Binding for CloudEvents.
The data field is either an ExportTraceServiceRequest, ExportMetricsServiceRequest or ExportLogsServiceRequest for
traces, metrics or logs respectively. Each message is accompanied by the following attributes:
attribute | description |
---|
ce-specversion | follows version 1.0 of the CloudEvent spec |
ce-source | the source is this exporter: /opentelemetry/collector/googlecloudpubsub/<version> |
ce-id | a random UUID that uniquely identifies the message |
ce-time | a watermark indicating when the events encapsulated in the OTLP message were generated. The behavior depends on the watermark setting in the configuration |
ce-type | depending on the data: org.opentelemetry.otlp.traces.v1, org.opentelemetry.otlp.metrics.v1 or org.opentelemetry.otlp.logs.v1 |
content-type | the content type is application/protobuf |
content-encoding | indicates that the payload is compressed. Only gzip compression is supported |
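To make the attribute table concrete, here is a minimal Python sketch of how such an attribute set could be assembled. It is not the exporter's code: build_attributes and EXPORTER_VERSION are hypothetical names, the RFC 3339 rendering of ce-time is an assumption based on the CloudEvents timestamp format, and adding content-encoding only when compression is enabled is also an assumption.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical placeholder; the real exporter fills in its own version.
EXPORTER_VERSION = "v0.0.0"

# ce-type values taken from the attribute table above.
CE_TYPES = {
    "traces": "org.opentelemetry.otlp.traces.v1",
    "metrics": "org.opentelemetry.otlp.metrics.v1",
    "logs": "org.opentelemetry.otlp.logs.v1",
}


def build_attributes(signal, watermark, compression=None):
    """Return the Pubsub message attributes for one OTLP payload."""
    attributes = {
        "ce-specversion": "1.0",
        "ce-source": f"/opentelemetry/collector/googlecloudpubsub/{EXPORTER_VERSION}",
        "ce-id": str(uuid.uuid4()),
        # Assumption: RFC 3339 in UTC, the timestamp format CloudEvents uses.
        "ce-time": watermark.astimezone(timezone.utc).isoformat(),
        "ce-type": CE_TYPES[signal],
        "content-type": "application/protobuf",
    }
    # Assumption: the attribute is only present when the payload is compressed.
    if compression == "gzip":
        attributes["content-encoding"] = "gzip"
    return attributes
```

Calling build_attributes("traces", datetime.now(timezone.utc), compression="gzip") yields a dictionary with all seven attributes; without the compression argument the content-encoding key is left out.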
Compression
By default, the messages are not compressed. Compressing the messages can reduce the cost of Pubsub to
as little as 20% of the uncompressed cost. To enable it, set the compression option to gzip.
exporters:
  googlecloudpubsub:
    project: my-project
    topic: otlp-traces
    compression: gzip
The exporter will add the content-encoding attribute to the message. The receiver will look at this attribute
to detect the compression that is used on the payload.
Only gzip is supported.
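On the consuming side, the content-encoding check can be sketched in a few lines of Python. This is an illustration only, not the receiver's code: decode_payload is a hypothetical helper, and attributes stands for the attribute map of a received Pubsub message.

```python
import gzip


def decode_payload(data, attributes):
    """Return the raw protobuf bytes of a received Pubsub message."""
    encoding = attributes.get("content-encoding")
    if encoding is None:
        # Default: the payload was published without compression.
        return data
    if encoding == "gzip":
        return gzip.decompress(data)
    # Only gzip is supported; anything else is rejected.
    raise ValueError(f"unsupported content-encoding: {encoding}")
```

The returned bytes would then be parsed as the OTLP request type indicated by the ce-type attribute.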
Watermark
A watermark is a threshold that indicates when a streaming processing framework (like Apache Beam) expects all the
data in a window to have arrived. If new data arrives with a timestamp that's in the window but older than the
watermark, the data is considered late data. The watermark section changes the behaviour of the ce-time
attribute of the message. If you don't use such frameworks you can ignore this section, and ce-time will
be set to the current time. For more reliable watermark behaviour in such streaming frameworks, it's better to set
the ce-time attribute to the earliest timestamp of the messages embedded in the Pubsub message.
Setting the behaviour to earliest will scan all the embedded messages before sending the actual Pubsub message to
figure out what the earliest timestamp is. You have to set allow_drift, the maximum allowed drift of the ce-time
timestamp, if you want the behaviour to have effect, as the default is 0s.
exporters:
  googlecloudpubsub:
    project: my-project
    topic: otlp-traces
    watermark:
      behavior: earliest
      allow_drift: 1h
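The interaction between earliest and allow_drift can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the exporter's implementation: earliest_watermark is a hypothetical function, it only covers a positive allow_drift (how a drift of 0 is treated is deliberately left out), and capping the result at the current time is an assumption.

```python
from datetime import datetime, timedelta, timezone


def earliest_watermark(timestamps, now, allow_drift):
    """Pick ce-time for behavior "earliest" with a positive allow_drift."""
    if not timestamps:
        # Nothing to scan: fall back to the system clock.
        return now
    # Oldest value ce-time may take: the system clock minus the allowed drift.
    floor = now - allow_drift
    earliest = min(timestamps)
    # Assumption: clamp the earliest timestamp into [floor, now].
    return max(floor, min(earliest, now))
```

With allow_drift set to 1h, a span that started ten minutes ago moves ce-time back by ten minutes, while a span stamped three hours ago only moves it back by the full hour.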
The default behavior is that the watermark is set to the current time of the processor. This timestamp will not differ
much from the timestamp that is attached to a Pubsub message. Most users who only use Pubsub
as a global distribution system will not need anything else.
If you use Google Cloud Dataflow and want to rely on the advanced streaming
features, you may want to change the behavior of the watermark and de-duplication. You can leverage the unique id (ce-id)
and timestamp (ce-time) attributes on the message. In Apache Beam (the framework used by Dataflow) you can set the
attribute names on the Pubsub connector via the .withTimestampAttribute("ce-time") and .withIdAttribute("ce-id")
methods. A good setting for this scenario is behavior: earliest with a reasonable allow_drift of 1h.
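For consumers that don't run on Beam, the same two attributes can be used by hand. The Python sketch below is a conceptual illustration only and not how the Beam connector is implemented: Deduplicator is a hypothetical class, it keeps every seen id in memory, and it assumes ce-time is a timestamp that datetime.fromisoformat can parse.

```python
from datetime import datetime


class Deduplicator:
    """Drop redelivered messages by ce-id and expose ce-time as event time."""

    def __init__(self):
        self._seen = set()

    def accept(self, attributes):
        """Return the event time for a new message, or None for a duplicate."""
        message_id = attributes["ce-id"]
        if message_id in self._seen:
            return None
        self._seen.add(message_id)
        # Older Python versions don't accept a trailing "Z"; normalise it.
        return datetime.fromisoformat(attributes["ce-time"].replace("Z", "+00:00"))
```

A long-running consumer would need to expire old ids, for example once they are older than the allow_drift window, instead of letting the set grow without bound.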
Allowed behavior values are current or earliest. For allow_drift the default is 0s, so make sure to set the
value.