Distributed Tracing

MSX Distributed Tracing allows the collection of an operational flow graph. Based on OpenTracing, tracing helps pinpoint where failures occur and what causes poor performance.

Model

  • Span

    A span is a named, timed operation representing a piece of the operational flow. Spans can have parents and children.

  • Trace

    A trace is the complete tree of spans from an entire operational flow. A new trace (with a new root span) is created by input from an external system, such as a REST API client. Traces extend across synchronous and asynchronous message flows (interal RPC and events).

Usage

The most common usage of tracing is to create a new child span within the current span, and execute an operation inside it. To facilitate this, you can use the trace.Operation() function:

err := trace.Operation(ctx, "myChildOperation", func(ctx context.Context) error {
myLogger.WithContext(ctx).Info("Inside myChildOperation...")
return nil
})

To create a new child span and attach data to it, you can use the trace.NewSpan() function:

// Create the new span
ctx, span := trace.NewSpan(ctx, spanName)
defer span.Finish()

// Tag the operation name
span.SetTag(trace.FieldOperation, operationName)

// Execute the operation and record the result
if err := myOperation(); err != nil {
span.LogFields(trace.Status("ERROR"), trace.Error(err))
} else {
span.LogFields(trace.Status("OK"))
}

Common trace log tags include:

  • trace.FieldOperation: Generic operation name
  • trace.FieldStatus: Terminal status of the operation
  • trace.FieldHttpCode: Response status code
  • trace.FieldHttpUrl: Request url
  • trace.FieldHttpMethod: Request method

Other tags can be defined as needed using simple period-separated strings (e.g. grpc.response.code).

Advanced Usage

When writing a new driver for external input (such as a new RPC transport listener), you can retrieve the untraced context:

ctx = trace.UntracedContextFromContext(ctx)

This context object should be passed to the input handlers, who will be responsible for starting a new (root) span:

err := trace.Operation(ctx, "myInputReceiver", myInputHandler)

Configuration

By default, MSX tracing will send trace data to a Jaeger listener at udp://localhost:6831.

The following configuration settings can be specified to override the default behaviour:

KeyDescriptionDefault
trace.service-namename of service to supply with the trace${info.app.name}
trace.service-versionversion of service to supply with the trace${info.app.name}
trace.collectorwhich collector to use, jaeger,datadogjaeger
trace.reporter.enabledreport distributed tracing datafalse
trace.reporter.hostjaeger/datadog hostlocalhost
trace.reporter.portjaeger/datadog port6831
trace.reporter.urlzipkin urlhttp://localhost:9411/api/v1/spans

Datadog

To configure for datadog, set the following values in consul:

trace.collector: datadog
trace.reporter.enabled: true
trace.reporter.port: 8126

and in the kubernetes manifest:

env:
- name: TRACE_REPORTER_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

That will send traces to the collector on the same host.