Distributed Tracing

Note: The information in this section applies only to app executables.

Distributed tracing lets you record information about an app's behavior while it runs, showing the path the app's execution takes from start to finish. You can then use this information to troubleshoot performance bottlenecks, errors, and failures during app execution.

As a request travels through different services, each segment of work is recorded as a span. A span is the building block of a trace: it represents a unit of work, with its time interval and associated metadata. All the spans for a request are combined into a single trace, which gives you a picture of the entire request. A trace therefore represents one end-to-end execution and is made up of one or more spans. A tracer is the implementation that records the spans and publishes them.
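To make the span, trace, and tracer vocabulary concrete, here is a minimal Python sketch. It is not Flogo's tracer API: `Span`, `InMemoryTracer`, and their methods are hypothetical names, and "publishing" is reduced to appending finished spans to a list rather than sending them to a tracing backend.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """A named unit of work with a time interval and metadata (tags)."""
    name: str
    trace_id: str              # shared by every span in the same trace
    span_id: str               # unique to this span
    parent_id: Optional[str]   # None for the root span of a trace
    start: float
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)

    @property
    def duration(self) -> float:
        """Elapsed seconds; only meaningful once the span is finished."""
        if self.end is None:
            raise ValueError(f"span {self.name!r} is not finished")
        return self.end - self.start


class InMemoryTracer:
    """Hypothetical tracer: records spans and 'publishes' them to a list when finished."""

    def __init__(self) -> None:
        self.published: List[Span] = []

    def start_span(self, name: str, parent: Optional[Span] = None, **tags) -> Span:
        # A child inherits its parent's trace ID; a span without a parent starts a new trace.
        trace_id = parent.trace_id if parent else uuid.uuid4().hex
        return Span(
            name=name,
            trace_id=trace_id,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            start=time.monotonic(),
            tags=dict(tags),
        )

    def finish(self, span: Span) -> None:
        span.end = time.monotonic()
        self.published.append(span)

    def trace(self, trace_id: str) -> List[Span]:
        """All published spans that belong to one trace (one end-to-end execution)."""
        return [s for s in self.published if s.trace_id == trace_id]


# One flow containing one activity: two spans, one trace.
tracer = InMemoryTracer()
flow = tracer.start_span("flow:GetOrder")
activity = tracer.start_span("LogMessage", parent=flow, kind="activity")
tracer.finish(activity)
tracer.finish(flow)
spans = tracer.trace(flow.trace_id)
```

Because the activity span starts after and finishes before its parent flow span, the flow's duration always covers the activity's, which is what a trace viewer relies on when it nests spans on a timeline.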

Distributed tracing helps you identify issues with your app, whether performance problems or ordinary bugs, without having to dig through stack traces. It is particularly useful in a microservice architecture in which each app is instrumented by a tracing framework: the framework runs in the background while you monitor each trace in the UI, so you can spot abnormalities and pinpoint where a problem occurs.

Some Considerations

Keep the following in mind when using the distributed tracing capability in Flogo:

  • Only one tracer can be registered at a time. If you try to register multiple tracers, only the first one you register is accepted, and it is used at run time to trace all the activities of the flow.
  • All traces start at the flow level. Spans can be related in two ways: a span is either a child of a parent span, or it follows (comes after) another span. You can see all the operations and traces for the flows and activities that are part of an app. Traces of the triggers used in the app are not shown.
  • You can trace across apps by passing the tracing context from one app to another. To do so, make sure that all the apps are instrumented with compatible tracing frameworks (for example, all following Jaeger semantics) so that each app can interpret the context it receives. Otherwise, you cannot follow the entire trace through multiple services.
  • When looping is enabled for an Activity, each iteration is recorded as a separate span, because each iteration calls the server, which triggers a server flow.
  • If a span is passed to the trigger, that span becomes the parent span. You can then see how much time elapses between when the trigger receives the event and when the trigger replies. This works only for triggers that can extract the context from the underlying technology, for instance, triggers that support HTTP headers.

    In this release, the ReceiveHTTPMessage REST trigger and the InvokeRESTService Activity are supported: the REST trigger can extract the context from an incoming request, and the InvokeRESTService Activity can inject the context into an outgoing request. If two Flogo apps are both Jaeger-enabled and one app calls the other, you can see the chain of events (each invocation and how much time it took) in the Jaeger UI. If app A calls app B, the total request time for app A is the sum of the time taken by all the activities in app A plus the time taken by the service it calls. If you open each invocation separately, you can see how much time each Activity in that invocation took.

  • Triggers that support spans (for instance, the REST trigger) are always the parent, so any flow attached to such a trigger is always a child of the trigger span. The trigger span completes only after the request has been passed to the flow and the flow has returned.
  • A subflow becomes a child of the Activity from which it is called.
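The rule that only the first registered tracer is accepted can be sketched as a small registry. `TracerRegistry` is a hypothetical class, not part of Flogo; the strings stand in for real tracer objects, and returning `False` for a rejected registration is an assumption about how a rejection might be reported.

```python
import threading
from typing import Any, Optional


class TracerRegistry:
    """Hypothetical registry: the first tracer registered wins, later ones are ignored."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tracer: Optional[Any] = None

    def register(self, tracer: Any) -> bool:
        """Register `tracer`; return False and keep the current one if a tracer already exists."""
        with self._lock:
            if self._tracer is not None:
                return False
            self._tracer = tracer
            return True

    @property
    def tracer(self) -> Optional[Any]:
        """The tracer used at run time, or None if nothing has been registered."""
        return self._tracer


registry = TracerRegistry()
first = registry.register("jaeger-tracer")   # accepted
second = registry.register("other-tracer")   # ignored: a tracer is already registered
```

The lock keeps the check-and-set atomic, so two components registering tracers concurrently cannot both succeed.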
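The context passing between the InvokeRESTService Activity and the REST trigger can be modeled as inject/extract over HTTP headers. This is a minimal sketch, not Flogo's implementation: the function names are hypothetical, and the `uber-trace-id` header with a `{trace-id}:{span-id}:{parent-span-id}:{flags}` value is an assumption modeled on Jaeger's propagation format.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: Jaeger-style header name and value layout; check what your tracer actually uses.
TRACE_HEADER = "uber-trace-id"


@dataclass(frozen=True)
class SpanContext:
    """The part of a span that crosses process boundaries."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_id: Optional[str]  # None means this span starts a new trace


def start_span(name: str, parent: Optional[SpanContext] = None) -> Span:
    """Start a child of `parent`, or the root of a new trace when there is no parent."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    context = SpanContext(trace_id=trace_id, span_id=uuid.uuid4().hex[:16])
    return Span(name=name, context=context, parent_id=parent.span_id if parent else None)


def inject(context: SpanContext, headers: Dict[str, str]) -> None:
    """Caller side (the role InvokeRESTService plays): write the context into outgoing headers."""
    # "0" = no separate parent-span field, "1" = sampled (assumed flag values).
    headers[TRACE_HEADER] = f"{context.trace_id}:{context.span_id}:0:1"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Receiver side (the role the REST trigger plays): read the caller's context, if present."""
    value = headers.get(TRACE_HEADER)
    if not value:
        return None
    trace_id, span_id, _parent_id, _flags = value.split(":")
    return SpanContext(trace_id=trace_id, span_id=span_id)


# App A: a flow whose InvokeRESTService activity calls app B over HTTP.
flow_a = start_span("app A flow")
invoke = start_span("InvokeRESTService", parent=flow_a.context)
request_headers: Dict[str, str] = {}
inject(invoke.context, request_headers)

# App B: the REST trigger extracts the context, so its span joins app A's trace,
# and the flow attached to the trigger becomes a child of the trigger span.
trigger_b = start_span("ReceiveHTTPMessage", parent=extract(request_headers))
flow_b = start_span("app B flow", parent=trigger_b.context)
```

Only the trace ID and the caller's span ID cross the wire. That is enough for app B's trigger span to join app A's trace as a child of the calling activity, which is what allows a tracing UI to show the whole chain of invocations.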