Everything You Wanted to
Know About Distributed Tracing.
Strange Loop Conference
September 2019
Who Am I
● Hungai Amuhinda
● Twitter: @Hungai
● Nairobi, Kenya
● Work: Ajua, Infra Team
● Website: hungaikev.in
How to use this talk
● Please don’t just follow it blindly: There are often times when you will need
to do things differently.
● Software is all about trade-offs: Very few decisions are about right and
wrong.
● Try things for yourself: Why not make Friday afternoons a time to play and
experiment?
Software Evolution.
● Monolith
● On Prem
● Single Language
● Single Stack
● Virtual Machines
Software Evolution.
● Microservices
● Containers
● Multi-Cloud / Hybrid
● Polyglot
● Serverless / Cloud Functions
New Architectures / New Challenges.
● Observability
● Deployment / Packaging
● Configuration Management
● Debugging
● Secrets Management
Meet: Distributed Tracing
“Distributed Tracing, also called distributed request tracing, is a method used to
profile and monitor applications, especially those built using a microservices
architecture. Distributed tracing helps pinpoint where failures occur and what
causes poor performance.”
Distributed Tracing - Terminology
Trace - a trace is a tree of spans that
follows the course of a request through a
system from its source to its ultimate
destination.
Each trace is a narrative that tells the
request's story as it travels through the
system.
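The "tree of spans" idea can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration that assumes all spans of one trace have already been collected and that each span records its parent's span id. `SpanRecord`, `build_trace_tree` and `render` are hypothetical names, not part of any tracing library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanRecord:
    # Hypothetical minimal span record; field names are illustrative only.
    name: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace


@dataclass
class TraceNode:
    span: SpanRecord
    children: list = field(default_factory=list)


def build_trace_tree(spans):
    """Link a flat list of spans from one trace into a tree via parent_id."""
    nodes = {s.span_id: TraceNode(s) for s in spans}
    root = None
    for node in nodes.values():
        pid = node.span.parent_id
        if pid is None:
            root = node
        else:
            # Assumes the parent span was collected; a missing one raises KeyError.
            nodes[pid].children.append(node)
    return root


def render(node, depth=0):
    """Return the tree as indented lines, one line per span."""
    lines = ["  " * depth + node.span.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


spans = [
    SpanRecord("GET /checkout", "1", None),
    SpanRecord("auth-service: verify", "2", "1"),
    SpanRecord("payment-service: charge", "3", "1"),
    SpanRecord("db: INSERT order", "4", "3"),
]
print("\n".join(render(build_trace_tree(spans))))
```

Reading the indented output top to bottom is the "narrative" of the request: the entry point, then each downstream call nested under the span that made it.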
Distributed Tracing - Terminology
Span - a span is a logical unit of work in a
distributed system. Every span has a
name, a start time, and a duration.
Each span captures important data
points specific to the process
handling the request.
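As a rough sketch of what a span records, the snippet below wraps a block of work in a context manager that captures the three properties named above (name, start time, duration) plus a dict of process-specific data points. `Span`, `timed_span` and the `finished_spans` list are hypothetical stand-ins for what a real tracing SDK provides; the attribute keys are made up for the example.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_time: float = 0.0  # wall-clock seconds since the epoch
    duration: float = 0.0    # seconds, measured with a monotonic clock
    attributes: dict = field(default_factory=dict)  # process-specific data points


finished_spans = []  # hypothetical in-memory sink for completed spans


@contextmanager
def timed_span(name, **attributes):
    """Record when a unit of work started and how long it took."""
    span = Span(name=name, attributes=dict(attributes))
    span.start_time = time.time()
    t0 = time.perf_counter()
    try:
        yield span
    finally:
        # Runs even if the wrapped work raises, so the span is always closed.
        span.duration = time.perf_counter() - t0
        finished_spans.append(span)


with timed_span("resize-image", host="worker-3") as span:
    time.sleep(0.01)  # stand-in for real work
    span.attributes["image.bytes"] = 48_213

s = finished_spans[0]
print(s.name, s.attributes)
```

Using wall-clock time for the start and a monotonic clock for the duration is a common design choice: the start must be comparable across machines, while the duration must not jump if the system clock is adjusted.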
Distributed Tracing - Terminology
Context Propagation:
Incoming
Request
trace-id = 123
parent-id = nil
span-id = 1
Outbound
Request
trace-id = 123
parent-id = 1
span-id = 2
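The propagation rule on this slide (keep the trace id, mint a new span id, and record the caller's span id as the parent) can be sketched as below. The wire format used here is the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`); the slide itself does not name a format, so that choice, and the helper names `new_root_context`, `child_context`, `inject` and `extract`, are assumptions for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str             # shared by every span in one trace
    span_id: str              # unique to this span
    parent_id: Optional[str]  # caller's span_id; None for the root span


def new_root_context() -> SpanContext:
    # W3C Trace Context sizes: 16-byte trace id, 8-byte span id, hex-encoded.
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None)


def child_context(parent: SpanContext) -> SpanContext:
    """Same trace, fresh span id, parent = the span that made the call."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)


def inject(ctx: SpanContext, headers: dict) -> None:
    """Write the context into a `traceparent` header (version 00, sampled flag 01)."""
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-01"


def extract(headers: dict) -> SpanContext:
    """Rebuild the caller's context; traceparent does not carry its parent, so None."""
    version, trace_id, span_id, _flags = headers["traceparent"].split("-")
    if version != "00" or len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent header")
    return SpanContext(trace_id, span_id, None)


# Service A receives the incoming request and starts the trace (slide: span-id = 1).
incoming = new_root_context()
# Service A calls service B: same trace id, parent = the incoming span (slide: span-id = 2).
outbound = child_context(incoming)
headers = {}
inject(outbound, headers)

# Service B extracts the header and continues the same trace under the outbound span.
remote = extract(headers)
server_span = child_context(remote)
print(headers["traceparent"])
```

Because every hop only needs the trace id and the immediate parent's span id, each service can attach its spans to the right place in the tree without knowing anything else about the rest of the request path.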
Distributed Tracing - Terminology
Tags & Logs: both annotate a span with contextual information.
● Tags typically apply to the whole span, while logs record events that
happened during the span's execution.
● A log always has a timestamp that falls within the span's start-end time interval.
● The tracing system does not explicitly track causality between logged events the
way it tracks causal relationships between spans, because the order of events can
be inferred from their timestamps.
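The three rules above can be expressed as a small data model: tags are a flat key/value map for the whole span, each log is a timestamped event that must fall inside the span's interval, and log order is recovered by sorting on timestamps rather than stored explicitly. `AnnotatedSpan` is a hypothetical class; plain float timestamps are used for readability. The tag keys follow the OpenTracing semantic conventions, but any keys would do.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSpan:
    name: str
    start: float
    end: float
    tags: dict = field(default_factory=dict)  # apply to the whole span
    logs: list = field(default_factory=list)  # (timestamp, fields) events

    def set_tag(self, key, value):
        self.tags[key] = value

    def log(self, timestamp, **fields):
        # A log event must fall inside the span's start-end interval.
        if not (self.start <= timestamp <= self.end):
            raise ValueError(f"log at {timestamp} outside span [{self.start}, {self.end}]")
        self.logs.append((timestamp, fields))

    def events_in_order(self):
        # No explicit causality between logs: order is inferred from timestamps.
        return [f["event"] for _, f in sorted(self.logs, key=lambda entry: entry[0])]


span = AnnotatedSpan("GET /orders/42", start=10.0, end=10.5)
span.set_tag("http.method", "GET")
span.set_tag("http.status_code", 200)
span.log(10.30, event="cache.miss")  # recorded out of order on purpose
span.log(10.05, event="auth.ok")
print(span.events_in_order())  # ['auth.ok', 'cache.miss']
```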
What questions can tracing help us answer?
Distributed Tracing:
● What services did a request pass through?
● What occurred in each service for a given request?
● Where did the error happen?
● Where are the bottlenecks?
● What is the critical path for a request?
● Who should I page?
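Once spans are collected, several of these questions reduce to simple queries over one trace. The sketch below answers "which services?", "where did the error happen?", "where is the bottleneck?" and "what is the critical path?" for a made-up trace. The self-time calculation assumes a span's direct children do not overlap in time, and the critical-path function is a simplified heuristic (follow the child that finishes last), not the algorithm of any particular tracing backend. `TSpan` and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TSpan:
    span_id: str
    parent_id: Optional[str]
    service: str
    start: float  # ms since the trace began
    end: float
    error: bool = False


trace = [
    TSpan("1", None, "gateway", 0, 120),
    TSpan("2", "1", "auth", 5, 20),
    TSpan("3", "1", "orders", 25, 115),
    TSpan("4", "3", "inventory", 30, 45, error=True),
    TSpan("5", "3", "payments", 50, 110),
]


def children(spans, parent_id):
    return [s for s in spans if s.parent_id == parent_id]


def services_visited(spans):
    """What services did the request pass through? Ordered by first appearance."""
    seen = []
    for s in sorted(spans, key=lambda s: s.start):
        if s.service not in seen:
            seen.append(s.service)
    return seen


def first_error(spans):
    """Where did the error happen? The earliest-starting span flagged as an error."""
    errors = sorted((s for s in spans if s.error), key=lambda s: s.start)
    return errors[0].service if errors else None


def self_time(spans, span):
    # Time not covered by direct children; assumes those children do not overlap.
    covered = sum(c.end - c.start for c in children(spans, span.span_id))
    return (span.end - span.start) - covered


def bottleneck(spans):
    """Where is the bottleneck? The span with the most self time."""
    return max(spans, key=lambda s: self_time(spans, s)).service


def critical_path(spans):
    """Simplified heuristic: from the root, keep following the child that ends last."""
    node = children(spans, None)[0]
    path = [node.service]
    while kids := children(spans, node.span_id):
        node = max(kids, key=lambda s: s.end)
        path.append(node.service)
    return path


print(services_visited(trace))  # ['gateway', 'auth', 'orders', 'inventory', 'payments']
print(first_error(trace))       # inventory
print(bottleneck(trace))        # payments
print(critical_path(trace))     # ['gateway', 'orders', 'payments']
```

In this example the answer to "who should I page?" would be the owners of `inventory` for the error and of `payments` for latency, which is exactly the kind of routing decision a trace makes quick.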
If tracing is so good why isn’t everyone using it?
● Little education and few publicized case studies on
the benefits.
● Vendor lock-in is unacceptable: instrumentation must be
decoupled from vendors.
● Inconsistent APIs: tracing semantics must not be language
dependent.
● Handoff woes: tracing libraries in Project X do not hand off to
tracing libraries in Project Y.
Meet OpenTelemetry
OpenTelemetry
OpenTelemetry is made up of an integrated set of APIs
and libraries, as well as a collection mechanism via an agent
and a collector. These components are used to generate,
collect, and describe telemetry about distributed
systems.
Problems OpenTelemetry solves:
● Vendor neutrality for tracing, monitoring and
logging
● Context Propagation.
OpenTelemetry (opentelemetry.io) Is:
● Single set of APIs for tracing and metrics collection.
● Standardized Context Propagation.
● Exporters for sending data to backend of choice.
● Collector for smart traces & metrics aggregation.
● Integrations with popular web, RPC and storage
frameworks.
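The "exporters for sending data to a backend of choice" point is what keeps instrumentation vendor-neutral: application code only talks to a processor, and the backend-specific part sits behind a small exporter interface that can be swapped in configuration. The sketch below shows that shape with the standard library only. `SpanExporter`, `BatchingProcessor` and the two exporters are hypothetical; the names echo concepts found in tracing SDKs, but this is not the OpenTelemetry API.

```python
import json
from typing import List, Protocol


class SpanExporter(Protocol):
    # Instrumentation depends only on this interface, never on a vendor.
    def export(self, spans: List[dict]) -> None: ...


class StdoutJsonExporter:
    def export(self, spans):
        for s in spans:
            print(json.dumps(s, sort_keys=True))


class InMemoryExporter:
    def __init__(self):
        self.exported = []

    def export(self, spans):
        self.exported.extend(spans)


class BatchingProcessor:
    """Buffers finished spans and hands them to whichever exporter was configured."""

    def __init__(self, exporter: SpanExporter, max_batch: int = 2):
        self.exporter = exporter
        self.max_batch = max_batch
        self.buffer = []

    def on_end(self, span: dict):
        self.buffer.append(span)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.buffer:
            self.exporter.export(self.buffer)
            self.buffer = []


def handle_request(processor):
    # Application instrumentation: nothing here names a backend.
    processor.on_end({"name": "db.query", "duration_ms": 12})
    processor.on_end({"name": "GET /users", "duration_ms": 30})


memory = InMemoryExporter()
processor = BatchingProcessor(memory, max_batch=2)
handle_request(processor)
print(len(memory.exported))  # 2

# Switching backends touches only this wiring, not handle_request().
handle_request(BatchingProcessor(StdoutJsonExporter()))
```

Batching is included because exporting every span individually over the network is usually too expensive; a real processor would also flush on a timer and at shutdown.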
OpenTelemetry (opentelemetry.io) Is:
Next major version of the OpenTracing and OpenCensus projects:
OpenTracing + OpenCensus = OpenTelemetry.
OpenTelemetry Roadmap:
Announcement: https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0
Roadmap: https://medium.com/opentracing/a-roadmap-to-convergence-b074e5815289
Tracing with OpenTelemetry - The Options
● Agentless
● Using an Agent
OpenTelemetry: How to get Involved
GitHub: https://github.com/open-telemetry
Gitter: https://gitter.im/open-telemetry
Languages:
● .NET SDK
● Go SDK
● Java SDK
● JavaScript SDK
● Python SDK
● Ruby SIG
● Erlang/Elixir SDK
EXAMPLE
Create a Tracer
Tracers
Instrument
Introducing Jaeger:
● Open source distributed tracing platform.
● Inspired by Google Dapper and OpenZipkin
● Created by Uber in 2015 and donated to CNCF in 2017.
● Compliant with both OpenTracing and OpenCensus.
● Supports multiple storage options (Cassandra, Elasticsearch, In-Memory)
● Compatible with Apache Kafka for backpressure management.
DEMO
obitech/micro-obs
Conclusion
● Tracing is crucial for understanding complex microservice applications.
● Distributed tracing provides a base view of the system that can
drastically shorten feedback loops and reduce the number of people involved
in incidents.
● Tracing provides much more context, allowing an on-call responder to
better understand the system and get further on their own before
involving more people.
Thank You
Twitter: @Hungai
Email: hungaikevin@gmail.com