Observability, Distributed Tracing,
and Open Source
The Missing Primer
5
About me
• Daniel Khan
daniel.khan@dynatrace.com
@dkhan
• Dir. Technology Strategy @Dynatrace
• Everything Open Source Monitoring & standards & our contributions to it
• Chair of W3C Trace Context
6
Why I am doing this talk
Distributed Tracing
Observability
W3C Trace Context
OpenCensus
OpenTracing
OpenTelemetry
Metrics
Span
Trace
7
Application
In the Beginning there was the Monolith
Presentation
Business Logic
Data Access
Database
Services
Presentation
API Gateway
Auth Inventory Cart Account
Offers Shipping Checkout Status
Wire
8
Development in a Microservices World
Cart
Dev
Preproduction
Cart Auth Inventory Account
Offers Shopping Checkout Status
Push
Cart
KPIs
• Latency
• Response Time
• Error Rate
• Number of queries
9
Metrics
Source: https://techblog.commercetools.com/adding-consistency-and-automation-to-grafana-e99eb374fe40
… contain time-correlated data points
• Counter
Monotonically increasing values
Think: Odometer
• Gauge
Increasing and decreasing values
Think: Tachometer
• Histogram
Groups values into buckets
Think: Knock events 0-50mph, 51-100mph, …
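The three metric types can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular metrics library; the class names, the variable names, and the bucket bounds are assumptions chosen to mirror the odometer, tachometer, and speed-bucket analogies above.

```python
import bisect


class Counter:
    """Monotonically increasing value (the odometer)."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("a counter can only go up")
        self.value += amount


class Gauge:
    """Value that can rise and fall (the tachometer)."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; each bound is an inclusive upper limit."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra bucket collects values above the last bound.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def observe(self, value):
        self.counts[bisect.bisect_left(self.upper_bounds, value)] += 1


requests_total = Counter()
in_flight = Gauge()
knock_events = Histogram([50, 100])  # buckets: 0-50, 51-100, above 100 mph

for speed in (30, 50, 51, 99, 120):
    requests_total.inc()
    knock_events.observe(speed)

in_flight.set(3)
in_flight.set(1)  # a gauge may go down again
```

A monitoring system would scrape or receive these values periodically and chart them over time.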
10
Collecting and Charting Metrics
11
Error
242
Success
1302
Cart Service
12
Complexity has moved to the Network Layer
Client, API GW, Service (×6), Cart
Which requests lead to an error in our cart service?
Trace
a42b = Trace Context
13
A Trace is a Tree of Spans
Trace
Span
Span
Span
Click GW API
Spans represent a single operation and contain metadata like the HTTP method, a database query, or an error code
JDBC
Span: callDB()
Span: JDBC call
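As a rough sketch of the data structure, a trace can be stored as a flat list of spans that point at their parent and be turned into a tree on demand. The field names, the IDs, and the attribute keys below are illustrative assumptions, not a specific tracing system's schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    attributes: dict = field(default_factory=dict)


def build_tree(spans):
    """Return (root, children), where children maps span_id to its child spans."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children[span.parent_id].append(span)
    return root, children


def render(span, children, depth=0):
    """Return the tree as indented lines, one per span."""
    lines = ["  " * depth + span.name]
    for child in children[span.span_id]:
        lines.extend(render(child, children, depth + 1))
    return lines


spans = [
    Span("1", None, "Click"),
    Span("2", "1", "GW", {"http.method": "GET"}),
    Span("3", "2", "API"),
    Span("4", "3", "callDB()"),
    Span("5", "4", "JDBC call", {"db.statement": "SELECT * FROM cart"}),
]

root, children = build_tree(spans)
tree_lines = render(root, children)
print("\n".join(tree_lines))
```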
14
Trace Context Propagation
Carts a42b a42b
Extract
Inject
In-process propagation
Auto Instrumentation
• Zipkin
• Sleuth
• OpenTelemetry
• Commercial
• …
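What auto-instrumentation does on each hop can be sketched as follows: extract the trace ID from the incoming request, keep it in the execution context while the request is handled, and inject it into every outgoing request. The header name `x-example-trace-id` and the function names are hypothetical and exist only for this illustration; real systems use one of the header formats discussed on the next slides.

```python
import contextvars
import secrets

# In-process propagation: the trace ID travels with the execution context,
# so business code does not have to pass it around explicitly.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

TRACE_HEADER = "x-example-trace-id"  # hypothetical header name


def extract(headers):
    """Read the trace ID from incoming headers, or start a new trace."""
    trace_id = headers.get(TRACE_HEADER) or secrets.token_hex(16)
    current_trace_id.set(trace_id)
    return trace_id


def inject(headers):
    """Copy the current trace ID into outgoing headers."""
    trace_id = current_trace_id.get()
    if trace_id is not None:
        headers[TRACE_HEADER] = trace_id
    return headers


def handle_cart_request(incoming_headers):
    extract(incoming_headers)
    # Business logic would run here without knowing about the trace ID.
    return inject({"accept": "application/json"})


outgoing = handle_cart_request({TRACE_HEADER: "a42b"})
print(outgoing)
```

A request that arrives without the header starts a new trace, which is why `extract` falls back to a freshly generated ID.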
15
Trace Context Header Formats
• Proprietary
• B3 Header (Zipkin)
• W3C Trace Context
What’s the header name and what does it contain?
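To make the question concrete, the sketch below writes the same trace ID and span ID in Zipkin's B3 multi-header format and in the W3C `traceparent` header. The header names follow the respective specifications; the helper functions are illustrative, and the ID values are sample values.

```python
def b3_headers(trace_id, span_id, sampled):
    """B3 multi-header format (Zipkin): one header per field."""
    return {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": span_id,
        "X-B3-Sampled": "1" if sampled else "0",
    }


def w3c_headers(trace_id, span_id, sampled):
    """W3C Trace Context: a single traceparent header, version 00."""
    flags = "01" if sampled else "00"
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


trace_id = "0af7651916cd43dd8448eb211c80319c"
span_id = "00f067aa0ba902b7"

b3 = b3_headers(trace_id, span_id, sampled=True)
w3c = w3c_headers(trace_id, span_id, sampled=True)
print(b3)
print(w3c)
```

Both carry the same information; the formats differ only in header names and encoding, which is exactly why two systems using different formats cannot continue each other's traces without translation.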
16
W3C Trace Context
Service A API GW Service B
Trace
Service C
OpenTelemetry AWS Zipkin OpenTelemetry
Goal: All monitoring systems and middleware agree on one format for trace context propagation
Span
Span
Span
Span
17
W3C Trace Context Format
traceparent: 00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
traceparent fields: Version (00), TraceID (0af7651916cd43dd8448eb211c80319c), ParentID (00f067aa0ba902b7), Flags (01)
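A minimal parser for the two headers might look like the sketch below. It assumes the four-field layout shown above and applies only the basic validity rules (lowercase hex fields of fixed length, no all-zero IDs, version `ff` forbidden); the function and type names are illustrative, and a production parser would follow the W3C specification in full.

```python
import re
from typing import NamedTuple


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Parse a traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # values the specification declares invalid
    # The lowest bit of the flags field is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


def parse_tracestate(value):
    """Split tracestate into vendor key/value pairs, keeping their order."""
    entries = {}
    for item in value.split(","):
        key, separator, val = item.strip().partition("=")
        if separator:
            entries[key] = val
    return entries


parent = parse_traceparent(
    "00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01"
)
state = parse_tracestate("rojo=00f067aa0ba902b7,congo=t61rcWkgMzE")
print(parent)
print(state)
```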
18
Data collection
So far we have only instrumented the code to propagate context, but no data has been collected
Trace
Span
Span
Span
Agent Agent Agent
Click GW API
Trace Context, Trace Context
Monitoring System, Storage
19
Data Collection & Presentation Systems
Solution         Agents  Instrumentation  Storage  Presentation
Zipkin / Sleuth  +       +                +        +
Jaeger           -       -                +        +
OpenTelemetry    +       +                -        -
Commercial       +       +                +        +
20
Zipkin
21
Jaeger
22
Commercial
23
Entity Model Based Service Flow
24
Detecting Errors
25
Solving our Cart Problem
Client, API GW, Service (×6), Cart
Trace
Client
Service
Currency
Cart
API GW
GET: Currency=EURO
26
What we did to Solve the Problem
1. We used metrics to learn about a problem
2. We used distributed tracing to pass along a unique ID per trace
• For that, we used auto-instrumentation to extract and inject the trace ID
3. We used a monitoring system and its agents to collect traces, so we could filter for transactions that produced an error
4. We looked into the metadata of such a transaction to identify how it differs from successful ones
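Steps 3 and 4 can be sketched as a small analysis over collected traces: split them into failed and successful ones, then look for metadata that every failed trace shares and no successful trace has. The trace records below are made-up sample data, and the field names are assumptions for this illustration.

```python
from collections import Counter

# Hypothetical trace summaries: one record per request to the cart service.
traces = [
    {"trace_id": "a42b", "error": True, "attributes": {"method": "GET", "currency": "EURO"}},
    {"trace_id": "b17c", "error": False, "attributes": {"method": "GET", "currency": "USD"}},
    {"trace_id": "c93d", "error": True, "attributes": {"method": "GET", "currency": "EURO"}},
    {"trace_id": "d05e", "error": False, "attributes": {"method": "GET", "currency": "EUR"}},
]


def attribute_counts(selected):
    """Count how often each (key, value) pair occurs in the selected traces."""
    return Counter(
        (key, value)
        for trace in selected
        for key, value in trace["attributes"].items()
    )


failed = [t for t in traces if t["error"]]
succeeded = [t for t in traces if not t["error"]]

failed_counts = attribute_counts(failed)
succeeded_counts = attribute_counts(succeeded)

# Keep attribute values present in every failed trace and in no successful one.
suspects = [
    pair
    for pair, count in failed_counts.items()
    if count == len(failed) and pair not in succeeded_counts
]
print(suspects)
```

With this sample data the only suspect is the currency value `EURO`, which matches the cart example from the previous slide.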
27
You’ve mentioned OpenTelemetry …
OpenCensus + OpenTracing = OpenTelemetry
In early 2019 OpenCensus and OpenTracing merged into OpenTelemetry
Metrics, Traces, Logs
28
APIs SDKs Exporters Collector
33
OpenTelemetry – Developer use cases
• Cloud-native microservices architectures are hard to trace and debug during development
• In development, OpenTelemetry can be used to either
  • manually create spans to trace certain execution paths, or
  • use the provided auto-instrumentation to make a system observable
• As backend and UI, Jaeger is the most popular tool. It’s open source and solely displays traces
34
OpenTelemetry – in Production
• Provides just a fraction of what modern tools provide
• Traces
• Metrics
• Logs
• Topology
• Behavior
• Code level visibility
• Metadata
• Manual instrumentation code needs to be kept up-to-date
• A backend needs to be maintained
• No support model if instrumentation breaks production code
• No enterprise features (access control, throttling, scaling, …)
35
Why do Vendors Care then?
36
OpenTelemetry Company Contribution Stats
Google
Microsoft
Dynatrace
38
What happens when we add support for a new framework?
• Today, our engineers reverse engineer frameworks to add instrumentation support to them
• Every time an update is released, the instrumentation code is tested again.
• In case of issues, it goes back to the development team, who need to fix it and deploy an update.
• The whole process is automated and transparent to the customer ☺
• This is costly and time consuming
39
In-process tracing
Click GW API
Monitoring System
Trace
Span
Span
Span
HZQ
Span: doHZQ()
Span: HZQ call
OTEL HZQ Wrapper
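The wrapper idea can be sketched as follows: a library method is wrapped once so that every call is recorded as a span with a duration and a status. `HZQClient`, `traced`, and `finished_spans` are hypothetical names for this illustration; this is not the OpenTelemetry API, which provides its own tracer and span objects.

```python
import functools
import time

# Stand-in for an exporter; a real SDK would ship spans to a backend.
finished_spans = []


def traced(span_name):
    """Wrap a function so each call is recorded as a span."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                finished_spans.append(
                    {
                        "name": span_name,
                        "status": status,
                        "duration_s": time.perf_counter() - start,
                    }
                )

        return wrapper

    return decorator


class HZQClient:
    """Hypothetical third-party library that ships without instrumentation."""

    def call(self, payload):
        if payload is None:
            raise ValueError("empty payload")
        return f"HZQ({payload})"


# The wrapper patches the library method once, at start-up.
HZQClient.call = traced("HZQ call")(HZQClient.call)

result = HZQClient().call("cart-42")
print(result, finished_spans)
```

If the library were pre-instrumented by its maintainers, this patching step, and the need to re-test it against every library release, would disappear.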
40
“We want every platform and library to
be pre-instrumented with
OpenTelemetry and we’re committed to
making this as easy as possible.”
Sergey Kanzhelev (Google)
41
What is Observability and how does it differ from Monitoring?
1. In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
Source: Wikipedia
2. In software development, observability is achieved by adding code (instrumentation) that emits telemetry data.
3. Monitoring is the act of displaying and analyzing this telemetry data.
4. Monitoring alone can tell you that there is a problem.
E.g. “We see that some users experience a 50% higher response time on check-out”
5. Observability helps find the root cause (the why) by providing data that can be correlated and analyzed freely, even if the problem is completely new to you (unknown unknowns)
E.g. “The response time of the checkout grows quadratically with the number of items in the basket, because a misplaced for loop executes the same database query once per item for every item in the basket”
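The checkout example can be made concrete by counting queries. The sketch below is a made-up model of the bug, with illustrative function names: a per-item query nested inside a second loop over the same items, so the number of queries is the square of the basket size.

```python
def queries_with_misplaced_loop(item_count):
    """Count queries when the per-item query sits inside a second loop over items."""
    queries = 0
    for _item in range(item_count):
        for _same_items_again in range(item_count):
            queries += 1  # stands in for one database round trip
    return queries


def queries_after_fix(item_count):
    """Count queries when each item is looked up exactly once."""
    return sum(1 for _item in range(item_count))


for n in (1, 10, 100):
    print(n, queries_with_misplaced_loop(n), queries_after_fix(n))
```

A basket of 10 items triggers 100 queries instead of 10, and a basket of 100 items triggers 10,000. A trace that records each database call as a span makes this pattern visible at a glance, while a response-time metric only shows that checkout is slow.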
42
Putting it all Together
• Metrics can help you learn that there is a problem
• Distributed tracing becomes increasingly important to understand multi-tier execution paths and root causes of problems
• Developers now rely on metrics and traces to understand how their service functions in their microservice architectures
• Pure Open Source solutions are viable for pre-prod environments
• Standardization is the only way to tackle today’s complexity and Open Source is the key driver
• Vendors are prepared to tap into data collected by Open Source standard tools to add enterprise features on top to support web-scale workloads
43
dynatrace.com
@dkhan
daniel.khan@dynatrace.com
Thank you!
