OBSERVABILITY: NOT JUST
AN OPS THING
@cyen
@honeycombio
OBSERVABILITY THE DEV PROCESS
The practice of understanding
the internal state of a system
via knowledge of its external
outputs.
Wikipedia (paraphrased)
Twitter hive mind
The ability to answer
questions about your
system, using data
DECIDE IT
BUILD IT
VERIFY IT (ON MY MACHINE)
VERIFY IT (IN PRODUCTION)
WATCH IT
How do my error rates look these days?
Have things gotten slower over time?
What will be the impact of the change we’re planning?
What does "normal" look like these days? Does it line up with what I thought?
What does my service look like from that one huge, annoying customer’s perspective?
WHY DOES THIS MATTER SO MUCH TO ME?
▸ How’s our load? Is it spread reasonably evenly across our Kafka partitions?
▸ Did latency increase in our API server? How does our new batching endpoint compare to our old RESTy endpoint?
▸ How did those recent memory optimizations affect our query-serving capacity?
▸ How’s our load? Are high-volume customers spread reasonably evenly across our Kafka partitions?
▸ Did latency increase in our API server? Which customers were impacted the most? And who’ll benefit the most from batching?
▸ How did those recent memory optimizations affect our query-serving capacity for customers with string-heavy payloads?
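A minimal sketch of how the customer-aware version of the first question might be answered, assuming each request is recorded as a plain dict. The field names `team_id` and `chosen_partition` are borrowed from the example schema later in the deck; the sample values and the helper `load_by_partition` are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample events; one dict per request.
events = [
    {"team_id": "a", "chosen_partition": 0},
    {"team_id": "a", "chosen_partition": 0},
    {"team_id": "a", "chosen_partition": 1},
    {"team_id": "b", "chosen_partition": 1},
    {"team_id": "c", "chosen_partition": 2},
]

def load_by_partition(events):
    """Count events per partition, and per team within each partition."""
    per_partition = Counter(e["chosen_partition"] for e in events)
    per_team = defaultdict(Counter)
    for e in events:
        per_team[e["chosen_partition"]][e["team_id"]] += 1
    return per_partition, per_team

per_partition, per_team = load_by_partition(events)
for partition in sorted(per_partition):
    # The heaviest team on each partition shows whether one customer dominates it.
    top_team, top_count = per_team[partition].most_common(1)[0]
    print(partition, per_partition[partition], top_team, top_count)
```

The total per partition answers the ops-flavored question; the per-team breakdown is what turns it into a question about customers.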
Influx/Days 2017 San Francisco | Christine Yen
OK. SO WHAT DOES THIS LOOK LIKE?
DECIDE IT
BUILD IT
VERIFY IT (WFM🤘)
VERIFY IT (IN PROD)
WATCH IT
Like debug statements
in production data
Feature flags and flexible observability tools = manual canarying
… except we can do it for everything
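One way to picture the manual canary, as a sketch under assumptions: every event records whether a feature flag was on for that request, and latency is compared across the two groups. The field name `flag_batching`, the sample values, and the helper `median_latency_by_flag` are hypothetical; `build_id` and `request_dur_ms` appear in the example schema later in the deck.

```python
from statistics import median

# Hypothetical events from a single build, some served with the flag on.
events = [
    {"build_id": "b-101", "flag_batching": True, "request_dur_ms": 12.0},
    {"build_id": "b-101", "flag_batching": True, "request_dur_ms": 14.0},
    {"build_id": "b-101", "flag_batching": True, "request_dur_ms": 16.0},
    {"build_id": "b-101", "flag_batching": False, "request_dur_ms": 20.0},
    {"build_id": "b-101", "flag_batching": False, "request_dur_ms": 22.0},
]

def median_latency_by_flag(events, flag):
    """Group request durations by flag state and return the median of each group."""
    groups = {}
    for e in events:
        # Events that predate the flag are treated as "flag off".
        groups.setdefault(bool(e.get(flag, False)), []).append(e["request_dur_ms"])
    return {on: median(durs) for on, durs in groups.items()}

result = median_latency_by_flag(events, "flag_batching")
print(result)
```

Because the flag state travels with each event, the same comparison works for any flag without dedicated canary infrastructure.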
BUILD ID
FEATURE FLAGS
CUSTOMER IDS
GRAPHS
ALERTS
CODE
RELEASES
1. HYPOTHESIS
2. INSTRUMENTATION (MAYBE)
3. VALIDATION (OR NOT)
4. ONWARD
TAKING THE FIRST FEW STEPS
CONCEPTUALLY
▸ Start at the edge with basic, common attributes (e.g. HTTP)
▸ Business-relevant or infrastructure-specific characteristics (e.g.
customer ID, DB replica set)
▸ Temporary additional fields for validating hypotheses
▸ Prune stale fields (if necessary)
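The four steps above could look roughly like this in code. This is a sketch, not the speaker's implementation: the helper names (`base_event`, `add_context`, `prune`), the `tmp_cache_hit` field, and all sample values are hypothetical.

```python
import json
import time

def base_event(method, url, status):
    """Step 1: edge-level attributes common to every HTTP request."""
    return {"event_time": time.time(), "method": method, "url": url, "status": status}

def add_context(event, team_id, db_replica_set):
    """Step 2: business-relevant and infrastructure-specific fields."""
    event.update({"team_id": team_id, "db_replica_set": db_replica_set})
    return event

def prune(event, stale_fields):
    """Step 4: return a copy without fields that no longer answer a question."""
    return {k: v for k, v in event.items() if k not in stale_fields}

event = add_context(base_event("POST", "/1/events", 200),
                    team_id="t-42", db_replica_set="rs0")

# Step 3: a temporary field added only to validate one hypothesis.
event["tmp_cache_hit"] = True

print(json.dumps(prune(event, stale_fields={"tmp_cache_hit"}), sort_keys=True))
```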
SOME BEST PRACTICES
▸ Contextual, structured data
▸ Common set of nouns and consistent naming
▸ Don't be dogmatic; let the use case dictate the ingest pattern
▸ e.g. instrumenting individual reads while batching writes
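As a rough illustration of the last two points, assuming reads are rare enough to record one event each while writes arrive in batches and are summarized: both event shapes share the same nouns (`dataset_id`) and the same `_dur_ms` suffix for durations. The function names, the `op` field, and the sample numbers are invented for this sketch.

```python
def read_event(dataset_id, dur_ms):
    """One event per individual read."""
    return {"op": "read", "dataset_id": dataset_id, "request_dur_ms": dur_ms}

def write_batch_event(dataset_id, write_durs_ms):
    """One summary event per batch of writes, rather than one per write."""
    return {
        "op": "write_batch",
        "dataset_id": dataset_id,
        "batch_size": len(write_durs_ms),
        "request_dur_ms": sum(write_durs_ms),
        "max_write_dur_ms": max(write_durs_ms),
    }

reads = [read_event("ds-1", d) for d in (3.0, 4.5)]
writes = write_batch_event("ds-1", [1.0, 2.0, 0.5])
print(reads)
print(writes)
```

Keeping the field names consistent across both shapes means a single query on `request_dur_ms` covers reads and write batches alike.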
AN EXAMPLE SCHEMA EVOLUTION
first pass:
- server_hostname
- method
- url
- build_id
- remote_addr
- request_id
- status
- x_forwarded_for
- error
- event_time
- team_id
- payload_size
- sample_rate
then we added:
- dropped
- get_schema_dur_ms
- protobuf_encoding_dur_ms
- kafka_write_dur_ms
- request_dur_ms
- json_decoding_dur_ms +others
a couple of days later, we added:
- offset
- kafka_topic
- chosen_partition
after that:
- memory_inuse
- num_goroutines
a week after that:
- warning
- drop_reason
and on and on, adding 2-3 fields
every couple of weeks:
- user_agent
- unknown_columns
- dataset_partitions
- dataset_id
- dataset_name
- api_version
- create_marker_dur_ms
- marker_id
- nil_value_for_columns
- batch
- gzipped
- batch_datapoint_lens
- batch_num_datasets
- batch_process_datapoints_dur_ms
- batch_validate_datasets_dur_ms
- batch_dataset_names
- dataset_columns
- event_columns
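A schema that grows like this means older events simply lack the newer fields. The sketch below, which assumes events are stored as plain dicts, shows queries that tolerate the gap. The field names come from the list above; the values and the helpers `column_values` and `known_columns` are hypothetical.

```python
# An event from the first pass, and one recorded after more fields were added.
old_event = {"method": "POST", "status": 200, "payload_size": 512}
new_event = {
    "method": "POST",
    "status": 200,
    "payload_size": 2048,
    "kafka_write_dur_ms": 3.5,
    "chosen_partition": 4,
}

def column_values(events, field):
    """Values of `field` for the events that carry it; older events are skipped."""
    return [e[field] for e in events if field in e]

def known_columns(events):
    """Every field name seen across all events, sorted."""
    return sorted(set().union(*(e.keys() for e in events)))

all_events = [old_event, new_event]
print(column_values(all_events, "kafka_write_dur_ms"))
print(known_columns(all_events))
```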
DEVELOPERS, OUR MISSION:
▸ Stop writing software based on intuition; start backing it up with data
▸ Teach observability tools to speak more than "Ops"
▸ ??? (← a.k.a. ask lots of questions and validate hypotheses)
▸ Profit!
THANKS! @cyen
