Summary
In this chapter, we covered one of the most important aspects of service reliability work: alerting. We learned how to set up service metric collection using Prometheus and the tally library, configure service alerts using Alertmanager, and connect these components into an end-to-end service alerting pipeline.
The material in this chapter brought together what we learned about reliability and service telemetry in Chapter 11 and Chapter 12. By collecting telemetry data and setting up notification mechanisms with the alerting tools, we can quickly detect service issues and get notified whenever one of them needs to be mitigated.
In the next chapter, we will cover additional aspects of Go performance monitoring, including system profiling.