Guide: Production Log Monitoring and Alerting

speedybird · February 10, 2023, 6:30pm

Hey everyone, I've finished the documentation and code for a full monitoring stack, including proactive alerts. The guide (links below) walks you through setting up the same system for your own use.

Please feel free to share critique or feedback here or via GitHub issues. Mike and I will hold office hours to demonstrate the installation process on Wednesday, February 15 at 10am Eastern. Join us to see it live.

Many thanks to @MikeZupper and @Authority_Null for supporting this effort.

Who is this guide for?

This guide is for Orchestrators who want to manage their operations centrally and receive proactive notifications to avoid costly downtime.

A few benefits:

A central management point for all Livepeer metrics and logs
A publicly accessible, password-protected Grafana dashboard
Proactive Telegram alerts when your Orchestrator is unhealthy (e.g. Blockwatch failures, ticket redemption issues, etc.)
The ability to create new alerts and dashboards to monitor your system. There are many community dashboards available that you can add.
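
To give a rough idea of what a proactive alert looks like by the time it reaches a chat, here is a minimal Python sketch that turns an Alertmanager-style webhook payload into a short text message. This is not code from the project, and the guide's stack may deliver Telegram alerts differently. The function name, the alert name "BlockwatchFailure", the instance "orch-1", and the summary text are all made up for illustration; only the general payload layout (an "alerts" list with "status", "labels", and "annotations") follows Alertmanager's webhook format.

```python
# Hypothetical helper for illustration only; not part of the guide's repository.

def format_alert_message(payload: dict) -> str:
    """Build a one-line-per-alert text summary from an Alertmanager-style payload."""
    lines = []
    for alert in payload.get("alerts", []):
        status = alert.get("status", "unknown").upper()
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        name = labels.get("alertname", "unnamed alert")
        instance = labels.get("instance", "unknown instance")
        summary = annotations.get("summary", "")
        line = f"[{status}] {name} on {instance}"
        if summary:
            line += f": {summary}"
        lines.append(line)
    return "\n".join(lines) if lines else "No alerts in payload."


# Sample payload with invented values (assumption, not real log output).
sample_payload = {
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "BlockwatchFailure", "instance": "orch-1"},
            "annotations": {"summary": "Blockwatch errors seen in the logs"},
        }
    ]
}

if __name__ == "__main__":
    print(format_alert_message(sample_payload))
```

The resulting string could then be sent to a chat by whatever notifier you use. Keeping the formatting separate from delivery makes it easy to test the wording of an alert without sending anything.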

Sample Alerts and Dashboard

Components Used:

This project uses Promtail, Loki, Grafana, Prometheus, Alertmanager, Traefik, and Docker Compose.

Project Code and Install:

The project can be found on GitHub here.

The install guide is also on GitHub here.

Authority_Null · February 10, 2023, 9:06pm

Excellent work @speedybird & @MikeZupper!
