Guide: Transcoder monitoring with Prometheus/Grafana

Hey all, here’s a guide on how to set up Prometheus/Grafana monitoring for your GPU transcoder.

The dashboards and the Nvidia exporter are heavily inspired/copied from the “official” Livepeer Monitoring - so many thanks to the original creator! You might want to try to run their docker solution first. I was looking for a more modular approach since I was already running Grafana/Prometheus for other stuff. Also a thank you to @yondon for answering my questions on Discord :slight_smile:

I hope this guide works for you - please comment if you have any questions/problems. Let’s get started:

  1. Prometheus:
  • Go to the Prometheus download page and get the latest version (with wget)
  • Extract with tar -xzf prometheus-...
  • Inside the prometheus folder, edit the prometheus.yml file according to this example
  • You can create a systemd service or simply run prometheus in a separate terminal (e.g. using screen)
  2. Nvidia-smi exporter:
  • You need to have golang installed
  • Download nvidia_exporter.go
  • You can create a systemd service or simply run this file in a separate terminal (e.g. with screen) with go run nvidia_exporter.go
  3. Grafana:
  • Install the latest OSS version of Grafana following the “Install on Debian/Ubuntu” page from Grafana Labs (you might need to install gpg-agent if it is not already installed)
  • Grafana already comes with a systemd service that you can start with sudo systemctl start grafana-server
  • Now you should have access to “your-IP”:3000 in your browser (you might need to open the port in your server’s firewall settings). The default username and password are both “admin”; change the password after the first login.
  • Once logged in, first add Prometheus under “Configuration → Data sources → Add data source”. The URL is http://localhost:9090
  • Next go to “Dashboards → Manage → Import” and paste the JSON of the dashboards that you want: Dashboards
  4. Run your Livepeer node with the -monitor flag

Thank you for this :pray:

Got mine up and running - thanks a lot @vires-in-numeris!

Got mine up and running. Thank you!!

Hi and thanks for this.

Is it possible to monitor this if the Linux server is only the orchestrator and the transcoders run on 2 remote Windows 10 PCs?

Most likely yes, but I would assume this needs some custom configuration.

OK, I’ll try anyway, but I think it’s far beyond my knowledge :grin: but who knows…
Thanks

Hey, I tried to set it all up. The Nvidia exporter works but the Livepeer metrics don’t. If I check localhost:7935/metrics I can’t see anything there, but if I use 127.0.0.1:7935 I can see the metrics. How do I change the settings so that Prometheus reads the metrics from 127.0.0.1:7935 instead of localhost? When I open localhost in the browser, it resolves to jason-pc:7935, which is my PC name.

help…

You can customize the target in the prometheus.yml file. Just change the “livepeer” one from “localhost:7935” to “127.0.0.1:7935”.
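To see what “localhost” actually resolves to on the machine, a small Go check like the one below can help. This is only a diagnostic sketch: one possible cause (an assumption, not something confirmed in this thread) is that localhost maps to an address the node is not listening on. The helper name isLoopback is made up for this example.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopback reports whether addr parses as an IP address in a loopback
// range (127.0.0.0/8 or ::1). Hostnames and other non-IP strings yield false.
func isLoopback(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && ip.IsLoopback()
}

func main() {
	// Ask the system resolver which addresses "localhost" maps to.
	addrs, err := net.LookupHost("localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Printf("localhost -> %s (loopback: %v)\n", a, isLoopback(a))
	}
}
```

If an address other than 127.0.0.1 is listed, using the literal IP in prometheus.yml avoids the lookup entirely.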

I did that, but I am not getting any Livepeer data in Prometheus. I only have estimated gas in Grafana and everything else shows no data.

Since Grafana showed that I’ve received a winning ticket but my Orchestrator didn’t automatically redeem it, I’ve decided to set up an alert in Grafana that notifies me about a winning ticket. Here’s how you do it:

  1. Set up a notification channel by going to “Alerting” → “Notification channels” on the left menu bar. Click on “New channel”

  2. The next step depends on the channel you want. Here are instructions for Telegram: https://gist.github.com/dl6nm/c312acbc6fddf1a56d749e045f040ca3

  3. Create/edit the “Winning Tickets” time series panel. The query should be this: sum(livepeer_winning_tickets_recv OR on() vector(0)). The sum and vector(0) ensure that Prometheus returns 0 instead of no data when there is no winning ticket.

  4. Switch from the query to the “Alert” tab. Set the condition to
    WHEN diff() OF query(A, 15m, now) IS ABOVE 0. Then edit “Send to” to add the notification channel that you created in the first step, adjust the message to your liking, and that’s it :slight_smile:
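To check what the query returns before wiring up the alert, you can run it against the Prometheus HTTP API directly. A rough sketch in Go, assuming Prometheus listens on localhost:9090 as in the data source URL from the guide. The type queryResponse and the helper firstValue are made up for this example and only cover the fields needed here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// queryResponse mirrors the parts of a /api/v1/query reply that are
// needed to read an instant-vector result.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			// Value is [unix timestamp, sample value as a string].
			Value [2]interface{} `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// firstValue returns the first sample value from a raw API reply.
func firstValue(raw []byte) (string, error) {
	var qr queryResponse
	if err := json.Unmarshal(raw, &qr); err != nil {
		return "", err
	}
	if qr.Status != "success" || len(qr.Data.Result) == 0 {
		return "", fmt.Errorf("no result (status %q)", qr.Status)
	}
	s, ok := qr.Data.Result[0].Value[1].(string)
	if !ok {
		return "", fmt.Errorf("unexpected value type")
	}
	return s, nil
}

func main() {
	q := "sum(livepeer_winning_tickets_recv OR on() vector(0))"
	// Assumed Prometheus address; adjust if yours differs.
	u := "http://localhost:9090/api/v1/query?" + url.Values{"query": {q}}.Encode()

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(u)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	v, err := firstValue(raw)
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Println("winning tickets received:", v)
}
```

With the OR on() vector(0) fallback in place, this should print 0 rather than fail when the node has not yet reported the metric.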

Amazing, thanks for this! :pray:
Would you consider adding avg GPU power draw to the Nvidia dashboard?
I have an Nvidia made P400 and should be receiving a PNY P400 V2 revision (which has lower power consumption) and it would be nice to plot the difference in Grafana. It would also help to create another plot (based on the user’s actual currency) to see how much electricity the transcoding costs.

Something like nvidia-smi --query-gpu=power.draw --format=csv?

Just wanted to also add that you can use the free hosted instance of Grafana’s own cloud instead of installing it locally. I’ve set up Prometheus to listen on my LAN IP, set up port forwarding on the router and Basic Auth for Prometheus, and it works well.

Thank you!

To add the power, you’d have to adjust the nvidia_exporter.go script and add this to the metrics function. Shouldn’t be too hard I think, have you tried it already?

I did try, but do not know Go and gave up after a few failed attempts and asked here. Appears there’s more to it than simply adding power.draw to the list in --query-gpu=name,index,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used. Think I need to add it to the metric list too, unsure…
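For anyone attempting this, here is a standalone sketch in Go of the parsing half: it calls nvidia-smi for index and power.draw only and prints one line per GPU in the Prometheus text format. It does not reproduce the structure of nvidia_exporter.go (which is not shown in this thread), so it would still need to be merged into that script's metric list. The metric name nvidia_power_draw_watts and the helper parsePowerDraw are made up for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parsePowerDraw turns one line of
//   nvidia-smi --query-gpu=index,power.draw --format=csv,noheader,nounits
// (for example "0, 28.45") into a GPU index and a power draw in watts.
// Non-numeric readings such as "[N/A]" return an error.
func parsePowerDraw(line string) (index string, watts float64, err error) {
	fields := strings.Split(line, ",")
	if len(fields) != 2 {
		return "", 0, fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	index = strings.TrimSpace(fields[0])
	watts, err = strconv.ParseFloat(strings.TrimSpace(fields[1]), 64)
	return index, watts, err
}

func main() {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=index,power.draw",
		"--format=csv,noheader,nounits").Output()
	if err != nil {
		fmt.Println("nvidia-smi failed:", err)
		return
	}
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		idx, w, err := parsePowerDraw(line)
		if err != nil {
			fmt.Printf("# skipping %q: %v\n", line, err)
			continue
		}
		// Hypothetical metric name; pick one that fits your dashboards.
		fmt.Printf("nvidia_power_draw_watts{gpu=\"%s\"} %g\n", idx, w)
	}
}
```

Requesting only two fields keeps the parser simple; when power.draw is appended to the longer field list instead, the column count in the parser has to change to match.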

Would there be a way to include the encoding and decoding resources in the list of metrics?

I assume this would have to be done in the nvidia_exporter.go script but I’m not sure how to do that.

Any suggestions would be great :slight_smile:
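One possible route, sketched below in Go, is to read the enc and dec columns from nvidia-smi dmon -s u -c 1 instead of --query-gpu. This assumes your driver's dmon output has a header row naming the columns (gpu, sm, mem, enc, dec); the parser locates the columns by name so that extra columns do not break it. The type encDec, the helper parseDmon, and the metric names are made up for this example, and it is not taken from nvidia_exporter.go.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// encDec holds encoder and decoder utilization as printed by nvidia-smi.
type encDec struct{ Enc, Dec string }

// parseDmon reads `nvidia-smi dmon -s u -c 1` output and returns the
// enc/dec columns keyed by GPU index. Column positions come from the
// header row ("# gpu sm mem enc dec ...").
func parseDmon(out string) map[string]encDec {
	res := map[string]encDec{}
	encCol, decCol := -1, -1
	for _, line := range strings.Split(out, "\n") {
		trimmed := strings.TrimSpace(line)
		fields := strings.Fields(strings.TrimPrefix(trimmed, "#"))
		if len(fields) == 0 {
			continue
		}
		if fields[0] == "gpu" { // header row: remember column positions
			for i, f := range fields {
				switch f {
				case "enc":
					encCol = i
				case "dec":
					decCol = i
				}
			}
			continue
		}
		if strings.HasPrefix(trimmed, "#") { // units row or other comment
			continue
		}
		if encCol < 0 || decCol < 0 || encCol >= len(fields) || decCol >= len(fields) {
			continue
		}
		res[fields[0]] = encDec{Enc: fields[encCol], Dec: fields[decCol]}
	}
	return res
}

func main() {
	out, err := exec.Command("nvidia-smi", "dmon", "-s", "u", "-c", "1").Output()
	if err != nil {
		fmt.Println("nvidia-smi failed:", err)
		return
	}
	for gpu, u := range parseDmon(string(out)) {
		// Hypothetical metric names.
		fmt.Printf("nvidia_encoder_utilization{gpu=\"%s\"} %s\n", gpu, u.Enc)
		fmt.Printf("nvidia_decoder_utilization{gpu=\"%s\"} %s\n", gpu, u.Dec)
	}
}
```

The values are kept as strings here because an unsupported column may print a placeholder instead of a number; a real exporter would want to skip those rows.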

How can this be configured to support a remote transcoder configuration? Does Prometheus need to be running on the Orchestrator box specifically?

Yes, the default setup requires that Prometheus runs on the Orchestrator box. I’m also running Grafana on the Orchestrator. Those two processes don’t require many resources and it’s the simplest setup.

This is awesome! Thanks for recording your process :wink:

It is possible to set up remote monitoring, that’s how I do it. You just need to ensure port 7935 (or whatever port you set as the cliAddr) is open and set the listening address to zeros so it accepts requests from all IPs: -cliAddr 0.0.0.0:7935.

In the prometheus.yml config, instead of ‘localhost:7935’, set the target to the node’s IP address followed by :7935.

I don’t think it’s safe to just have the CLI port open to the internet, so I use a firewall to only accept traffic from my monitoring node.

I’m trying to make sure that this is working correctly. Does it need to be done using a certain one of the dashboards or does it work with all of them? I tried to evaluate the query and I see this error.

I do not see the variable “livepeer_winning_tickets_recv” anywhere on the metrics browser.

[image: screenshot of the error]