hookshot/docs/metrics.md
Will Hunt d12a05d0fa
Add feeds failing metric (#681)
* Track failing feeds in a metric

* Update docs

* changelog
2023-03-24 13:12:50 +00:00

Prometheus Metrics
==================
You can configure metrics support by adding the following to your config:
```yaml
metrics:
  enabled: true
  bindAddress: 127.0.0.1
  port: 9002
```
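
With the listener above enabled, Prometheus still needs to be told to scrape it. The following is a minimal sketch of a `prometheus.yml` fragment, not an official Hookshot recommendation. The job name `hookshot` and the 15 second interval are illustrative assumptions, and the sketch assumes the metrics are served at Prometheus's default `/metrics` path on the address and port configured above.

```yaml
# Fragment of prometheus.yml (sketch; adjust to your deployment).
scrape_configs:
  - job_name: hookshot            # hypothetical job name
    scrape_interval: 15s          # assumed interval, pick your own
    metrics_path: /metrics        # Prometheus default, stated explicitly
    static_configs:
      - targets: ["127.0.0.1:9002"]   # bindAddress:port from the Hookshot config
```

If Prometheus runs on a different host than Hookshot, `bindAddress` must be an address that Prometheus can reach, and the target must be changed to match.
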
An example dashboard that can be used with [Grafana](https://grafana.com) can be found at [/contrib/hookshot-dashboard.json](https://github.com/matrix-org/matrix-hookshot/blob/main/contrib/hookshot-dashboard.json).
There are 3 variables at the top of the dashboard:
![image](https://user-images.githubusercontent.com/2803622/179366574-1bb83e30-05c6-4558-9e66-e813e85b3a6e.png)
Select the Prometheus instance that holds your Hookshot metrics as the Data Source. Set Interval to your scraping interval, and set 2x Interval to twice that value ([why?](https://github.com/matrix-org/matrix-hookshot/pull/407#issuecomment-1186251618)).
Below is the generated list of Prometheus metrics for Hookshot.
## hookshot
| Metric | Help | Labels |
|--------|------|--------|
| hookshot_webhooks_http_request | Number of requests made to the hookshot webhooks handler | path, method |
| hookshot_provisioning_http_request | Number of requests made to the hookshot provisioning handler | path, method |
| hookshot_queue_event_pushes | Number of events pushed through the queue | event |
| hookshot_connection_event_failed | The number of events that failed to process | event, connectionId |
| hookshot_connections | The number of active hookshot connections | service |
| hookshot_notifications_push | Number of notifications pushed | service |
| hookshot_notifications_service_up | Is the notification service up or down | service |
| hookshot_notifications_watchers | Number of notifications watchers running | service |
## matrix
| Metric | Help | Labels |
|--------|------|--------|
| matrix_api_calls | The number of Matrix client API calls made | method |
| matrix_api_calls_failed | The number of Matrix client API calls which failed | method |
| matrix_appservice_events | The number of events sent over the AS API | |
| matrix_appservice_decryption_failed | The number of events sent over the AS API that failed to decrypt | |
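
The two API call metrics above are most useful in combination. As a rough sketch, a Prometheus recording rule could derive a failure ratio from them. This assumes both metrics are counters (so that `rate()` applies), and the group name, rule name and 5 minute window are illustrative choices, not part of Hookshot.

```yaml
# Fragment of a Prometheus rules file (sketch under the assumptions above).
groups:
  - name: hookshot-matrix                 # hypothetical group name
    rules:
      # Fraction of Matrix client API calls that failed over the last 5 minutes.
      - record: hookshot:matrix_api_calls_failed:ratio_rate5m   # hypothetical rule name
        expr: |
          sum(rate(matrix_api_calls_failed[5m]))
          /
          sum(rate(matrix_api_calls[5m]))
```

The ratio has no value while no API calls are being made, because the denominator is then zero.
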
## feed
| Metric | Help | Labels |
|--------|------|--------|
| feed_count | The number of RSS feeds that hookshot is subscribed to | |
| feed_fetch_ms | The time taken for hookshot to fetch all feeds | |
| feed_failing | The number of RSS feeds that hookshot is failing to read | reason |
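
Since `feed_failing` reports how many feeds Hookshot currently cannot read, broken down by `reason`, it lends itself to alerting. Below is a minimal sketch of a Prometheus alerting rule. The alert name, the 30 minute hold time and the `severity` label are assumptions made for illustration and should be tuned to your own setup.

```yaml
# Fragment of a Prometheus rules file (sketch; thresholds are assumptions).
groups:
  - name: hookshot-feeds                  # hypothetical group name
    rules:
      - alert: HookshotFeedsFailing       # hypothetical alert name
        # Fires per failure reason once any feed has been failing for 30 minutes.
        expr: sum by (reason) (feed_failing) > 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Hookshot is failing to read {{ $value }} RSS feed(s) (reason: {{ $labels.reason }})"
```
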
## process
| Metric | Help | Labels |
|--------|------|--------|
| process_cpu_user_seconds_total | Total user CPU time spent in seconds. | |
| process_cpu_system_seconds_total | Total system CPU time spent in seconds. | |
| process_cpu_seconds_total | Total user and system CPU time spent in seconds. | |
| process_start_time_seconds | Start time of the process since unix epoch in seconds. | |
| process_resident_memory_bytes | Resident memory size in bytes. | |
| process_virtual_memory_bytes | Virtual memory size in bytes. | |
| process_heap_bytes | Process heap size in bytes. | |
| process_open_fds | Number of open file descriptors. | |
| process_max_fds | Maximum number of open file descriptors. | |
## nodejs
| Metric | Help | Labels |
|--------|------|--------|
| nodejs_eventloop_lag_seconds | Lag of event loop in seconds. | |
| nodejs_eventloop_lag_min_seconds | The minimum recorded event loop delay. | |
| nodejs_eventloop_lag_max_seconds | The maximum recorded event loop delay. | |
| nodejs_eventloop_lag_mean_seconds | The mean of the recorded event loop delays. | |
| nodejs_eventloop_lag_stddev_seconds | The standard deviation of the recorded event loop delays. | |
| nodejs_eventloop_lag_p50_seconds | The 50th percentile of the recorded event loop delays. | |
| nodejs_eventloop_lag_p90_seconds | The 90th percentile of the recorded event loop delays. | |
| nodejs_eventloop_lag_p99_seconds | The 99th percentile of the recorded event loop delays. | |
| nodejs_active_handles | Number of active libuv handles grouped by handle type. Every handle type is C++ class name. | type |
| nodejs_active_handles_total | Total number of active handles. | |
| nodejs_active_requests | Number of active libuv requests grouped by request type. Every request type is C++ class name. | type |
| nodejs_active_requests_total | Total number of active requests. | |
| nodejs_heap_size_total_bytes | Process heap size from Node.js in bytes. | |
| nodejs_heap_size_used_bytes | Process heap size used from Node.js in bytes. | |
| nodejs_external_memory_bytes | Node.js external memory size in bytes. | |
| nodejs_heap_space_size_total_bytes | Process heap space size total from Node.js in bytes. | space |
| nodejs_heap_space_size_used_bytes | Process heap space size used from Node.js in bytes. | space |
| nodejs_heap_space_size_available_bytes | Process heap space size available from Node.js in bytes. | space |
| nodejs_version_info | Node.js version info. | version, major, minor, patch |
| nodejs_gc_duration_seconds | Garbage collection duration by kind, one of major, minor, incremental or weakcb. | kind |