Prometheus Metrics
==================
You can configure metrics support by adding the following to your config:
```yaml
metrics:
  enabled: true
  bindAddress: 127.0.0.1
  port: 9002
```
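
On the Prometheus side, a matching scrape job might look like the sketch below. The job name and the 15 second interval are illustrative choices, and it assumes Hookshot serves metrics at Prometheus's default `/metrics` path on the address and port configured above.

```yaml
# prometheus.yml (fragment). Job name and interval are assumptions for illustration.
scrape_configs:
  - job_name: hookshot
    # Keep this in step with the "Interval" variable of the Grafana dashboard.
    scrape_interval: 15s
    # metrics_path defaults to /metrics; stated here only to make the assumption visible.
    metrics_path: /metrics
    static_configs:
      - targets: ["127.0.0.1:9002"]
```
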
An example dashboard that can be used with [Grafana](https://grafana.com) can be found at [/contrib/hookshot-dashboard.json](https://github.com/matrix-org/matrix-hookshot/blob/main/contrib/hookshot-dashboard.json).
There are 3 variables at the top of the dashboard:
![image](https://user-images.githubusercontent.com/2803622/179366574-1bb83e30-05c6-4558-9e66-e813e85b3a6e.png)
For Data Source, select the Prometheus instance that holds your Hookshot metrics. Set Interval to your scrape interval, and set 2x Interval to twice that value ([why?](https://github.com/matrix-org/matrix-hookshot/pull/407#issuecomment-1186251618)).
Below is the generated list of Prometheus metrics for Hookshot.
## hookshot
| Metric | Help | Labels |
|--------|------|--------|
| hookshot_webhooks_http_request | Number of requests made to the hookshot webhooks handler | path, method |
| hookshot_provisioning_http_request | Number of requests made to the hookshot webhooks handler | path, method |
| hookshot_queue_event_pushes | Number of events pushed through the queue | event |
| hookshot_connection_event_failed | The number of events that failed to process | event, connectionId |
| hookshot_connections | The number of active hookshot connections | service |
| hookshot_notifications_push | Number of notifications pushed | service |
| hookshot_notifications_service_up | Is the notification service up or down | service |
| hookshot_notifications_watchers | Number of notifications watchers running | service |
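
As a rough illustration of how the metrics in this table could be used, the Prometheus rule file below alerts when a notification service reports down or when connections start failing to process events. The alert names, thresholds, and durations are made up for the example, and it assumes `hookshot_notifications_service_up` is a gauge that reads 0 when the service is down and `hookshot_connection_event_failed` is a counter.

```yaml
# Example alerting rules. Names and thresholds are illustrative, not shipped with Hookshot.
groups:
  - name: hookshot-example
    rules:
      - alert: HookshotNotificationServiceDown
        # Assumes the gauge reads 0 while the service is down.
        expr: hookshot_notifications_service_up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Notification service {{ $labels.service }} is down"
      - alert: HookshotConnectionEventsFailing
        # Assumes a counter; fires if any event type failed during the last 10 minutes.
        expr: sum by (event) (increase(hookshot_connection_event_failed[10m])) > 0
        labels:
          severity: warning
```
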
## matrix
| Metric | Help | Labels |
|--------|------|--------|
| matrix_api_calls | The number of Matrix client API calls made | method |
| matrix_api_calls_failed | The number of Matrix client API calls which failed | method |
| matrix_appservice_events | The number of events sent over the AS API | |
| matrix_appservice_decryption_failed | The number of events sent over the AS API that failed to decrypt | |
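
One possible use of the two API call metrics is a failure ratio. The recording rule below is a sketch that assumes both metrics are counters; the rule name and the 5 minute window are arbitrary choices for the example.

```yaml
# Example recording rule. The record name and window are illustrative.
groups:
  - name: hookshot-matrix-example
    rules:
      - record: hookshot:matrix_api_calls_failed:ratio_rate5m
        # Fraction of Matrix client API calls that failed over the last 5 minutes.
        expr: >
          sum(rate(matrix_api_calls_failed[5m]))
          /
          sum(rate(matrix_api_calls[5m]))
```
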
## feed
| Metric | Help | Labels |
|--------|------|--------|
| feed_count | The number of RSS feeds that hookshot is subscribed to | |
| feed_fetch_ms | The time taken for hookshot to fetch all feeds | |
| feed_failing | The number of RSS feeds that hookshot is failing to read | reason |
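
A simple way to watch feed health is to alert on `feed_failing`. The rule below is a sketch that assumes the metric is a gauge reporting the current number of failing feeds per `reason`; the alert name and the 30 minute hold time are illustrative.

```yaml
# Example alerting rule. Name and duration are illustrative.
groups:
  - name: hookshot-feed-example
    rules:
      - alert: HookshotFeedsFailing
        # Assumes a gauge; sums failing feeds per reason label.
        expr: sum by (reason) (feed_failing) > 0
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "{{ $value }} feed(s) failing with reason {{ $labels.reason }}"
```
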
## process
| Metric | Help | Labels |
|--------|------|--------|
| process_cpu_user_seconds_total | Total user CPU time spent in seconds. | |
| process_cpu_system_seconds_total | Total system CPU time spent in seconds. | |
| process_cpu_seconds_total | Total user and system CPU time spent in seconds. | |
| process_start_time_seconds | Start time of the process since unix epoch in seconds. | |
| process_resident_memory_bytes | Resident memory size in bytes. | |
| process_virtual_memory_bytes | Virtual memory size in bytes. | |
| process_heap_bytes | Process heap size in bytes. | |
| process_open_fds | Number of open file descriptors. | |
| process_max_fds | Maximum number of open file descriptors. | |
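
The process metrics can be combined in the same way. As an example, the sketch below alerts when the process uses more than 80% of its file descriptor limit; the alert name and the threshold are arbitrary choices.

```yaml
# Example alerting rule. Name and threshold are illustrative.
groups:
  - name: hookshot-process-example
    rules:
      - alert: HookshotFileDescriptorsNearLimit
        # Ratio of open file descriptors to the maximum allowed.
        expr: process_open_fds / process_max_fds > 0.8
        for: 10m
        labels:
          severity: warning
```
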
## nodejs
| Metric | Help | Labels |
|--------|------|--------|
| nodejs_eventloop_lag_seconds | Lag of event loop in seconds. | |
| nodejs_eventloop_lag_min_seconds | The minimum recorded event loop delay. | |
| nodejs_eventloop_lag_max_seconds | The maximum recorded event loop delay. | |
| nodejs_eventloop_lag_mean_seconds | The mean of the recorded event loop delays. | |
| nodejs_eventloop_lag_stddev_seconds | The standard deviation of the recorded event loop delays. | |
| nodejs_eventloop_lag_p50_seconds | The 50th percentile of the recorded event loop delays. | |
| nodejs_eventloop_lag_p90_seconds | The 90th percentile of the recorded event loop delays. | |
| nodejs_eventloop_lag_p99_seconds | The 99th percentile of the recorded event loop delays. | |
| nodejs_active_handles | Number of active libuv handles grouped by handle type. Every handle type is C++ class name. | type |
| nodejs_active_handles_total | Total number of active handles. | |
| nodejs_active_requests | Number of active libuv requests grouped by request type. Every request type is C++ class name. | type |
| nodejs_active_requests_total | Total number of active requests. | |
| nodejs_heap_size_total_bytes | Process heap size from Node.js in bytes. | |
| nodejs_heap_size_used_bytes | Process heap size used from Node.js in bytes. | |
| nodejs_external_memory_bytes | Node.js external memory size in bytes. | |
| nodejs_heap_space_size_total_bytes | Process heap space size total from Node.js in bytes. | space |
| nodejs_heap_space_size_used_bytes | Process heap space size used from Node.js in bytes. | space |
| nodejs_heap_space_size_available_bytes | Process heap space size available from Node.js in bytes. | space |
| nodejs_version_info | Node.js version info. | version, major, minor, patch |
| nodejs_gc_duration_seconds | Garbage collection duration by kind, one of major, minor, incremental or weakcb. | kind |