hookshot/docs/metrics.md
Will Hunt c962f17a91
2022-12-09 10:25:36 -05:00


Prometheus Metrics

You can configure metrics support by adding the following to your config:

```yaml
metrics:
  enabled: true
  bindAddress: 127.0.0.1
  port: 9002
```
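
Once the listener is enabled, Prometheus needs a scrape job pointing at it. The fragment below is a minimal sketch for `prometheus.yml`, not part of Hookshot's own config. The job name and the 15s interval are illustrative assumptions, and `/metrics` is simply the Prometheus default path, assumed here to match what Hookshot serves.

```yaml
# prometheus.yml (fragment): hypothetical scrape job for Hookshot
scrape_configs:
  - job_name: hookshot          # illustrative name, pick your own
    scrape_interval: 15s        # assumed interval; reuse this value in the dashboard
    metrics_path: /metrics      # Prometheus default path, assumed to match Hookshot
    static_configs:
      # Matches bindAddress and port from the Hookshot config above
      - targets: ["127.0.0.1:9002"]
```

If Prometheus runs on a different host, `bindAddress` would need to be an address that host can reach rather than `127.0.0.1`.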

An example dashboard that can be used with Grafana can be found at /contrib/hookshot-dashboard.json. There are 3 variables at the top of the dashboard:

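Select the Prometheus instance that holds your Hookshot metrics as Data Source. Set Interval to your scraping interval. Set 2x Interval to twice the Interval value, so that each query window typically contains at least two samples, which Prometheus needs in order to compute a rate.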
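
To make the relationship between the two interval variables concrete, here is a sketch of a Prometheus recording rule that uses the same idea outside Grafana. It assumes a 15s scraping interval (so the "2x" window is 30s) and assumes that `hookshot_webhooks_http_request` is a counter. The group and rule names are hypothetical.

```yaml
# rules.yml (fragment): illustrative only
groups:
  - name: hookshot-example-recording   # hypothetical group name
    interval: 15s                      # assumed to equal the scraping interval
    rules:
      # Per-second webhook request rate over a window of 2x the scraping interval
      - record: hookshot:webhooks_http_request:rate30s
        expr: sum by (path, method) (rate(hookshot_webhooks_http_request[30s]))
```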

Below is the generated list of Prometheus metrics for Hookshot.

hookshot

| Metric | Help | Labels |
|--------|------|--------|
| hookshot_webhooks_http_request | Number of requests made to the hookshot webhooks handler | path, method |
| hookshot_provisioning_http_request | Number of requests made to the hookshot webhooks handler | path, method |
| hookshot_queue_event_pushes | Number of events pushed through the queue | event |
| hookshot_connection_event_failed | The number of events that failed to process | event, connectionId |
| hookshot_connections | The number of active hookshot connections | service |
| hookshot_notifications_push | Number of notifications pushed | service |
| hookshot_notifications_service_up | Is the notification service up or down | service |
| hookshot_notifications_watchers | Number of notifications watchers running | service |
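
The metrics listed above lend themselves to simple alerts. The following Prometheus alerting rules are a sketch under stated assumptions: that `hookshot_notifications_service_up` is a gauge reporting `1` when up and `0` when down, and that `hookshot_connection_event_failed` is a counter. Alert names, durations, and severities are illustrative choices, not recommendations from the Hookshot project.

```yaml
# alerts.yml (fragment): hypothetical alert rules
groups:
  - name: hookshot-example-alerts
    rules:
      # Fires when a notification service has reported "down" for 5 minutes
      # (assumes the gauge uses 0 for down)
      - alert: HookshotNotificationServiceDown
        expr: hookshot_notifications_service_up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Hookshot notification service {{ $labels.service }} is down"

      # Fires when a connection keeps failing to process events
      - alert: HookshotConnectionEventsFailing
        expr: sum by (connectionId) (increase(hookshot_connection_event_failed[10m])) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Connection {{ $labels.connectionId }} is failing to process events"
```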

matrix

| Metric | Help | Labels |
|--------|------|--------|
| matrix_api_calls | The number of Matrix client API calls made | method |
| matrix_api_calls_failed | The number of Matrix client API calls which failed | method |
| matrix_appservice_events | The number of events sent over the AS API | |
| matrix_appservice_decryption_failed | The number of events sent over the AS API that failed to decrypt | |
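
Because `matrix_api_calls` and `matrix_api_calls_failed` share the `method` label, a failure ratio can be derived from them. The rule below is a sketch that assumes both metrics are counters. The 10% threshold and the time windows are arbitrary placeholders to tune for your deployment.

```yaml
# alerts.yml (fragment): hypothetical alert rule
groups:
  - name: hookshot-example-matrix-alerts
    rules:
      # Share of Matrix client API calls that failed over the last 5 minutes
      - alert: HookshotMatrixApiFailureRatioHigh
        expr: |
          sum(rate(matrix_api_calls_failed[5m]))
            /
          sum(rate(matrix_api_calls[5m])) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "More than 10% of Hookshot's Matrix API calls are failing"
```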

feed

| Metric | Help | Labels |
|--------|------|--------|
| feed_count | The number of RSS feeds that hookshot is subscribed to | |
| feed_fetch_ms | The time taken for hookshot to fetch all feeds | |

process

| Metric | Help | Labels |
|--------|------|--------|
| process_cpu_user_seconds_total | Total user CPU time spent in seconds. | |
| process_cpu_system_seconds_total | Total system CPU time spent in seconds. | |
| process_cpu_seconds_total | Total user and system CPU time spent in seconds. | |
| process_start_time_seconds | Start time of the process since unix epoch in seconds. | |
| process_resident_memory_bytes | Resident memory size in bytes. | |
| process_virtual_memory_bytes | Virtual memory size in bytes. | |
| process_heap_bytes | Process heap size in bytes. | |
| process_open_fds | Number of open file descriptors. | |
| process_max_fds | Maximum number of open file descriptors. | |

nodejs

| Metric | Help | Labels |
|--------|------|--------|
| nodejs_eventloop_lag_seconds | Lag of event loop in seconds. | |
| nodejs_eventloop_lag_min_seconds | The minimum recorded event loop delay. | |
| nodejs_eventloop_lag_max_seconds | The maximum recorded event loop delay. | |
| nodejs_eventloop_lag_mean_seconds | The mean of the recorded event loop delays. | |
| nodejs_eventloop_lag_stddev_seconds | The standard deviation of the recorded event loop delays. | |
| nodejs_eventloop_lag_p50_seconds | The 50th percentile of the recorded event loop delays. | |
| nodejs_eventloop_lag_p90_seconds | The 90th percentile of the recorded event loop delays. | |
| nodejs_eventloop_lag_p99_seconds | The 99th percentile of the recorded event loop delays. | |
| nodejs_active_handles | Number of active libuv handles grouped by handle type. Every handle type is C++ class name. | type |
| nodejs_active_handles_total | Total number of active handles. | |
| nodejs_active_requests | Number of active libuv requests grouped by request type. Every request type is C++ class name. | type |
| nodejs_active_requests_total | Total number of active requests. | |
| nodejs_heap_size_total_bytes | Process heap size from Node.js in bytes. | |
| nodejs_heap_size_used_bytes | Process heap size used from Node.js in bytes. | |
| nodejs_external_memory_bytes | Node.js external memory size in bytes. | |
| nodejs_heap_space_size_total_bytes | Process heap space size total from Node.js in bytes. | space |
| nodejs_heap_space_size_used_bytes | Process heap space size used from Node.js in bytes. | space |
| nodejs_heap_space_size_available_bytes | Process heap space size available from Node.js in bytes. | space |
| nodejs_version_info | Node.js version info. | version, major, minor, patch |
| nodejs_gc_duration_seconds | Garbage collection duration by kind, one of major, minor, incremental or weakcb. | kind |