Metrics
Every restore pod exposes metrics about the current state of its restore.
The metrics are served in the Prometheus text format on the /metrics endpoint of each pod, on port 9000.
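In production, a Prometheus server would normally scrape this endpoint. For a quick manual check, the following is a minimal Python sketch that fetches the endpoint and parses the text format. It assumes the pod is reachable on localhost:9000, for example through kubectl port-forward pod/<restore-pod> 9000:9000 (where <restore-pod> is a placeholder). The parser is deliberately simplified and is not a substitute for a real Prometheus client library.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict:
    """Minimal parser for the Prometheus text exposition format.

    Returns {metric_name: [(label_string, value), ...]}. Comment lines
    (# HELP / # TYPE) and optional timestamps are ignored; escaped quotes
    or '}' characters inside label values are NOT handled.
    """
    samples: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Sample with labels: name{label="x"} value [timestamp]
            name, rest = line.split("{", 1)
            labels, rest = rest.split("}", 1)
        else:
            # Sample without labels: name value [timestamp]
            name, rest = line.split(None, 1)
            labels = ""
        value = rest.split()[0]  # drop the optional timestamp
        samples.setdefault(name.strip(), []).append((labels, float(value)))
    return samples


def fetch_restore_metrics(host: str = "localhost", port: int = 9000) -> dict:
    # Assumption: the restore pod is reachable at host:port, e.g. via
    # `kubectl port-forward pod/<restore-pod> 9000:9000`.
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Whether the restore metrics carry labels (for example per topic) is not stated on this page, so the parser keeps the raw label string alongside each value.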
Available metrics
| Name | Type | Description |
|---|---|---|
| restore_progress | Gauge | A restore job’s progress, represented as a float between 0 and 1. |
| restore_records_count | Counter | Number of records successfully restored so far. |
| restore_records_total | Counter | Total number of records to be restored. |
| restore_bytes_count | Counter | Number of payload bytes successfully restored so far. |
| restore_bytes_total | Counter | Total number of bytes to be restored. |
| restore_records_filtered_count | Counter | Number of records filtered out (by filters or plugins). These records were not restored to the target cluster. |
| restore_partitions_count | Counter | Number of Kannika partitions restored so far. |
| restore_partitions_total | Counter | Total number of Kannika partitions to be restored. |
| restore_jobs_errors | Counter | Number of restore jobs that encountered an error. |
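The _count and _total metrics come in pairs, so a completion ratio can be derived for records, bytes, and partitions separately. The sketch below is only a cross-check; this page does not say how restore_progress itself is calculated, so these ratios are not guaranteed to match it. The input is assumed to be a plain mapping of metric name to value.

```python
def completion_ratios(metrics: dict) -> dict:
    """Derive completion ratios from the restore_*_count / restore_*_total pairs.

    `metrics` maps metric names to values, e.g. {"restore_records_count": 250}.
    Pairs with a missing metric or a total of 0 are skipped.
    """
    ratios = {}
    for dimension in ("records", "bytes", "partitions"):
        count = metrics.get(f"restore_{dimension}_count")
        total = metrics.get(f"restore_{dimension}_total")
        if count is None or not total:
            continue  # metric absent, or total is 0 / not yet known
        ratios[dimension] = count / total
    return ratios
```

For example, completion_ratios({"restore_records_count": 250, "restore_records_total": 1000}) gives {"records": 0.25}.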
Update interval
The metrics are updated every second.
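Because the values only change once per second, sampling faster than that yields no new information. A throughput figure can be computed from two samples of a counter taken a few seconds apart. The following is a minimal sketch; the reset handling mirrors the convention used by Prometheus' rate() function, and the ETA assumes the rate stays constant, which is a simplification.

```python
from typing import Optional


def rate_per_second(earlier: float, later: float, elapsed_seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = later - earlier
    if delta < 0:
        # The counter went down, so it was probably reset (e.g. pod restart);
        # like Prometheus' rate(), treat the new value as counted from zero.
        delta = later
    return delta / elapsed_seconds


def eta_seconds(count: float, total: float, rate: float) -> Optional[float]:
    """Estimated seconds until count reaches total at a constant rate."""
    if rate <= 0:
        return None  # no progress measured, so no meaningful estimate
    return max(total - count, 0) / rate
```

For instance, restore_records_count going from 1000 to 6000 over 5 seconds gives a rate of 1000 records per second; with restore_records_total at 16000, eta_seconds(6000, 16000, 1000.0) estimates 10 seconds remaining.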
Push metrics
Restore jobs push their metrics when a topic restore starts or stops, as well as periodically.
This is mainly intended for the API, so that it can keep historical data available once the restore has finished or been paused.
Metrics are pushed to the event-gateway Kubernetes service, which listens on port 8082.
For each topic, the restore jobs post their metrics to that service on the following paths:
/namespaces/{namespace}/restores/{restoreName}/{restoreUuid}/topics/{topicName}/started
/namespaces/{namespace}/restores/{restoreName}/{restoreUuid}/topics/{topicName}/stopped
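When building a test stub or debugging the event gateway, it can help to construct these paths programmatically. The sketch below assumes the service is reached in-cluster as http://event-gateway:8082 (the actual DNS name depends on the namespace it runs in), and topic_event_url is a hypothetical helper, not part of Kannika. The request payload is not documented here, so it is not shown.

```python
from urllib.parse import quote

# Assumption: in-cluster address of the event-gateway service.
EVENT_GATEWAY = "http://event-gateway:8082"


def topic_event_url(namespace: str, restore_name: str, restore_uuid: str,
                    topic: str, event: str, base: str = EVENT_GATEWAY) -> str:
    """Build the started/stopped push path for one topic of a restore."""
    if event not in ("started", "stopped"):
        raise ValueError("event must be 'started' or 'stopped'")
    segments = ["namespaces", namespace, "restores", restore_name,
                restore_uuid, "topics", topic, event]
    # Percent-encode each segment defensively; Kafka topic names normally
    # only contain letters, digits, '.', '_' and '-', which stay unchanged.
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)
```

For example, topic_event_url("demo", "my-restore", "1234", "orders.v1", "started") returns http://event-gateway:8082/namespaces/demo/restores/my-restore/1234/topics/orders.v1/started.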
Configuration
When you install with the main Helm chart, metric pushing is configured and enabled by default.
It can be disabled by setting the operator.config.eventGateway.enabled flag to false.
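The dotted flag name is the same key you would pass to helm with --set operator.config.eventGateway.enabled=false. To put it in a values file instead, the dots become nesting levels. The sketch below expands a dotted key into that nested structure; values_from_dotted is a hypothetical helper, and it prints JSON, which is also valid YAML and can therefore be used as a values file.

```python
import json


def values_from_dotted(key: str, value) -> dict:
    """Expand a dotted Helm key into the nested structure of a values file."""
    nested = value
    # Wrap from the innermost key outwards: enabled -> eventGateway -> ...
    for part in reversed(key.split(".")):
        nested = {part: nested}
    return nested


overrides = values_from_dotted("operator.config.eventGateway.enabled", False)
print(json.dumps(overrides, indent=2))
```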
To change where restores push their metrics, update the operator.config.eventGateway.service properties, which configure the event gateway service that points to the API: