Overview
Pods’ progress metrics
Every backup pod and restore job exposes metrics in the Prometheus format that describe the current state of the process.
They can be queried with a GET request on the `/metrics` endpoint on port `9000`.
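To query the endpoint from a script, a minimal Python sketch using only the standard library could look like the following. It assumes port 9000 of the pod is reachable from where the script runs (for example through `kubectl port-forward`). The helper names `parse_metrics` and `fetch_metrics` are hypothetical, and the parser only handles simple sample lines like the ones in the example below, not the full Prometheus text format.

```python
import re
import urllib.request

# Matches one sample line: metric name, optional {labels}, then a value.
# Lines with timestamps or escaped quotes in label values are skipped/not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(host="localhost", port=9000):
    """GET http://<host>:<port>/metrics and parse the response body."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Keeping parsing separate from fetching makes it possible to test the parser on captured output without a running pod.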
Here’s an example of a backup’s metrics:

```
curl localhost:9000/metrics
# HELP gauge Total number of bytes present in the storage (affected by chosen compression algorithm)
# TYPE backup_sink_bytes_count gauge
backup_sink_bytes_count{topic="std-1"} 1750738
backup_sink_bytes_count{topic="std-2"} 1324392

# HELP counter Total number of bytes present in the storage, including expired segments (affected by chosen compression algorithm)
# TYPE backup_sink_bytes_count_total counter
backup_sink_bytes_count_total{topic="std-1"} 1750738
backup_sink_bytes_count_total{topic="std-2"} 1324392

# HELP counter Total number of records sent to the storage since program start
# TYPE backup_sink_jobs_count counter
backup_sink_jobs_count{topic="std-1"} 753
backup_sink_jobs_count{topic="std-2"} 376

# HELP gauge Indicator of time spent idle: 0=fully busy, 1=fully idle
# TYPE backup_sink_jobs_idleratio gauge
backup_sink_jobs_idleratio{topic="std-1"} 0.9982043009466929
backup_sink_jobs_idleratio{topic="std-2"} 0.9987926299963017

# HELP counter Last offset backed-up for each topic's partition
# TYPE backup_sink_offset counter
backup_sink_offset{topic="std-1",partition="0"} 388
backup_sink_offset{topic="std-1",partition="1"} 284
backup_sink_offset{topic="std-1",partition="2"} 272
backup_sink_offset{topic="std-2",partition="2"} 168
backup_sink_offset{topic="std-2",partition="1"} 146
backup_sink_offset{topic="std-2",partition="0"} 161

# HELP gauge Total number of records present in the storage sink.
# TYPE backup_sink_records_count gauge
backup_sink_records_count{topic="std-1"} 947
backup_sink_records_count{topic="std-2"} 478

# HELP counter Total number of records written to the storage sink, including expired segments
# TYPE backup_sink_records_count_total counter
backup_sink_records_count_total{topic="std-1"} 947
backup_sink_records_count_total{topic="std-2"} 478

# HELP counter Total number of records filtered out (user-defined filters, plugins, ...)
# TYPE backup_sink_records_filtered_count counter
backup_sink_records_filtered_count{topic="std-1"} 0
backup_sink_records_filtered_count{topic="std-2"} 0

# HELP gauge Total size of backup'd records (key + headers + payload)
# TYPE backup_sink_records_size gauge
backup_sink_records_size{topic="std-1"} 11328438
backup_sink_records_size{topic="std-2"} 5968210

# HELP counter Total size of backup'd records (key + headers + payload), including expired segments
# TYPE backup_sink_records_size_total counter
backup_sink_records_size_total{topic="std-1"} 11328438
backup_sink_records_size_total{topic="std-2"} 5968210

# HELP gauge Number of segments (.kan files) written to the storage
# TYPE backup_sink_segments_count gauge
backup_sink_segments_count{topic="std-1"} 2
backup_sink_segments_count{topic="std-2"} 2

# HELP counter Number of segments (.kan files) written to the storage, including expired segments
# TYPE backup_sink_segments_count_total counter
backup_sink_segments_count_total{topic="std-1"} 2
backup_sink_segments_count_total{topic="std-2"} 2

# HELP gauge State of this topic's backup task: 0=Created, 1xx=Running, 2xx=Paused, 3xx=Backoff, 8xx=Done, 9xx=Failed
# TYPE backup_state gauge
backup_state{topic="std-1"} 100
backup_state{topic="std-2"} 100

# HELP gauge Progress of a topic's partition backup as a number between 0 and 1
# TYPE backup_topic_partition_progress gauge
backup_topic_partition_progress{topic="std-1",partition="1"} 0.9665551839464883
backup_topic_partition_progress{topic="std-1",partition="2"} 0.9581881533101045
backup_topic_partition_progress{topic="std-1",partition="0"} 0.9607843137254902
backup_topic_partition_progress{topic="std-2",partition="2"} 0.949438202247191
backup_topic_partition_progress{topic="std-2",partition="0"} 0.9642857142857143
backup_topic_partition_progress{topic="std-2",partition="1"} 0.9735099337748344

# HELP gauge Overall progress of a topic's backup as a number between 0 and 1
# TYPE backup_topic_progress gauge
backup_topic_progress{topic="std-1"} 0.961842550327361
backup_topic_progress{topic="std-2"} 0.9624112834359133

# HELP jobs_state Number of jobs in a given running state
# TYPE jobs_state gauge
jobs_state{state="Created"} 0
jobs_state{state="Done"} 0
jobs_state{state="Running"} 2
jobs_state{state="Paused"} 0
jobs_state{state="Failed"} 0
jobs_state{state="Backoff"} 0
```
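The `backup_state` values follow the ranges listed in their HELP text (0=Created, 1xx=Running, 2xx=Paused, 3xx=Backoff, 8xx=Done, 9xx=Failed). A small Python sketch for turning them, together with `backup_topic_progress`, into a readable per-topic summary might look like this. The helpers `decode_backup_state` and `summarize` are hypothetical, and returning "Unknown" for codes outside the documented ranges is an assumption of this sketch.

```python
# State ranges from the backup_state HELP text:
# 0=Created, 1xx=Running, 2xx=Paused, 3xx=Backoff, 8xx=Done, 9xx=Failed
STATE_BY_HUNDREDS = {1: "Running", 2: "Paused", 3: "Backoff", 8: "Done", 9: "Failed"}


def decode_backup_state(code):
    """Map a backup_state gauge value to a state name ("Unknown" if undocumented)."""
    code = int(code)
    if code == 0:
        return "Created"
    return STATE_BY_HUNDREDS.get(code // 100, "Unknown")


def summarize(samples):
    """Return {topic: (state_name, progress_percent)} from parsed samples.

    `samples` is an iterable of (metric_name, labels, value) tuples, where
    `labels` is a dict such as {"topic": "std-1"}.
    """
    states, progress = {}, {}
    for name, labels, value in samples:
        topic = labels.get("topic")
        if topic is None:
            continue  # only per-topic metrics are summarized
        if name == "backup_state":
            states[topic] = decode_backup_state(value)
        elif name == "backup_topic_progress":
            progress[topic] = round(value * 100, 1)
    topics = states.keys() | progress.keys()
    return {t: (states.get(t, "Unknown"), progress.get(t)) for t in sorted(topics)}


# Values taken from the example output above.
example = [
    ("backup_state", {"topic": "std-1"}, 100.0),
    ("backup_state", {"topic": "std-2"}, 100.0),
    ("backup_topic_progress", {"topic": "std-1"}, 0.961842550327361),
    ("backup_topic_progress", {"topic": "std-2"}, 0.9624112834359133),
]
for topic, (state, pct) in summarize(example).items():
    print(f"{topic}: {state}, {pct}% done")
```

With the example values, both topics report the Running state (code 100).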
Push metrics
Backup and restore jobs (including Schema Registry backups and restores) push their metrics to the API so that the Armory console can display their progress.
Metrics are sent to the `event-gateway` Kubernetes service, which listens on port `8082`.
By default, metric pushing is configured and enabled when using the main Helm chart.
It can be disabled by setting the `operator.config.eventGateway.enabled` flag to `false`.
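The dotted flag name is a path into the chart's values. As an illustration only, this Python sketch expands such a path into the nested structure it refers to and prints it as JSON; because JSON is also valid YAML, the printed document could be saved as an override file and passed to Helm with `-f`. The same override can be given on the command line with Helm's `--set operator.config.eventGateway.enabled=false`. The helper `values_override` is hypothetical.

```python
import json


def values_override(dotted_key, value):
    """Expand a dotted Helm value path into the nested mapping it refers to."""
    override = value
    # Wrap the value from the innermost key outwards.
    for part in reversed(dotted_key.split(".")):
        override = {part: override}
    return override


# Override that disables metric pushing.
override = values_override("operator.config.eventGateway.enabled", False)
print(json.dumps(override, indent=2))
```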
The following properties change the configuration of the event gateway service, which points to the API:
```yaml
api:
  eventGateway:
    enabled: true
    service:
      name: "event-gateway"
      type: ClusterIP
      port: 8082 # port that the service will expose
      targetPort: 8082 # container port
      annotations: { }
      labels: { }
```
To change where restores push their metrics, update the `operator.config.eventGateway.service` properties:
```yaml
operator:
  config:
    eventGateway:
      enabled: true
      service:
        name: event-gateway
        namespace: ""
        port: 8082
```
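The `name`, `namespace`, and `port` values above together identify a Kubernetes service. The Python sketch below shows how such values typically map to an in-cluster DNS address. The function name is hypothetical, the default cluster domain `cluster.local` may differ in your cluster, and treating an empty `namespace` as "same namespace as the restore pod" is an assumption, not documented behavior.

```python
def event_gateway_address(name="event-gateway", namespace="", port=8082,
                          cluster_domain="cluster.local"):
    """Build a host:port address for the event gateway service.

    Assumption: an empty namespace means the service runs in the pod's own
    namespace, where Kubernetes DNS resolves the short service name.
    """
    if not namespace:
        return f"{name}:{port}"
    # Fully qualified service name: <service>.<namespace>.svc.<cluster-domain>
    return f"{name}.{namespace}.svc.{cluster_domain}:{port}"


print(event_gateway_address())
print(event_gateway_address(namespace="my-namespace"))  # hypothetical namespace
```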