Overview
Pods’ progress metrics
Every backup pod and restore job exposes metrics in the Prometheus format that describe the current state of the process.
They can be queried with a GET request on the /metrics endpoint on port 9000.
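As a rough illustration of consuming this endpoint from a script, the sketch below fetches the metrics text and extracts per-topic progress and task state. It is only a minimal sketch: the helper names (metrics_url, fetch_metrics, parse_samples, topic_progress, state_name) are hypothetical, the parser handles only plain `name{label="value"} number` lines, and the state mapping simply follows the ranges given in the backup_state help text (0=Created, 1xx=Running, and so on). The embedded SAMPLE reuses a few lines from the backup example in this section.

```python
from urllib.request import urlopen

# A few lines in the Prometheus text format, taken from the backup example.
SAMPLE = """\
# TYPE backup_state gauge
backup_state{topic="std-1"} 100
# TYPE backup_topic_progress gauge
backup_topic_progress{topic="std-1"} 0.961842550327361
backup_topic_progress{topic="std-2"} 0.9624112834359133
"""


def metrics_url(host: str = "localhost", port: int = 9000) -> str:
    """Build the URL of the /metrics endpoint (port 9000 as documented)."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str = "localhost", port: int = 9000, timeout: float = 5.0) -> str:
    """Send a GET request to /metrics and return the raw text body."""
    with urlopen(metrics_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse sample lines into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. This small
    parser does not handle escaped quotes or commas inside label values,
    nor trailing timestamps.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        head, _, value = line.rpartition(" ")
        name, _, raw_labels = head.partition("{")
        labels = {}
        for pair in raw_labels.rstrip("}").split(","):
            if pair:
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def topic_progress(text: str) -> dict[str, float]:
    """Return the backup_topic_progress value (0 to 1) for each topic."""
    return {
        labels["topic"]: value
        for name, labels, value in parse_samples(text)
        if name == "backup_topic_progress"
    }


def state_name(code: float) -> str:
    """Translate a backup_state value using the ranges in its help text."""
    code = int(code)
    if code == 0:
        return "Created"
    names = {1: "Running", 2: "Paused", 3: "Backoff", 8: "Done", 9: "Failed"}
    return names.get(code // 100, "Unknown")


if __name__ == "__main__":
    # Uses the embedded sample; swap in fetch_metrics() against a live pod.
    for topic, progress in topic_progress(SAMPLE).items():
        print(f"{topic}: {progress:.1%}")
```

Against a running pod, replacing SAMPLE with the result of fetch_metrics() (for instance after port-forwarding port 9000) should yield the same structure. For anything beyond quick checks, a full Prometheus text-format parser is likely a better choice than this hand-rolled one.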
Here’s an example of a backup’s metrics:
curl localhost:9000/metrics
# HELP gauge Total number of bytes present in the storage (affected by chosen compression algorithm)
# TYPE backup_sink_bytes_count gauge
backup_sink_bytes_count{topic="std-1"} 1750738
backup_sink_bytes_count{topic="std-2"} 1324392
# HELP counter Total number of bytes present in the storage, including expired segments (affected by chosen compression algorithm)
# TYPE backup_sink_bytes_count_total counter
backup_sink_bytes_count_total{topic="std-1"} 1750738
backup_sink_bytes_count_total{topic="std-2"} 1324392
# HELP counter Total number of records sent to the storage since program start
# TYPE backup_sink_jobs_count counter
backup_sink_jobs_count{topic="std-1"} 753
backup_sink_jobs_count{topic="std-2"} 376
# HELP gauge Indicator of time spent idle: 0=fully busy, 1=fully idle
# TYPE backup_sink_jobs_idleratio gauge
backup_sink_jobs_idleratio{topic="std-1"} 0.9982043009466929
backup_sink_jobs_idleratio{topic="std-2"} 0.9987926299963017
# HELP counter Last offset backed-up for each topic's partition
# TYPE backup_sink_offset counter
backup_sink_offset{topic="std-1",partition="0"} 388
backup_sink_offset{topic="std-1",partition="1"} 284
backup_sink_offset{topic="std-1",partition="2"} 272
backup_sink_offset{topic="std-2",partition="2"} 168
backup_sink_offset{topic="std-2",partition="1"} 146
backup_sink_offset{topic="std-2",partition="0"} 161
# HELP gauge Total number of records present in the storage sink.
# TYPE backup_sink_records_count gauge
backup_sink_records_count{topic="std-1"} 947
backup_sink_records_count{topic="std-2"} 478
# HELP counter Total number of records written to the storage sink, including expired segments
# TYPE backup_sink_records_count_total counter
backup_sink_records_count_total{topic="std-1"} 947
backup_sink_records_count_total{topic="std-2"} 478
# HELP counter Total number of records filtered out (user-defined filters, plugins, ...)
# TYPE backup_sink_records_filtered_count counter
backup_sink_records_filtered_count{topic="std-1"} 0
backup_sink_records_filtered_count{topic="std-2"} 0
# HELP gauge Total size of backup'd records (key + headers + payload)
# TYPE backup_sink_records_size gauge
backup_sink_records_size{topic="std-1"} 11328438
backup_sink_records_size{topic="std-2"} 5968210
# HELP counter Total size of backup'd records (key + headers + payload), including expired segments
# TYPE backup_sink_records_size_total counter
backup_sink_records_size_total{topic="std-1"} 11328438
backup_sink_records_size_total{topic="std-2"} 5968210
# HELP gauge Number of segments (.kan files) written to the storage
# TYPE backup_sink_segments_count gauge
backup_sink_segments_count{topic="std-1"} 2
backup_sink_segments_count{topic="std-2"} 2
# HELP counter Number of segments (.kan files) written to the storage, including expired segments
# TYPE backup_sink_segments_count_total counter
backup_sink_segments_count_total{topic="std-1"} 2
backup_sink_segments_count_total{topic="std-2"} 2
# HELP gauge State of this topic's backup task: 0=Created, 1xx=Running, 2xx=Paused, 3xx=Backoff, 8xx=Done, 9xx=Failed
# TYPE backup_state gauge
backup_state{topic="std-1"} 100
backup_state{topic="std-2"} 100
# HELP gauge Progress of a topic's partition backup as a number between 0 and 1
# TYPE backup_topic_partition_progress gauge
backup_topic_partition_progress{topic="std-1",partition="1"} 0.9665551839464883
backup_topic_partition_progress{topic="std-1",partition="2"} 0.9581881533101045
backup_topic_partition_progress{topic="std-1",partition="0"} 0.9607843137254902
backup_topic_partition_progress{topic="std-2",partition="2"} 0.949438202247191
backup_topic_partition_progress{topic="std-2",partition="0"} 0.9642857142857143
backup_topic_partition_progress{topic="std-2",partition="1"} 0.9735099337748344
# HELP gauge Overall progress of a topic's backup as a number between 0 and 1
# TYPE backup_topic_progress gauge
backup_topic_progress{topic="std-1"} 0.961842550327361
backup_topic_progress{topic="std-2"} 0.9624112834359133
# HELP jobs_state Number of jobs in a given running state
# TYPE jobs_state gauge
jobs_state{state="Created"} 0
jobs_state{state="Done"} 0
jobs_state{state="Running"} 2
jobs_state{state="Paused"} 0
jobs_state{state="Failed"} 0
jobs_state{state="Backoff"} 0

Push metrics
Backup and Restore jobs (including schema registry backups & restores) push their metrics to the API so that the Armory console can display the progress being made.
Metrics are sent to the event-gateway Kubernetes service which listens on port 8082.
By default, the main Helm chart configures and enables metrics pushing.
To disable it, set the operator.config.eventGateway.enabled flag to false.
The following properties can be used to change the event gateway service configuration which points to the API:
api:
  eventGateway:
    enabled: true
    service:
      name: "event-gateway"
      type: ClusterIP
      port: 8082 # port that the service will expose
      targetPort: 8082 # container port
      annotations: { }
      labels: { }

To change where restores push the metrics to, update the operator.config.eventGateway.service properties:
operator:
  config:
    eventGateway:
      enabled: true
      service:
        name: event-gateway
        namespace: ""
        port: 8082
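When troubleshooting missing progress in the console, it can help to check that the address implied by these properties is reachable from inside the cluster. The sketch below is one possible way to do that and is not part of the product: service_address and is_reachable are hypothetical helpers, and it assumes that an empty namespace means "the same namespace as the caller", which relies on standard Kubernetes service DNS rather than on documented operator behavior.

```python
import socket


def service_address(name: str, namespace: str = "", port: int = 8082) -> tuple[str, int]:
    """Derive a (host, port) pair from the service properties.

    Kubernetes DNS resolves "<service>" within the caller's own namespace
    and "<service>.<namespace>.svc" across namespaces. Treating an empty
    namespace as "same namespace" is an assumption made for this sketch.
    """
    host = f"{name}.{namespace}.svc" if namespace else name
    return host, port


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Values mirror the defaults shown above; run this from a pod in the cluster.
    host, port = service_address("event-gateway", "", 8082)
    status = "reachable" if is_reachable(host, port) else "unreachable"
    print(f"{host}:{port} is {status}")
```

A successful TCP connection only shows that something is listening on that port; it does not confirm that the event gateway accepts the pushed metrics.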