    Overview

    Pods’ progress metrics

    Every backup pod or restore job exposes metrics in the Prometheus format that describe the current state of the process. They can be queried with a GET request to the /metrics endpoint on port 9000.

    Here’s an example of a backup’s metrics:
    curl localhost:9000/metrics
    # HELP gauge Total number of bytes present in the storage (affected by chosen compression algorithm)
    # TYPE backup_sink_bytes_count gauge
    backup_sink_bytes_count{topic="std-1"} 1750738
    backup_sink_bytes_count{topic="std-2"} 1324392
    # HELP counter Total number of bytes present in the storage, including expired segments (affected by chosen compression algorithm)
    # TYPE backup_sink_bytes_count_total counter
    backup_sink_bytes_count_total{topic="std-1"} 1750738
    backup_sink_bytes_count_total{topic="std-2"} 1324392
    # HELP counter Total number of records sent to the storage since program start
    # TYPE backup_sink_jobs_count counter
    backup_sink_jobs_count{topic="std-1"} 753
    backup_sink_jobs_count{topic="std-2"} 376
    # HELP gauge Indicator of time spent idle: 0=fully busy, 1=fully idle
    # TYPE backup_sink_jobs_idleratio gauge
    backup_sink_jobs_idleratio{topic="std-1"} 0.9982043009466929
    backup_sink_jobs_idleratio{topic="std-2"} 0.9987926299963017
    # HELP counter Last offset backed-up for each topic's partition
    # TYPE backup_sink_offset counter
    backup_sink_offset{topic="std-1",partition="0"} 388
    backup_sink_offset{topic="std-1",partition="1"} 284
    backup_sink_offset{topic="std-1",partition="2"} 272
    backup_sink_offset{topic="std-2",partition="2"} 168
    backup_sink_offset{topic="std-2",partition="1"} 146
    backup_sink_offset{topic="std-2",partition="0"} 161
    # HELP gauge Total number of records present in the storage sink.
    # TYPE backup_sink_records_count gauge
    backup_sink_records_count{topic="std-1"} 947
    backup_sink_records_count{topic="std-2"} 478
    # HELP counter Total number of records written to the storage sink, including expired segments
    # TYPE backup_sink_records_count_total counter
    backup_sink_records_count_total{topic="std-1"} 947
    backup_sink_records_count_total{topic="std-2"} 478
    # HELP counter Total number of records filtered out (user-defined filters, plugins, ...)
    # TYPE backup_sink_records_filtered_count counter
    backup_sink_records_filtered_count{topic="std-1"} 0
    backup_sink_records_filtered_count{topic="std-2"} 0
    # HELP gauge Total size of backup'd records (key + headers + payload)
    # TYPE backup_sink_records_size gauge
    backup_sink_records_size{topic="std-1"} 11328438
    backup_sink_records_size{topic="std-2"} 5968210
    # HELP counter Total size of backup'd records (key + headers + payload), including expired segments
    # TYPE backup_sink_records_size_total counter
    backup_sink_records_size_total{topic="std-1"} 11328438
    backup_sink_records_size_total{topic="std-2"} 5968210
    # HELP gauge Number of segments (.kan files) written to the storage
    # TYPE backup_sink_segments_count gauge
    backup_sink_segments_count{topic="std-1"} 2
    backup_sink_segments_count{topic="std-2"} 2
    # HELP counter Number of segments (.kan files) written to the storage, including expired segments
    # TYPE backup_sink_segments_count_total counter
    backup_sink_segments_count_total{topic="std-1"} 2
    backup_sink_segments_count_total{topic="std-2"} 2
    # HELP gauge State of this topic's backup task: 0=Created, 1xx=Running, 2xx=Paused, 3xx=Backoff, 8xx=Done, 9xx=Failed
    # TYPE backup_state gauge
    backup_state{topic="std-1"} 100
    backup_state{topic="std-2"} 100
    # HELP gauge Progress of a topic's partition backup as a number between 0 and 1
    # TYPE backup_topic_partition_progress gauge
    backup_topic_partition_progress{topic="std-1",partition="1"} 0.9665551839464883
    backup_topic_partition_progress{topic="std-1",partition="2"} 0.9581881533101045
    backup_topic_partition_progress{topic="std-1",partition="0"} 0.9607843137254902
    backup_topic_partition_progress{topic="std-2",partition="2"} 0.949438202247191
    backup_topic_partition_progress{topic="std-2",partition="0"} 0.9642857142857143
    backup_topic_partition_progress{topic="std-2",partition="1"} 0.9735099337748344
    # HELP gauge Overall progress of a topic's backup as a number between 0 and 1
    # TYPE backup_topic_progress gauge
    backup_topic_progress{topic="std-1"} 0.961842550327361
    backup_topic_progress{topic="std-2"} 0.9624112834359133
    # HELP jobs_state Number of jobs in a given running state
    # TYPE jobs_state gauge
    jobs_state{state="Created"} 0
    jobs_state{state="Done"} 0
    jobs_state{state="Running"} 2
    jobs_state{state="Paused"} 0
    jobs_state{state="Failed"} 0
    jobs_state{state="Backoff"} 0
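The output above can also be consumed by a script. The following is a minimal sketch, not part of the product, that reads the /metrics endpoint and reports each topic's state and overall progress. The function names (`parse_metrics`, `state_name`, `summarize`, `fetch_summary`) are hypothetical, and the parser only handles the simple `name{labels} value` line shape shown in the example (no timestamps, no escaped quotes in label values). The state families come from the `backup_state` help text.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# State families as listed in the backup_state help text (0 is handled separately).
STATE_FAMILIES = {1: "Running", 2: "Paused", 3: "Backoff", 8: "Done", 9: "Failed"}


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # line shape not covered by this simplified parser
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def state_name(code):
    """Translate a backup_state value (0, 1xx, 2xx, ...) into a readable name."""
    code = int(code)
    if code == 0:
        return "Created"
    return STATE_FAMILIES.get(code // 100, "Unknown")


def summarize(samples):
    """Map each topic to its state and overall progress, where reported."""
    summary = {}
    for name, labels, value in samples:
        topic = labels.get("topic")
        if topic is None:
            continue
        entry = summary.setdefault(topic, {})
        if name == "backup_topic_progress":
            entry["progress"] = value
        elif name == "backup_state":
            entry["state"] = state_name(value)
    return summary


def fetch_summary(url="http://localhost:9000/metrics"):
    """Fetch the metrics endpoint and summarize it (URL taken from the curl example)."""
    with urlopen(url, timeout=5) as response:
        return summarize(parse_metrics(response.read().decode("utf-8")))
```

Calling `fetch_summary()` against a pod whose port 9000 is reachable on localhost (for instance through a port-forward, which is an assumption about the setup) returns a dictionary keyed by topic name.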

    Push metrics

    Backup and restore jobs (including schema registry backups and restores) push their metrics to the API so that the Armory console can display their progress.

    Metrics are sent to the event-gateway Kubernetes service, which listens on port 8082.
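If metrics do not show up in the console, one quick thing to rule out is basic network reachability of that service. The following is a small sketch of a plain TCP check, not a product feature. The helper name `can_reach` is hypothetical, and the host name `event-gateway` assumes the script runs inside the cluster, in the same namespace as the service.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example call (assumed in-cluster host name):
#   can_reach("event-gateway", 8082)
```

A successful connection only shows that something is listening on the port; it does not confirm that pushed metrics are accepted.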

    By default, metrics pushing is configured and enabled when using the main Helm chart. To disable it, set the operator.config.eventGateway.enabled flag to false.

    The following properties can be used to change the event gateway service configuration which points to the API:

    values.yaml
    api:
      eventGateway:
        enabled: true
        service:
          name: "event-gateway"
          type: ClusterIP
          port: 8082 # port that the service will expose
          targetPort: 8082 # container port
          annotations: { }
          labels: { }

    To change where restores push their metrics, update the operator.config.eventGateway.service properties:

    values.yaml
    operator:
      config:
        eventGateway:
          enabled: true
          service:
            name: event-gateway
            namespace: ""
            port: 8082