
    Worker Groups

    There are situations where a single pod would struggle to keep up with backing up the full set of topics defined in a Backup. It is possible to spread the load across multiple pods and let Kubernetes schedule them on the available nodes using the spec.workGroup setting:

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: backup-example
      labels:
        io.kannika/data-retention-policy: Keep
    spec:
      source: "my-eventhub"
      sink: "my-storage"
      workGroup:
        workers: 3
      topicSelectors:
        matchers:
          - name:
              glob: "flights.*"

    In the above example, we’ve defined a Backup that will match all topics starting with ‘flights.’, which in our case happens to be too many for a single node to handle.

    To spread the workload across multiple pods, we’ve set workGroup.workers to 3. This creates three backup pods, each assigned a subset of the matched topics.

    Changing the distribution of backup jobs

    Armory doesn’t currently try to distribute large topics evenly across nodes. The algorithm hashes topic names to choose where each job will run, which produces wildly different outputs even for similar-looking topic names and, on average, a satisfactory distribution.
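
    As an illustration only, the sketch below shows how a hash-modulo assignment of this kind behaves, using a generic hash function and made-up topic names. It is not Armory’s actual algorithm, which is internal and may differ.

    # Illustrative sketch only -- not Armory's internal algorithm.
    # It shows how hashing topic names spreads similar-looking names
    # across a fixed number of workers.
    import hashlib

    workers = 3
    topics = ["flights.arrivals", "flights.departures", "flights.delays"]  # hypothetical topic names

    def assign(topic: str) -> int:
        # Stable hash of the topic name, reduced to a worker index.
        digest = hashlib.sha256(topic.encode()).digest()
        return int.from_bytes(digest[:8], "big") % workers

    for topic in topics:
        print(f"{topic} -> worker {assign(topic)}")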

    However, should you wish to manually change the distribution of jobs, you can provide a seed to the algorithm as shown below:

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: backup-example
      labels:
        io.kannika/data-retention-policy: Keep
    spec:
      source: "my-eventhub"
      sink: "my-storage"
      workGroup:
        workers: 3
        seed: 42
      topicSelectors:
        matchers:
          - name:
              glob: "flights.*"

    The spec.workGroup.seed is a positive integer that is fed to the algorithm responsible for distributing jobs across pods. Changing the seed alters the algorithm’s output, resulting in a seemingly random, alternate distribution.
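
    As a rough illustration of why changing the seed reshuffles the assignment, the sketch below folds a seed into the same kind of topic-name hash. The hash function and topic names are hypothetical and only serve to show the effect; Armory’s actual distribution algorithm is internal.

    # Hypothetical sketch: mixing a seed into the topic-name hash changes
    # every assignment in a seemingly random way, which is the effect of
    # tweaking spec.workGroup.seed.
    import hashlib

    workers = 3
    topics = ["flights.arrivals", "flights.departures", "flights.delays"]

    def assign(topic: str, seed: int) -> int:
        digest = hashlib.sha256(f"{seed}:{topic}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % workers

    for seed in (0, 42):
        print(f"seed={seed}:", {topic: assign(topic, seed) for topic in topics})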

    Autoscaling the number of workers

    Armory currently doesn’t support autoscaling the number of workers, e.g. by using a Horizontal Pod Autoscaler (HPA). This feature is planned for a future release.