Worker Groups
There are situations where a single pod would struggle to keep up with backing up the full set of topics defined in a Backup.
It is possible to spread the load across multiple pods and let Kubernetes schedule them on the available nodes using the spec.workGroup
setting:
apiVersion: kannika.io/v1alpha
kind: Backup
metadata:
  name: backup-example
  labels:
    io.kannika/data-retention-policy: Keep
spec:
  source: "my-eventhub"
  sink: "my-storage"
  workGroup:
    workers: 3
  topicSelectors:
    matchers:
      - name:
          glob: "flights.*"
In the above example, we’ve defined a Backup that matches all topics starting with ‘flights.’, which in our case is too many topics for a single pod to handle.
To spread the workload across multiple pods, we’ve set the workGroup.workers value to 3.
This creates 3 backup pods, each assigned a subset of the matched topics.
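To make the assignment concrete, here is a minimal sketch of how a topic name could be mapped deterministically to one of the workers by hashing it. This is only an illustration of the general technique; the assignWorker function, the FNV hash, and the example topic names are assumptions, not Armory’s actual implementation.

package main

import (
	"fmt"
	"hash/fnv"
)

// assignWorker maps a topic name to one of `workers` pods by hashing it.
// Hypothetical sketch of the general technique, not Armory's actual code.
func assignWorker(topic string, workers uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(topic)) // hash the topic name
	return h.Sum32() % workers
}

func main() {
	topics := []string{"flights.arrivals", "flights.departures", "flights.delays"}
	for _, topic := range topics {
		fmt.Printf("%s -> worker %d\n", topic, assignWorker(topic, 3))
	}
}

Because the mapping depends only on the topic name and the worker count, every reconciliation produces the same assignment for the same set of topics.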
Changing the distribution of backup jobs
Armory doesn’t currently try to distribute large topics evenly across nodes. The algorithm hashes topic names to decide where each job will run, which produces very different outputs even for similar-looking topic names and, on average, a satisfactory distribution.
However, should you wish to manually change the distribution of jobs, you can provide a seed to the algorithm as shown below:
apiVersion: kannika.io/v1alpha
kind: Backup
metadata:
  name: backup-example
  labels:
    io.kannika/data-retention-policy: Keep
spec:
  source: "my-eventhub"
  sink: "my-storage"
  workGroup:
    workers: 3
    seed: 42
  topicSelectors:
    matchers:
      - name:
          glob: "flights.*"
The spec.workGroup.seed is a positive integer that is fed to the algorithm responsible for distributing jobs across pods.
Changing the seed alters the algorithm’s output, resulting in a seemingly random, alternate distribution.
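The sketch below shows one way a seed could be mixed into the hash so that changing spec.workGroup.seed reshuffles which worker each topic lands on. It is a hedged illustration only: the assignWorkerSeeded function and the way the seed is folded into an FNV hash are assumptions, and Armory’s real algorithm may combine the seed differently.

package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// assignWorkerSeeded mixes a seed into the hash before the topic name, so a
// different seed produces a different topic-to-worker mapping.
// Hypothetical sketch; not Armory's actual distribution code.
func assignWorkerSeeded(topic string, seed uint64, workers uint32) uint32 {
	h := fnv.New64a()
	var seedBytes [8]byte
	binary.BigEndian.PutUint64(seedBytes[:], seed)
	h.Write(seedBytes[:])  // fold the seed into the hash state
	h.Write([]byte(topic)) // then hash the topic name
	return uint32(h.Sum64() % uint64(workers))
}

func main() {
	topics := []string{"flights.arrivals", "flights.departures", "flights.delays"}
	for _, seed := range []uint64{0, 42} {
		fmt.Printf("seed=%d:\n", seed)
		for _, topic := range topics {
			fmt.Printf("  %s -> worker %d\n", topic, assignWorkerSeeded(topic, seed, 3))
		}
	}
}

Running the sketch with two different seeds shows the same topics landing on different workers, which is the effect you would use the seed for when the default distribution is unbalanced.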
Autoscaling the number of workers
Armory currently doesn’t support autoscaling the number of workers, e.g. by using a Horizontal Pod Autoscaler (HPA). This feature is planned for a future release.