    Overview

    A Backup is used for backing up topics from an EventHub, and offloading them to Storage. It is configured by creating a Backup resource in Kubernetes. The Kannika Armory Operator creates a StatefulSet that runs the backup process, based on the configuration in the Backup resource.

    A Backup can have multiple Backup Streams which are used to configure which topics should be backed up.

    Usage

    Backups can be managed using the kubectl command line tool, and are available under the resource name backup or backups.

    Terminal window
    $ kubectl get backups
    NAME        STATUS         AGE
    my-backup   🚀 Streaming   1s

    Backup Status

    A Backup can have the following statuses:

    • Draft The Backup has no streams defined, and is not ready to be started.
    • Paused The Backup is configured but it has not been started yet or it has been paused, and no data is being backed up.
    • Initializing The Backup process is being created.
    • Streaming The Backup is running and backing up data to the storage.
    • Error The Backup has failed, and no data is being backed up.

    Additionally, a Backup resource exposes a DeploymentReady Condition to report on the state of the underlying StatefulSet. This condition’s value can either be ‘True’ or ‘False’, indicating whether this Backup’s deployment is healthy. The condition’s reason gives additional context and may be one of the following:

    • DeploymentReady the deployment is healthy;
    • DeploymentError the deployment encountered an error: check the status of this deployment to find out what happened;
    • DeploymentDeleted somebody or something deleted the deployment; this should be a transient state;
    • DeploymentStateUnknown the deployment state is unknown, perhaps due to an error talking to the Kubernetes API.
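
    You can inspect this condition with a JSONPath query. This sketch assumes the condition is published under .status.conditions, as is conventional for Kubernetes resources; the exact output shape may differ:

    Terminal window
    $ kubectl get backup my-backup -o jsonpath='{.status.conditions[?(@.type=="DeploymentReady")].reason}'
    DeploymentReady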

    Configuring a Backup

    The following is an example of a Backup. It configures a Backup which will back up two topics from the my-kafka-cluster EventHub to the my-bucket Storage.

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: backup-example
    spec:
      source: "my-kafka-cluster"
      sink: "my-bucket"
      streams:
        - topic: "magic.events"
        - topic: "pixie.dust"

    In this example:

    • A Backup named backup-example is created, indicated by the .metadata.name field. This name will become the basis for the StatefulSet which is created for this Backup.

    • The Backup will connect to the my-kafka-cluster EventHub to fetch data, indicated by the .spec.source field. The Backup will write data to the my-bucket Storage defined in the .spec.sink field.

    • The .spec.streams field contains a list of Backup Streams. A Backup Stream contains the configuration of each topic that will be backed up. In this case, two topics named magic.events and pixie.dust will be backed up.
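
    To create this Backup, save the manifest to a file and apply it with kubectl. The file name backup-example.yaml below is just an example; once the backup process is up, the status should eventually show Streaming, as in the Usage section above:

    Terminal window
    $ kubectl apply -f backup-example.yaml
    $ kubectl get backup backup-example
    NAME             STATUS         AGE
    backup-example   🚀 Streaming   5s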

    Automatically importing topics

    It is possible to configure a backup so that it automatically adds topics present on a cluster if they match a condition:

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: backup-example
    spec:
      source: "my-kafka-cluster"
      sink: "my-bucket"
      streams:
        - topic: "magic.events"
      topicSelectors:
        matchers:
          - name:
              literal: "spells.proper"
          - name:
              glob: "enchantments.*"
          - name:
              regex: "^curses\\."

    In this example, we have changed the streams in our Backup definition. The “magic.events” topic is still present, but we have added topic matchers under the .spec.topicSelectors property. These matchers watch the cluster for new (or existing) topics matching any one of their rules:

    • the topic called “spells.proper” if it is present, or should it appear at some point;

    • any new (or existing) topic matching the glob pattern “enchantments.*”;

    • any new (or existing) topic matching the regular expression “^curses\.”.

    Any topic added dynamically via the topicSelectors property will be backed up using the same configuration options (compression, rollover size, etc.) as the ones explicitly defined in spec.streams, as sketched below.
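
    For illustration, here is what that inheritance could look like. The per-stream field names compression and rolloverSize are hypothetical, used only to make the idea concrete; consult the Backup resource reference for the actual option names:

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: backup-example
    spec:
      source: "my-kafka-cluster"
      sink: "my-bucket"
      streams:
        - topic: "magic.events"
          compression: "gzip"    # hypothetical option name
          rolloverSize: "100Mi"  # hypothetical option name
      topicSelectors:
        matchers:
          # Topics matched here are backed up with the same options as above
          - name:
              glob: "enchantments.*"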

    Pausing and resuming

    It is possible to pause a Backup and resume it later. When a Backup is paused, the associated StatefulSet is scaled down to 0 replicas.
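
    You can confirm this by inspecting the StatefulSet of a paused Backup. The name my-backup below assumes the StatefulSet shares the Backup’s name, which the operator uses as the basis for it:

    Terminal window
    $ kubectl get statefulset my-backup
    NAME        READY   AGE
    my-backup   0/0     5m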

    Pausing and resuming a Backup

    You can pause a Backup by setting the .spec.enabled field to false.

    The following is an example of a Backup which is disabled (paused).

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: paused-backup-example
    spec:
      source: "my-kafka-cluster"
      sink: "my-bucket"
      enabled: false
      streams:
        - topic: "paused.events"

    A Backup can be resumed by setting the .spec.enabled field back to true (or by removing the field).
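
    For example, the field can be toggled directly with kubectl patch, a generic kubectl mechanism that works on any resource:

    Terminal window
    $ kubectl patch backup paused-backup-example --type=merge -p '{"spec":{"enabled":true}}'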

    Pausing and resuming a Backup Stream (Topic)

    A Backup Stream can be paused by setting its enabled field to false.

    The following is an example of a Backup Stream which is paused.

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: paused-backup-stream-example
    spec:
      source: "my-kafka-cluster"
      sink: "my-bucket"
      streams:
        - topic: "disabled.events"
          enabled: false # Disables this stream

    If a topic satisfies one of the topicSelectors matchers, then it is possible to pause the Backup for this particular topic by adding it to the spec.streams list with the enabled flag set to false, as shown in the following example:

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: paused-backup-stream-example
    spec:
      source: "my-kafka-cluster"
      sink: "my-bucket"
      enabled: true
      streams:
        - topic: "__consumer_offsets"
          enabled: false
      topicSelectors:
        matchers:
          - name:
              glob: "*"

    Here, the topicSelectors rule matches all possible topic names, but we’ve excluded the __consumer_offsets topic by setting its enabled field to false.