    Restoring data from a Backup

    If you are on this page, you probably have a problem with your data. Or you are just curious. In any case, you are in the right place 😊! This guide will help you restore data from a Backup.

    Restoring data

    The restore process itself is straightforward, but managing the surrounding environment can be complex because there are many things to consider. To restore your data successfully, follow the steps below.

    Before restoring data:

    1. Pause the Backup (optional)
    2. Disable consumers and producers
    3. Prepare the target environment

    Restoring data:

    1. Configure the EventHub
    2. Configure the Credentials
    3. Configure the Restore
    4. Start the Restore

    After restoring data:

    1. Reset the consumer offsets
    2. Re-enable consumers and producers
    3. Reconfigure the Backup with the restored topics

    Pause the Backup (optional)

    If the Backup uses a Volume Storage, you must pause it first. You can pause it from the console, or by setting the enabled flag to false in the Backup definition:

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: your-backup
    spec:
      source: "your-kafka-cluster"
      sink: "your-storage"
      enabled: false
      # ...

    You can use kubectl patch to update the Backup definition:

    $ kubectl patch backup your-backup \
        --type merge \
        -p '{"spec":{"enabled":false}}'

    It is essential to pause the Backup before continuing. A Backup that uses a Volume Storage stores its data on a Persistent Volume, and that Persistent Volume must be available to the Restore. Until the Persistent Volume is detached from the Backup, the Restore cannot mount it and will not start.
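
    Before moving on, it can help to confirm that the flag was actually applied. The sketch below reads .spec.enabled back using kubectl's JSONPath output. The function names are illustrative, and the check says nothing about how long the Persistent Volume takes to detach after pausing.

```shell
# Print the current value of .spec.enabled for a Backup.
# Assumes kubectl is pointed at the namespace that contains the Backup.
backup_enabled() {
  kubectl get backup "$1" -o jsonpath='{.spec.enabled}'
}

# Succeed (exit status 0) only if the Backup reports enabled=false.
backup_is_paused() {
  [ "$(backup_enabled "$1")" = "false" ]
}

# Example:
#   backup_is_paused your-backup && echo "your-backup is paused"
```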

    Disable consumers and producers

    To avoid runtime errors for your applications, and to avoid data corruption, it is important to disable all consumers and producers of the topics that you want to restore. This may result in downtime for your applications, so it is important to plan this step carefully.
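
    How you disable the applications depends on how they are deployed, which this guide does not prescribe. If they happen to run as Kubernetes Deployments (an assumption), one possible approach is to note their replica counts and then scale them to zero. The Deployment names and helper names below are hypothetical.

```shell
# Print "<name> <replicas>" for each Deployment so the counts can be restored later.
record_replicas() {
  for deploy in "$@"; do
    printf '%s %s\n' "$deploy" \
      "$(kubectl get deployment "$deploy" -o jsonpath='{.spec.replicas}')"
  done
}

# Scale each named Deployment to zero replicas so it stops consuming and producing.
stop_clients() {
  for deploy in "$@"; do
    kubectl scale deployment "$deploy" --replicas=0
  done
}

# Example (hypothetical Deployment names):
#   record_replicas orders-consumer orders-producer > replicas.txt
#   stop_clients orders-consumer orders-producer
```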

    Prepare the target environment

    The next step is to prepare the target environment and the topics that you want to restore.

    For the Restore to work, topics need to adhere to the following rules:

    • The topic must exist
    • The topic must have the same number of partitions as before
    • The topic must be empty so no old data is mixed with the restored data

    To do this, you have two options. You can either:

    • Delete the topics and recreate them with the same number of partitions
    • Delete all messages from the topics by setting the retention time to 0, wait for the messages to be deleted, and then set the retention time back to its original value.
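
    Both options can be carried out with the command-line tools that ship with Apache Kafka. The sketch below assumes that kafka-topics.sh and kafka-configs.sh are on the PATH (some distributions install them without the .sh suffix, hence the overridable variables), that the broker is reachable without extra client properties, and that you know the partition count, replication factor, and original retention value. None of these come from this guide.

```shell
# Tool names are overridable because some packages drop the ".sh" suffix.
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics.sh}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
BOOTSTRAP="${BOOTSTRAP:-your-kafka-cluster:9092}"

# Option 1: delete the topic and recreate it with the same partition count.
# Usage: recreate_topic <topic> <partitions> <replication-factor>
recreate_topic() {
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --delete --topic "$1" &&
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --create --topic "$1" \
    --partitions "$2" --replication-factor "$3"
}

# Option 2, step 1: show the topic's config overrides to note the original retention.
show_topic_config() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" --entity-type topics \
    --entity-name "$1" --describe
}

# Option 2, step 2: set retention.ms (0 to purge, then the original value again).
# Usage: set_retention <topic> <retention-ms>
set_retention() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" --entity-type topics \
    --entity-name "$1" --alter --add-config "retention.ms=$2"
}
```

    Topic deletion is asynchronous, so the create call in recreate_topic may need to be retried. Likewise, the broker only removes expired data on its periodic retention check, so an emptied topic should be verified before the original retention is set again.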

    Configure the EventHub

    The next step is to configure the EventHub to which you want to restore the data. This step may be optional, since you can reuse the EventHub that you used for the Backup.

    Here is an example of a Kafka EventHub definition in Kubernetes:

    apiVersion: kannika.io/v1alpha
    kind: EventHub
    metadata:
      name: your-kafka-cluster
    spec:
      kafka:
        properties:
          bootstrap.servers: "your-kafka-cluster:9092"

    Configure the Credentials

    The Backup’s Credentials may not be suitable for the Restore, as they may only grant read access to the EventHub, whereas the Restore requires write access. For security reasons, it is advisable to use separate Credentials for the Restore rather than widening the Backup’s permissions.

    Here is an example of a SASL/PLAIN Credentials definition in Kubernetes:

    apiVersion: kannika.io/v1alpha
    kind: Credentials
    metadata:
      name: your-kafka-cluster-credentials
    spec:
      sasl:
        mechanism: PLAIN # Or SCRAM-SHA-256, SCRAM-SHA-512, etc.
        usernameFrom:
          secretKeyRef:
            name: your-kafka-cluster-creds # Reference to the Secret below
            key: username # Refers to the username in the Secret
        passwordFrom:
          secretKeyRef:
            name: your-kafka-cluster-creds # Reference to the Secret below
            key: password # Refers to the password in the Secret
    ---
    apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: your-kafka-cluster-creds
    data:
      username: <base64-encoded-username>
      password: <base64-encoded-password>
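
    The data fields of a Kubernetes Secret hold base64-encoded values. A small sketch of producing them follows, assuming a base64 utility is installed; the value user is a placeholder. It uses printf '%s' rather than echo so that no trailing newline ends up inside the encoded value, and it strips the line wrapping that some base64 implementations add to long output.

```shell
# Encode a value for a Secret's data field, without a trailing newline
# in the input and without line wrapping in the output.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

encode_secret_value 'user'   # prints dXNlcg==
```

    Alternatively, kubectl create secret generic with --from-literal=username=... and --from-literal=password=... lets kubectl do the encoding for you.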

    Configure the Restore

    Now that the target environment is prepared, and the EventHub and Credentials needed to access it are configured, you can configure the Restore.

    Here is an example of a Restore definition in Kubernetes:

    apiVersion: kannika.io/v1alpha
    kind: Restore
    metadata:
      name: your-restore
    spec:
      enabled: false # Set to true to start the Restore (requires at least one topic mapping)
      source: "your-storage" # The Storage where the data will be restored from
      sourceCredentialsFrom:
        credentialsRef:
          name: your-storage-credentials # The Credentials to access the Storage
      sink: "your-kafka-cluster" # The EventHub where the data will be restored to
      sinkCredentialsFrom:
        credentialsRef:
          name: your-kafka-cluster-credentials # The Credentials to access the EventHub
      config:
        legacyOffsetHeader: "__original_offset" # Store each message's original offset in a message header
        mapping:
          your-topic:
            target: "your-topic.v2" # Restore the topic `your-topic` to `your-topic.v2`

    In this example:

    • A Restore named your-restore is created

    • The .spec.source is set to the Storage named your-storage. This is where the data will be restored from.

    • The .spec.sourceCredentialsFrom is set to the Credentials named your-storage-credentials. These are the Credentials that will be used to access the Storage.

    • The .spec.sink is set to the EventHub named your-kafka-cluster. This EventHub refers to the cluster where you want to restore the data to.

    • The .spec.sinkCredentialsFrom is set to the Credentials named your-kafka-cluster-credentials. These are the Credentials that will be used to access the EventHub.

    • The .spec.config.mapping defines the mapping between the topics in the Backup and the topics in the cluster. In this example, the topic your-topic will be restored to the topic your-topic.v2.

    • The .spec.config.legacyOffsetHeader is set to __original_offset. This means that the original offset of each message will be stored in a message header with the key __original_offset. This is useful if you need to reset the consumer offsets after the restore. See the Reset the consumer offsets section for more information.

    Have a look at the Restore section for more restore configuration options.

    After creating the Restore definition, you can check the status of the Restore by running:

    $ kubectl get restore your-restore
    NAME           STATUS
    your-restore   ✏️ Draft

    At this point, the Restore should be in the Draft state, which means that it is not running yet. The Draft state lets you prepare the Restore before starting it.

    Start the Restore

    To start the Restore, you must set the enabled flag to true in the Restore definition.

    apiVersion: kannika.io/v1alpha
    kind: Restore
    metadata:
      name: your-restore
    spec:
      enabled: true
      source: "your-storage"
      sink: "your-kafka-cluster"
      config: {}
      # Other configuration ...

    You can use kubectl patch to update the Restore definition:

    $ kubectl patch restore your-restore --type merge -p '{"spec":{"enabled":true}}'

    You can check the status of the Restore to see if it is running:

    $ kubectl get restore your-restore
    NAME           STATUS
    your-restore   🚀 Restoring

    The Restore will run until it has restored all the data from the Backup. You can check the status of the Restore to see if it is finished:

    $ kubectl get restore your-restore
    NAME           STATUS
    your-restore   🚀 Done
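
    To wait for completion in a script instead of re-running the command by hand, the status column can be polled. This is only a sketch: it matches the text Done in the kubectl get output shown above, which is an assumption about the output format rather than a documented interface. The wait_for_restore name, the polling interval, and the attempt limit are illustrative.

```shell
# Poll `kubectl get restore` until the STATUS column contains "Done".
# Usage: wait_for_restore <name> [interval-seconds] [max-attempts]
wait_for_restore() {
  name=$1
  interval=${2:-10}
  attempts=${3:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get restore "$name" --no-headers | grep -q 'Done'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Restore $name did not reach Done after $attempts checks" >&2
  return 1
}

# Example:
#   wait_for_restore your-restore 15 120 && echo "restore finished"
```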

    If the Restore fails, you can restart it by deleting the associated Job:

    $ kubectl delete job your-restore-job

    See the dedicated Resuming a Restore section for more information.

    Reset the consumer offsets

    After the Restore has finished, you might have to reset the consumer offsets of the restored topics, depending on your use case.

    Due to the nature of Kafka, it is not possible to restore messages with the same offsets as before. Therefore, the consumer offsets of the restored topics will be different from the original consumer offsets.

    Kannika Armory keeps the original offset of each message and can optionally write it to a message header, so that you can reset the consumer offsets after the restore.

    See the Adding the legacy offset to the headers section for more information.
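
    How the new offsets are chosen depends on your use case. Once they are known, Kafka's own kafka-consumer-groups.sh tool is one way to apply them. The sketch below rests on several assumptions: the group name is hypothetical, the tool is on the PATH (the variable allows for distributions that drop the .sh suffix), and the consumer group is inactive, which it is while the consumers are still disabled. Running with --dry-run first only previews the change.

```shell
KAFKA_CONSUMER_GROUPS="${KAFKA_CONSUMER_GROUPS:-kafka-consumer-groups.sh}"
BOOTSTRAP="${BOOTSTRAP:-your-kafka-cluster:9092}"

# Reset a consumer group's offset for one partition of a topic.
# Usage: reset_offset <group> <topic> <partition> <offset> <--dry-run|--execute>
reset_offset() {
  "$KAFKA_CONSUMER_GROUPS" --bootstrap-server "$BOOTSTRAP" \
    --group "$1" --topic "$2:$3" \
    --reset-offsets --to-offset "$4" "$5"
}

# Example (hypothetical group and offset): preview, then apply.
#   reset_offset your-group your-topic.v2 0 42 --dry-run
#   reset_offset your-group your-topic.v2 0 42 --execute
```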

    Re-enable consumers and producers

    After all data has been restored, and consumer offsets have been reset, you may re-enable the consumers and producers.

    However, in case of data corruption, you might also need to take corrective actions to fix the data in the consumer applications.

    Reconfigure the Backup with the restored topics

    After the Restore has finished, you should reconfigure the Backup to include the restored topics.

    If you do not do this, the Backup will not include the restored topics, and the old data will still be in the Backup.
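
    If the Backup was paused before the Restore, it also has to be enabled again once the Restore has finished and released the Persistent Volume. A minimal sketch that mirrors the earlier pause command follows; the resume_backup name is illustrative, and changing which topics the Backup covers is not shown here.

```shell
# Re-enable a Backup that was paused before the Restore.
resume_backup() {
  kubectl patch backup "$1" --type merge -p '{"spec":{"enabled":true}}'
}

# Example:
#   resume_backup your-backup
```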