Restoring data from a Backup
If you are on this page, you probably have a problem with your data. Or you are just curious. In any case, you are in the right place 😊! This guide will help you restore data from a Backup.
Restoring data
The restore process itself is straightforward. Managing the environment around it is more complex, as there are many things to consider. To restore your data successfully, follow the steps below.
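Before restoring data:
- Pause the Backup (optional)
- Disable consumers and producers
- Prepare the target environment
Restoring data:
- Configure the EventHub
- Configure the Credentials
- Configure the Restore
- Start the Restore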
After restoring data:
- Reset the consumer offsets
- Re-enable consumers and producers
- Reconfigure the Backup with the restored topics
Pause the Backup (optional)
If the Backup uses Volume Storage, you must first pause it. You can disable it from the console, or set the enabled flag to false in the Backup definition. You can use kubectl patch to update the Backup definition.
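As a rough sketch of such a patch, the function below wraps kubectl patch with a JSON merge patch. It assumes that the Backup is addressable as the resource type backup and that the flag lives at .spec.enabled; neither is confirmed on this page, so check both against your installation. The function name pause_backup is only illustrative.

```shell
# Pause a Backup by setting its enabled flag to false.
# Assumptions: resource type "backup" and field path .spec.enabled.
# Usage: pause_backup <backup-name>
pause_backup() {
  # --type merge is needed because custom resources do not support
  # kubectl's default strategic merge patch.
  kubectl patch backup "$1" \
    --type merge \
    --patch '{"spec":{"enabled":false}}'
}
```

For example, pause_backup your-backup pauses a Backup named your-backup in the current namespace; add --namespace to the kubectl call if the Backup lives elsewhere.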
It is essential to pause the Backup before continuing. A Backup that uses Volume Storage stores its data on a Persistent Volume, and this Persistent Volume must be available to the Restore. If the Persistent Volume is not detached from the Backup, the Restore cannot mount it and will not start until it is detached.
Disable consumers and producers
To avoid runtime errors for your applications, and to avoid data corruption, it is important to disable all consumers and producers of the topics that you want to restore. This may result in downtime for your applications, so it is important to plan this step carefully.
Prepare the target environment
The next step is to prepare the target environment and the topics that you want to restore.
For the Restore to work, topics need to adhere to the following rules:
- The topic must exist
- The topic must have the same number of partitions as before
- The topic must be empty so no old data is mixed with the restored data
To do this, you have two options. You can either:
- Delete the topics and recreate them with the same number of partitions
- Delete all messages from the topics by setting the retention time to 0, waiting for the messages to be deleted, and then setting the retention time back to its original value.
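Both options can be sketched with the standard Apache Kafka command-line tools. The sketch assumes kafka-topics.sh and kafka-configs.sh are on your PATH (some distributions ship them without the .sh suffix) and that the retention time is controlled by the topic-level retention.ms setting. The function names and the two environment variables are illustrative only.

```shell
# Tool locations; override these variables if your tools are named differently.
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics.sh}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Option 1: delete the topic and recreate it with the original partition count.
# Topic deletion is asynchronous, so the create step may need a retry.
# Usage: recreate_topic <bootstrap-server> <topic> <partitions> <replication-factor>
recreate_topic() {
  "$KAFKA_TOPICS" --bootstrap-server "$1" --delete --topic "$2" &&
  "$KAFKA_TOPICS" --bootstrap-server "$1" --create --topic "$2" \
    --partitions "$3" --replication-factor "$4"
}

# Option 2: set retention.ms on the topic. Call it once with 0, wait until the
# messages are gone, then call it again with the original value.
# Usage: set_retention_ms <bootstrap-server> <topic> <retention-ms>
set_retention_ms() {
  "$KAFKA_CONFIGS" --bootstrap-server "$1" --alter \
    --entity-type topics --entity-name "$2" \
    --add-config "retention.ms=$3"
}
```

Note the original retention value before you change it, so that you can pass the same value to the second set_retention_ms call.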
Configure the EventHub
The next step is to configure the EventHub where you want to restore the data to. This step might be optional, as you may reuse the same EventHub that you used for the Backup.
Here is an example of a Kafka EventHub definition in Kubernetes:
Configure the Credentials
The Backup’s Credentials may not be suitable for the Restore, as they may only have read access to the EventHub. For security reasons, it is advised to use separate Credentials for the Restore. These Credentials must have write access to the EventHub.
Here is an example of SASL/PLAIN Credentials definition in Kubernetes:
Configure the Restore
Now that you have prepared the environment where you want to restore the data to, and have the EventHub and Credentials configured to access this environment, you can start configuring the Restore.
Here is an example of a Restore definition in Kubernetes:
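The sketch below is reconstructed from the field descriptions in this section and is not an authoritative manifest. The field names under .spec come from this page, but the apiVersion, the credentialsRef nesting, the source/target keys of the topic mapping, and the position of the enabled flag are assumptions; verify them against the Restore reference before use. The function name apply_restore is illustrative.

```shell
# Create a Restore in a disabled (Draft) state by piping a manifest to kubectl.
# Lines marked "assumed" are not confirmed by this page.
apply_restore() {
  kubectl apply -f - <<'EOF'
apiVersion: kannika.io/v1alpha        # assumed API group and version
kind: Restore
metadata:
  name: your-restore
spec:
  enabled: false                      # assumed location of the enabled flag
  source: your-storage
  sourceCredentialsFrom:
    credentialsRef:                   # assumed reference shape
      name: your-storage-credentials
  sink: your-kafka-cluster
  sinkCredentialsFrom:
    credentialsRef:                   # assumed reference shape
      name: your-kafka-cluster-credentials
  config:
    legacyOffsetHeader: "__original_offset"
    topics:
      - source: your-topic            # assumed key names for the mapping
        target: your-topic.v2
EOF
}
```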
In this example:
- A Restore named your-restore is created.
- The .spec.source is set to the Storage named your-storage. This is where the data will be restored from.
- The .spec.sourceCredentialsFrom is set to the Credentials named your-storage-credentials. These are the Credentials that will be used to access the Storage.
- The .spec.sink is set to the EventHub named your-kafka-cluster. This EventHub refers to the cluster where you want to restore the data to.
- The .spec.sinkCredentialsFrom is set to the Credentials named your-kafka-cluster-credentials. These are the Credentials that will be used to access the EventHub.
- The .spec.config.topics defines the mapping between the topics in the Backup and the topics in the cluster. In this example, the topic your-topic will be restored to the topic your-topic.v2.
- The .spec.config.legacyOffsetHeader is set to __original_offset. This means that the original consumer offsets will be stored in the message header with the key __original_offset. This is useful if you want to reset the consumer offsets after the restore. See the Reset the consumer offsets section for more information.
Have a look at the Restore section for more restore configuration options.
After creating the Restore definition, you can check the status of the Restore by running:
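A minimal way to inspect the Restore is a plain kubectl get. The sketch assumes the resource type is addressable as restore; the exact status fields are not described on this page, so the full YAML output is requested rather than a specific column. The function name restore_status is illustrative.

```shell
# Print the full Restore resource, including its status section.
# Assumption: the resource type is addressable as "restore".
# Usage: restore_status <restore-name>
restore_status() {
  kubectl get restore "$1" --output yaml
}
```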
Normally, the Restore will be in a Draft state. This means that the Restore is not running yet. The Draft state is useful to prepare the Restore before starting it.
Start the Restore
To start the Restore, you must set the enabled flag to true in the Restore definition. You can use kubectl patch to update the Restore definition.
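One possible form of this patch is shown below. As before, it assumes that the resource type is addressable as restore and that the flag lives at .spec.enabled; the function name start_restore is illustrative.

```shell
# Start a Restore by setting its enabled flag to true.
# Assumptions: resource type "restore" and field path .spec.enabled.
# Usage: start_restore <restore-name>
start_restore() {
  kubectl patch restore "$1" \
    --type merge \
    --patch '{"spec":{"enabled":true}}'
}
```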
You can check the status of the Restore to see if it is running. The Restore runs until it has restored all the data from the Backup, so checking the status again later shows whether it has finished.
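To follow a running Restore without polling by hand, you can let kubectl stream changes to the resource. This sketch again assumes the resource type restore; which columns or status values indicate completion depends on your installation and is not specified here. The function name watch_restore is illustrative.

```shell
# Stream updates to a Restore until you interrupt the command (Ctrl+C).
# Assumption: the resource type is addressable as "restore".
# Usage: watch_restore <restore-name>
watch_restore() {
  kubectl get restore "$1" --watch
}
```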
If the Restore fails, you may restart it by deleting the associated Job. See the dedicated Resuming a Restore section for more information.
Reset the consumer offsets
After the Restore has finished, you might have to reset the consumer offsets of the restored topics, depending on your use case.
Due to the nature of Kafka, it is not possible to restore messages with the same offsets as before. Therefore, the consumer offsets of the restored topics will be different from the original consumer offsets.
Kannika Armory stores the original consumer offsets, and provides the option to store them in the message headers, so you can reset the consumer offsets after the restore.
See the Adding the legacy offset to the headers section for more information.
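Once you know which offset in the restored topic a consumer group should continue from (for example, by locating the message whose legacy offset header matches the group's last processed offset), the reset itself can be done with the standard Kafka kafka-consumer-groups.sh tool. This is a sketch using the generic Kafka CLI, not a Kannika Armory feature. It assumes the tool is on your PATH and that the group has no active members, which is the case while consumers are disabled. The function and variable names are illustrative.

```shell
# Tool location; override if your distribution names it differently.
KAFKA_CONSUMER_GROUPS="${KAFKA_CONSUMER_GROUPS:-kafka-consumer-groups.sh}"

# Reset one partition of a consumer group to a specific offset.
# The optional sixth argument defaults to --dry-run, which only prints the
# planned offsets; pass --execute to apply them.
# Usage: reset_group_offset <bootstrap-server> <group> <topic> <partition> <offset> [--execute]
reset_group_offset() {
  "$KAFKA_CONSUMER_GROUPS" --bootstrap-server "$1" \
    --group "$2" --topic "$3:$4" \
    --reset-offsets --to-offset "$5" \
    "${6:---dry-run}"
}
```

Running the function without --execute first lets you review the planned offsets before committing them.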
Re-enable consumers and producers
After all data has been restored, and consumer offsets have been reset, you may re-enable the consumers and producers.
However, in case of data corruption, you might also need to take corrective actions to fix the data in the consumer applications.
Reconfigure the Backup with the restored topics
After the Restore has finished, you should reconfigure the Backup to include the restored topics.
If you do not do this, the Backup will not include the restored topics, and the old data will still be in the Backup.