Restore Lifecycle

This page describes the lifecycle of a Restore in Kubernetes, as managed by the operator. When you create a Restore, it begins in the Draft, Ready, or Invalid state, depending on its configuration. Once enabled, the operator provisions the necessary Kubernetes resources (Initializing) and the restore then enters the Restoring phase. From there, the restore can complete successfully (Done), encounter errors (Failed), be paused by setting .spec.enabled to false (Paused), or be stopped externally (Interrupted).

The status.message field on a Restore resource provides a human-readable summary of its current state. This is the primary field to check when monitoring a restore operation.

| Status | Description | Next Steps |
| --- | --- | --- |
| Draft | The Restore has been created but no topics have been defined yet. | Add topic mappings to .spec.config.topics |
| Ready | The Restore is fully configured but has not been started. | Set .spec.enabled to true to start |
| Invalid | The configuration contains errors that prevent the restore from running. | Check the ConfigurationValid condition for details |
| Initializing | The operator is creating the Kubernetes Job and associated resources. | Wait for the job to start |
| Restoring | The restore is actively running and writing data to the target. | Monitor progress via the Restore Report |
| Paused | The restore was stopped because .spec.enabled was set to false. | Set .spec.enabled to true to resume |
| Interrupted | The restore was stopped externally (e.g., pod eviction, node failure). | Delete the Job to trigger a resume |
| Failed | The restore encountered an error and cannot continue. | Check pod logs and conditions for error details |
| Done | The restore completed successfully. | No action required |
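
For monitoring scripts, the table above can be encoded as a simple lookup. The following is a minimal sketch, assuming the Restore resource has already been loaded into a Python dictionary (for example from JSON output). The names `NEXT_STEPS` and `next_step` are hypothetical and not part of the operator.

```python
# Hypothetical helper: maps the status.message values from the table above
# to the suggested next step. Not part of the operator's API.
NEXT_STEPS = {
    "Draft": "Add topic mappings to .spec.config.topics",
    "Ready": "Set .spec.enabled to true to start",
    "Invalid": "Check the ConfigurationValid condition for details",
    "Initializing": "Wait for the job to start",
    "Restoring": "Monitor progress via the Restore Report",
    "Paused": "Set .spec.enabled to true to resume",
    "Interrupted": "Delete the Job to trigger a resume",
    "Failed": "Check pod logs and conditions for error details",
    "Done": "No action required",
}


def next_step(restore: dict) -> str:
    """Return the suggested next step for a Restore given as a dict."""
    message = restore.get("status", {}).get("message")
    # Fall back to a neutral hint for statuses not listed in the table.
    return NEXT_STEPS.get(message, f"Unrecognized status: {message!r}")


print(next_step({"kind": "Restore", "status": {"message": "Paused"}}))
```

The fallback branch matters because a freshly created resource may not have a status yet.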

While status.message provides a quick summary, the status.conditions array offers detailed information about multiple components of the restore process. Several conditions with different statuses can be present at the same time. The conditions are independent and reflect different aspects of the restore process.

To illustrate, here is an example status section for a Restore that is currently in the Restoring state:

kind: Restore
# ...
status:
  message: Restoring
  completionTime: null
  startTime: "2024-01-15T10:30:00Z"
  conditions:
    - type: ConfigurationValid
      status: "True"
      reason: ValidConfiguration
      message: The restore configuration is valid
      lastTransitionTime: "2024-01-15T10:30:00Z"
      observedGeneration: 3
    - type: RestoreCompleted
      status: "False"
      reason: Restoring
      message: The restore is in progress
      lastTransitionTime: "2024-01-15T10:30:00Z"
      observedGeneration: 3
    - type: JobCompleted
      status: "False"
      reason: JobRunning
      message: The job is currently running
      lastTransitionTime: "2024-01-15T10:30:00Z"
      observedGeneration: 3
    - type: PodCompleted
      status: "False"
      reason: PodRunning
      message: The pod is currently running
      lastTransitionTime: "2024-01-15T10:30:00Z"
      observedGeneration: 3
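
Because the conditions are independent entries in an array, it is often convenient to index them by type before inspecting them. The sketch below assumes the status has been loaded into a plain dictionary that mirrors the example above. The helper names `conditions_by_type` and `summarize` are illustrative only.

```python
def conditions_by_type(restore: dict) -> dict:
    """Index status.conditions by their type for easy lookup."""
    conditions = restore.get("status", {}).get("conditions") or []
    return {c["type"]: c for c in conditions}


def summarize(restore: dict) -> list:
    """Return one summary line per condition, sorted by condition type."""
    by_type = conditions_by_type(restore)
    return [
        f"{ctype}: status={cond.get('status')} reason={cond.get('reason')}"
        for ctype, cond in sorted(by_type.items())
    ]


# Example input mirroring the Restoring status shown above
# (message and timestamp fields omitted for brevity).
restoring = {
    "kind": "Restore",
    "status": {
        "message": "Restoring",
        "conditions": [
            {"type": "ConfigurationValid", "status": "True", "reason": "ValidConfiguration"},
            {"type": "RestoreCompleted", "status": "False", "reason": "Restoring"},
            {"type": "JobCompleted", "status": "False", "reason": "JobRunning"},
            {"type": "PodCompleted", "status": "False", "reason": "PodRunning"},
        ],
    },
}

for line in summarize(restoring):
    print(line)
```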

When a Restore is started, a Kubernetes Job is created to perform the restore operation. This Job’s status is tracked via the JobCompleted condition. It begins as JobInitializing when the Job is created, transitions to JobRunning once the Job starts successfully, and finally to either JobSucceeded or JobFailed when the Job succeeds or fails, respectively.

Every time the Restore is restarted, a new Job is created and the JobCompleted condition is reset.

| Reason | Description |
| --- | --- |
| JobInitializing | The job is being created and started |
| JobRunning | The job is actively running |
| JobSucceeded | The job completed successfully |
| JobFailed | The job failed due to an error |

The Job creates a Pod to perform the actual restore work. Normally it would be sufficient to track the Job status, but in some cases the Pod may be terminated externally (e.g., due to node failure or eviction). To capture this, the operator also watches the Pod and captures its exit status to allow better diagnostics. The pod status is reported via the PodCompleted condition. Similar to the JobCompleted condition, it begins as PodPending when the Pod is created, transitions to PodRunning if the Pod starts successfully, and finally to either PodSucceeded, PodFailed, or PodInterrupted.

| Reason | Description |
| --- | --- |
| PodPending | The pod is being created and started |
| PodRunning | The pod is actively running |
| PodSucceeded | The pod completed successfully |
| PodFailed | The restore pod exited with an error |
| PodInterrupted | The pod was interrupted externally (e.g., SIGTERM/SIGKILL) |
| Unknown | Unknown status. This should not happen normally but may occur because of transient errors. |
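
The distinction between PodFailed and PodInterrupted is what lets a script tell a genuine restore error apart from an external interruption. The following sketch shows one way to read the two conditions together. It is an illustration based on the reasons listed above, not the operator's own decision logic, and the `diagnose` function and its return strings are hypothetical.

```python
def diagnose(conditions: list) -> str:
    """Classify a restore run from its JobCompleted and PodCompleted reasons."""
    reasons = {c["type"]: c.get("reason") for c in conditions}
    pod = reasons.get("PodCompleted")
    job = reasons.get("JobCompleted")

    # An external interruption takes priority: the Job may also report
    # JobFailed in this case, but the root cause is not a restore error.
    if pod == "PodInterrupted":
        return "interrupted externally"
    if pod == "PodFailed" or job == "JobFailed":
        return "failed with an error"
    if pod == "PodSucceeded" and job == "JobSucceeded":
        return "succeeded"
    return "in progress or unknown"


print(diagnose([
    {"type": "JobCompleted", "status": "False", "reason": "JobFailed"},
    {"type": "PodCompleted", "status": "False", "reason": "PodInterrupted"},
]))
```

Checking the pod reason first is a deliberate choice in this sketch: a terminated pod can cause the Job to fail as well, so the Job reason alone would hide the external cause.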

The restore configuration must be valid before the restore process can start. The ConfigurationValid condition reports on the state of the restore configuration: it is True when the configuration is valid and False when it is invalid. The condition's message field provides additional context, as in the following example:

status:
  conditions:
    - type: ConfigurationValid
      status: "False"
      reason: InvalidConfiguration
      message: "Invalid partition number -1"
      lastTransitionTime: "2026-02-01T10:00:00Z"
      observedGeneration: 1
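
A pre-flight check can surface this message before a restore is enabled. The sketch below assumes the Restore has been loaded into a dictionary shaped like the example above. The function name `configuration_error` is hypothetical.

```python
from typing import Optional


def configuration_error(restore: dict) -> Optional[str]:
    """Return the ConfigurationValid message if the config is invalid, else None."""
    for cond in restore.get("status", {}).get("conditions") or []:
        if cond.get("type") != "ConfigurationValid":
            continue
        # Condition statuses are strings ("True" / "False"), not booleans.
        if cond.get("status") == "False":
            return cond.get("message", "")
        return None
    # No ConfigurationValid condition reported yet.
    return None


invalid = {
    "status": {
        "conditions": [
            {
                "type": "ConfigurationValid",
                "status": "False",
                "reason": "InvalidConfiguration",
                "message": "Invalid partition number -1",
            }
        ]
    }
}

error = configuration_error(invalid)
if error is not None:
    print(f"Restore configuration is invalid: {error}")
```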

Based on the PodCompleted, JobCompleted, and ConfigurationValid conditions, and some other information, the operator derives a final RestoreCompleted condition.

The possible reasons set on this condition are Draft, Ready, Invalid, Initializing, Restoring, Paused, Interrupted, Failed, and Done, the same as the overall Restore status described above.
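
Since the RestoreCompleted reason summarizes the whole restore, a script can poll it until the restore either finishes or needs attention. The following is a minimal sketch. The `fetch` callable is a placeholder for whatever mechanism returns the current Restore as a dictionary, and the grouping of reasons into `FINISHED` and `NEEDS_ACTION` is an assumption made for this example, not something the operator defines.

```python
import time

# Assumed grouping of RestoreCompleted reasons for this sketch only.
FINISHED = {"Done", "Failed"}
NEEDS_ACTION = {"Draft", "Ready", "Invalid", "Paused", "Interrupted"}


def restore_reason(restore: dict):
    """Return the reason of the RestoreCompleted condition, or None if absent."""
    for cond in restore.get("status", {}).get("conditions") or []:
        if cond.get("type") == "RestoreCompleted":
            return cond.get("reason")
    return None


def wait_for_restore(fetch, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll fetch() until the restore finishes or requires user action.

    fetch is a hypothetical callable returning the Restore as a dict.
    Raises TimeoutError if the restore is still progressing after timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        reason = restore_reason(fetch())
        if reason in FINISHED or reason in NEEDS_ACTION:
            return reason
        if time.monotonic() >= deadline:
            raise TimeoutError(f"restore still in state {reason!r} after {timeout}s")
        sleep(interval)
```

Initializing and Restoring are treated as the only states worth waiting on, because every other reason in the list either ends the restore or requires a change to the resource.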

Example of a completed Restore

kind: Restore
# ...
status:
  message: Done
  completionTime: "2024-01-15T11:00:00Z"
  startTime: "2024-01-15T10:30:00Z"
  conditions:
    - type: ConfigurationValid
      status: "True"
      reason: ValidConfiguration
      message: The restore configuration is valid
      lastTransitionTime: "2024-01-15T10:30:00Z"
      observedGeneration: 3
    - type: RestoreCompleted
      status: "True"
      reason: Done
      message: The restore completed successfully
      lastTransitionTime: "2024-01-15T11:00:00Z"
      observedGeneration: 3
    - type: JobCompleted
      status: "True"
      reason: JobSucceeded
      message: The job completed successfully
      lastTransitionTime: "2024-01-15T11:00:00Z"
      observedGeneration: 3
    - type: PodCompleted
      status: "True"
      reason: PodSucceeded
      message: The pod completed successfully
      lastTransitionTime: "2024-01-15T11:00:00Z"
      observedGeneration: 3

Example of an interrupted Restore

kind: Restore
# ...
spec:
  enabled: false
  # ...
status:
  message: "Interrupted"
  conditions:
    - type: RestoreCompleted
      status: "False"
      reason: "Interrupted"
      lastTransitionTime: "2026-02-01T10:00:00Z"
      observedGeneration: 3
    - type: JobCompleted
      status: "False"
      reason: "JobFailed"
      lastTransitionTime: "2026-02-01T10:00:00Z"
      observedGeneration: 3
    - type: PodCompleted
      status: "False"
      reason: "PodInterrupted"
      lastTransitionTime: "2026-02-01T10:00:00Z"
      observedGeneration: 3
    - type: ConfigurationValid
      status: "False"
      reason: InvalidConfiguration
      message: "Invalid partition number -1"
      lastTransitionTime: "2026-02-01T10:00:00Z"
      observedGeneration: 3