Restore Lifecycle

This page describes the lifecycle of a Restore in Kubernetes, as managed by the operator. When you create a Restore, it starts in the Draft, Ready, or Invalid state, depending on its configuration. Once the Restore is enabled, the operator provisions the necessary Kubernetes resources (Initializing) and the restore then enters the Restoring phase. From there, the restore can complete successfully (Done), encounter an error (Failed), be stopped by disabling it (Paused), or be interrupted externally (Interrupted).

Restore Status

The status.message field on a Restore resource provides a human-readable summary of its current state. This is the primary field to check when monitoring a restore operation.

Status	Description	Next Steps
Draft	The Restore has been created but no topics have been defined yet.	Add topic mappings to `.spec.config.topics`
Ready	The Restore is fully configured but has not been started.	Set `.spec.enabled` to `true` to start
Invalid	The configuration contains errors that prevent the restore from running.	Check the `ConfigurationValid` condition for details
Initializing	The operator is creating the Kubernetes Job and associated resources.	Wait for the job to start
Restoring	The restore is actively running and writing data to the target.	Monitor progress via the Restore Report
Paused	The restore was stopped because `.spec.enabled` was set to `false`.	Set `.spec.enabled` to `true` to resume
Interrupted	The restore was stopped externally (e.g., pod eviction, node failure).	Delete the Job to trigger a resume
Failed	The restore encountered an error and cannot continue.	Check pod logs and conditions for error details
Done	The restore completed successfully.	No action required
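
For scripts that poll a Restore, the table above can be reduced to a small lookup. The following Python sketch groups the status values into three coarse categories. The grouping, the category names, and the function name `classify_status` are assumptions made for illustration and are not part of the operator.

```python
# Illustrative grouping of the status values listed in the table above.
# The three categories are an assumption of this sketch, not operator API.
IN_PROGRESS = {"Initializing", "Restoring"}
NEEDS_ACTION = {"Draft", "Ready", "Invalid", "Paused", "Interrupted", "Failed"}
FINISHED = {"Done"}


def classify_status(message: str) -> str:
    """Return a coarse category for a Restore's status.message value."""
    if message in IN_PROGRESS:
        return "in-progress"
    if message in NEEDS_ACTION:
        return "needs-action"
    if message in FINISHED:
        return "finished"
    # Any value not listed in the table is reported as unknown.
    return "unknown"
```

A polling loop could keep waiting while the result is "in-progress" and stop to alert a human on "needs-action".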

Conditions

While status.message provides a quick summary, the status.conditions array offers detailed information about the individual components of the restore process. The conditions are independent of one another and reflect different aspects of the restore, so several conditions can have different statuses at the same time.

To illustrate, here is an example status section for a Restore that is currently in the Restoring state:

kind: Restore
# ...
status:
  message: Restoring
  completionTime: null
  startTime: "2024-01-15T10:30:00Z"
  conditions:
  - type: ConfigurationValid
    status: "True"
    reason: ValidConfiguration
    message: The restore configuration is valid
    lastTransitionTime: "2024-01-15T10:30:00Z"
    observedGeneration: 3
  - type: RestoreCompleted
    status: "False"
    reason: Restoring
    message: The restore is in progress
    lastTransitionTime: "2024-01-15T10:30:00Z"
    observedGeneration: 3
  - type: JobCompleted
    status: "False"
    reason: JobRunning
    message: The job is currently running
    lastTransitionTime: "2024-01-15T10:30:00Z"
    observedGeneration: 3
  - type: PodCompleted
    status: "False"
    reason: PodRunning
    message: The pod is currently running
    lastTransitionTime: "2024-01-15T10:30:00Z"
    observedGeneration: 3
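
Because conditions form a list rather than a map, tooling usually has to look one up by its type. The sketch below assumes the Restore status has already been loaded into a Python dict (for example from parsed YAML or JSON); the helper name `find_condition` is hypothetical.

```python
from typing import Optional


def find_condition(status: dict, condition_type: str) -> Optional[dict]:
    """Return the condition with the given type, or None if it is absent."""
    for condition in status.get("conditions") or []:
        if condition.get("type") == condition_type:
            return condition
    return None


# Trimmed-down version of the status shown above.
status = {
    "message": "Restoring",
    "conditions": [
        {"type": "ConfigurationValid", "status": "True", "reason": "ValidConfiguration"},
        {"type": "JobCompleted", "status": "False", "reason": "JobRunning"},
    ],
}

job = find_condition(status, "JobCompleted")
print(job["reason"])  # prints "JobRunning"
```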

Job completion

When a Restore is started, a Kubernetes Job is created to perform the restore operation. This Job’s status is tracked via the JobCompleted condition. It begins as JobInitializing when the Job is created, transitions to JobRunning if the Job starts successfully, and finally to either JobSucceeded or JobFailed when the Job completes or fails respectively.

Every time the Restore is restarted, a new Job is created and the JobCompleted condition is reset.

Reason	Description
JobInitializing	The job is being created and started
JobRunning	The job is actively running
JobSucceeded	The job completed successfully
JobFailed	The job failed due to an error
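
The progression of the JobCompleted reason can be written down as a small transition table. This Python sketch encodes only the transitions described in the text above; other transitions may occur in practice, and the reset that happens when a restart creates a new Job is deliberately not modeled. The names `JOB_TRANSITIONS`, `is_described_transition`, and `is_terminal` are hypothetical.

```python
# Transitions of the JobCompleted reason as described in the text above.
# This is a simplification: it lists only the documented path and ignores
# the reset that occurs when a restart creates a new Job.
JOB_TRANSITIONS = {
    "JobInitializing": {"JobRunning"},
    "JobRunning": {"JobSucceeded", "JobFailed"},
    "JobSucceeded": set(),
    "JobFailed": set(),
}


def is_described_transition(old: str, new: str) -> bool:
    """Return True if old -> new is one of the transitions listed above."""
    return new in JOB_TRANSITIONS.get(old, set())


def is_terminal(reason: str) -> bool:
    """Return True if the reason has no further transitions for the same Job."""
    return reason in JOB_TRANSITIONS and not JOB_TRANSITIONS[reason]
```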

Pod completion

The Job creates a Pod to perform the actual restore work. Normally it would be sufficient to track the Job status, but in some cases the Pod may be terminated externally (e.g., due to node failure or eviction). To capture this, the operator also watches the Pod and captures its exit status to allow better diagnostics. The pod status is reported via the PodCompleted condition. Similar to the JobCompleted condition, it begins as PodPending when the Pod is created, transitions to PodRunning if the Pod starts successfully, and finally to either PodSucceeded, PodFailed, or PodInterrupted.

Reason	Description
PodPending	The pod is being created and started
PodRunning	The pod is actively running
PodSucceeded	The pod completed successfully
PodFailed	The restore pod exited with an error
PodInterrupted	The pod was interrupted externally (e.g., SIGTERM/SIGKILL)
Unknown	Unknown status. This should not happen normally but may occur because of transient errors
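
The value of tracking the Pod separately is that it distinguishes an external interruption from a genuine restore error, even though the Job may report JobFailed in both cases. The following Python sketch turns the two reasons into a short diagnosis. The function name `diagnose` and the wording of the returned strings are assumptions for illustration only.

```python
def diagnose(job_reason: str, pod_reason: str) -> str:
    """Give a short explanation based on the JobCompleted and PodCompleted reasons."""
    if pod_reason == "PodInterrupted":
        return "pod was stopped externally (for example eviction or node failure)"
    if pod_reason == "PodFailed":
        return "restore pod exited with an error; check the pod logs"
    if pod_reason == "PodSucceeded" and job_reason == "JobSucceeded":
        return "restore finished successfully"
    if pod_reason in ("PodPending", "PodRunning"):
        return "restore is still in progress"
    if pod_reason == "Unknown":
        return "pod status is unknown; this may be a transient error"
    # Combinations not covered above are left undiagnosed on purpose.
    return "no diagnosis available"
```

For instance, `diagnose("JobFailed", "PodInterrupted")` reports an external stop, while `diagnose("JobFailed", "PodFailed")` points to the pod logs.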

Configuration validation

It is critical that the restore configuration is valid before starting the restore process. The ConfigurationValid condition reports on the state of the restore configuration. The condition is True when the configuration is valid and False when invalid. The message field provides additional context.

status:
  conditions:
    - type: ConfigurationValid
      status: "False"
      reason: InvalidConfiguration
      message: "Invalid partition number -1"
      lastTransitionTime: "2026-02-01T10:00:00Z"
      observedGeneration: 1
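
A script can surface the validation error by reading this condition. The Python sketch below assumes the status is available as a dict; the helper name `configuration_error` is hypothetical. Note that condition statuses are the strings "True" and "False", not booleans.

```python
from typing import Optional


def configuration_error(status: dict) -> Optional[str]:
    """Return the validation message if the configuration is invalid, else None."""
    for condition in status.get("conditions") or []:
        if condition.get("type") != "ConfigurationValid":
            continue
        # Condition statuses are strings, so compare against "False".
        if condition.get("status") == "False":
            return condition.get("message", "configuration is invalid")
        return None
    # No ConfigurationValid condition has been reported yet.
    return None


# Same condition as in the example above.
status = {
    "conditions": [
        {
            "type": "ConfigurationValid",
            "status": "False",
            "reason": "InvalidConfiguration",
            "message": "Invalid partition number -1",
        }
    ]
}

print(configuration_error(status))  # prints "Invalid partition number -1"
```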

Restore completion

Based on the PodCompleted, JobCompleted, and ConfigurationValid conditions, together with some other information, the operator derives a final RestoreCompleted condition.

The possible reasons set on this condition are Draft, Ready, Invalid, Initializing, Restoring, Paused, Interrupted, Failed, and Done. These are the same values as the overall Restore status described above.

Example of a completed Restore

kind: Restore
# ...
status:
  message: Done
  completionTime: "2024-01-15T11:00:00Z"
  startTime: "2024-01-15T10:30:00Z"
  conditions:
  - type: ConfigurationValid
    status: "True"
    reason: ValidConfiguration
    message: The restore configuration is valid
    lastTransitionTime: "2024-01-15T10:30:00Z"
    observedGeneration: 3
  - type: RestoreCompleted
    status: "True"
    reason: Done
    message: The restore completed successfully
    lastTransitionTime: "2024-01-15T11:00:00Z"
    observedGeneration: 3
  - type: JobCompleted
    status: "True"
    reason: JobSucceeded
    message: The job completed successfully
    lastTransitionTime: "2024-01-15T11:00:00Z"
    observedGeneration: 3
  - type: PodCompleted
    status: "True"
    reason: PodSucceeded
    message: The pod completed successfully
    lastTransitionTime: "2024-01-15T11:00:00Z"
    observedGeneration: 3

Example of an interrupted Restore

kind: Restore
# ...
spec:
  enabled: true
# ...
status:
  message: "Interrupted"
  conditions:
  - type: RestoreCompleted
    status: "False"
    reason: "Interrupted"
    lastTransitionTime: "2026-02-01T10:00:00Z"
    observedGeneration: 3
  - type: JobCompleted
    status: "False"
    reason: "JobFailed"
    lastTransitionTime: "2026-02-01T10:00:00Z"
    observedGeneration: 3
  - type: PodCompleted
    status: "False"
    reason: "PodInterrupted"
    lastTransitionTime: "2026-02-01T10:00:00Z"
    observedGeneration: 3
  - type: ConfigurationValid
    status: "True"
    reason: ValidConfiguration
    message: The restore configuration is valid
    lastTransitionTime: "2026-02-01T10:00:00Z"
    observedGeneration: 3