Pod Reaper
On nodes that are close to full, the Kubernetes scheduler can occasionally bind a backup pod to a node
that cannot actually start it. The pod then stays in ContainerCreating indefinitely, and because the
node still hosts a pod, it cannot be drained, which blocks node recycling and requires manual intervention.
The pod reaper watches the pods the operator manages and deletes the ones that get wedged in
ContainerCreating, so their StatefulSet or Deployment can reschedule them onto a healthy node.
Configuration
The pod reaper is configured through the operator’s Helm values:

    operator:
      config:
        podReaper:
          enabled: true
          timeout: 5m
          interval: 1m

| Property | Type | Default | Description |
|---|---|---|---|
| operator.config.podReaper.enabled | boolean | true | Whether the pod reaper runs. |
| operator.config.podReaper.timeout | duration | 5m | How long a pod may stay stuck in ContainerCreating before it is reaped. |
| operator.config.podReaper.interval | duration | 1m | How often the operator scans for stuck pods. |
Durations are written as a number and a unit, for example 30s, 5m, or 1h.
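As an illustration of that duration format, here is a minimal Python sketch of a parser. It assumes a single whole number followed by one of the units s, m, or h, which is all the examples above show; the operator's own parser may accept other forms, and the `parse_duration` name is hypothetical.

```python
import re
from datetime import timedelta

# Assumption: one integer followed by a single unit (s, m, or h), matching
# the examples in the text. The operator may accept more than this.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}
_PATTERN = re.compile(r"(\d+)([smh])")


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '30s', '5m', or '1h' into a timedelta."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Under these assumptions, `parse_duration("5m")` yields a five-minute `timedelta`, and a value such as `"5 minutes"` raises `ValueError`.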
What the pod reaper does
It only acts on managed backup pods
The reaper considers backup pods and schema registry backup pods. The operator marks these pods with
the io.kannika/reap: "true" annotation, and the reaper only deletes pods that carry it. Restore pods
are run as Jobs with their own retry semantics and are left alone; pods that are not managed by Kannika
Armory are never touched.
To exclude a specific workload from reaping, set the io.kannika/reap annotation to "false" on the
Backup or SchemaRegistryBackup resource. The operator propagates that value onto its pods, so they
are no longer reaped. See Opt a backup out of reaping.
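The annotation check described above can be sketched in Python as follows. This is an illustration only, not the operator's code: the pod is modelled as a plain dictionary in the shape of the Kubernetes Pod JSON, and the `is_reap_candidate` name is hypothetical.

```python
REAP_ANNOTATION = "io.kannika/reap"


def is_reap_candidate(pod: dict) -> bool:
    """Return True only when the pod carries io.kannika/reap: "true".

    Pods without the annotation (for example, pods that are not managed by
    the operator) and pods annotated "false" are both skipped.
    """
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(REAP_ANNOTATION) == "true"
```

Annotation values are strings in Kubernetes, so the comparison is against the string `"true"` rather than a boolean.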
It only reaps pods stuck in ContainerCreating
A pod is reaped when all of the following hold:
- a container is waiting with reason ContainerCreating;
- the pod was created more than timeout ago;
- it is not already being deleted.
The age is taken from the pod’s creationTimestamp, so it covers the whole time since the pod was
created, including any time spent waiting to be scheduled. The operator re-checks on every interval,
so a pod is reaped within one interval of crossing the timeout.
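To make the timing concrete, the following Python sketch computes the age at which a stuck pod would be deleted. It assumes perfectly periodic scans and a pod that stays in ContainerCreating the whole time; the `age_when_reaped` function and its parameters are hypothetical names used only for this illustration.

```python
from datetime import timedelta


def age_when_reaped(
    timeout: timedelta, interval: timedelta, first_scan_age: timedelta
) -> timedelta:
    """Pod age at the first scan that finds it older than the timeout.

    first_scan_age is the pod's age at the first scan after its creation.
    Assumes evenly spaced scans and a pod that stays stuck throughout.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    age = first_scan_age
    while age <= timeout:  # "more than timeout ago" is a strict comparison
        age += interval
    return age
```

With the default `timeout` of 5m and `interval` of 1m, a pod first scanned 30 seconds after creation is reaped at an age of 5m30s. In general the age at deletion falls between just over the timeout and the timeout plus one interval.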
Other waiting reasons, such as ImagePullBackOff or CreateContainerConfigError, are deliberately
left alone: they signal a permanent problem (a missing image, a missing secret) that deleting and
recreating the pod would not fix, so reaping would only create a restart loop that hides the real error.
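Taken together, the conditions in this section can be sketched as a single predicate. This is a minimal Python illustration, not the operator's implementation: the pod is a dictionary in the shape of the Kubernetes Pod JSON, only `status.containerStatuses` is inspected (whether init containers are also considered is not stated here), and the `is_stuck` name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_stuck(pod: dict, timeout: timedelta, now: datetime) -> bool:
    """True when the pod is older than timeout, is not being deleted, and has
    a container waiting with reason ContainerCreating."""
    metadata = pod.get("metadata", {})
    # A pod that already has a deletionTimestamp is being deleted.
    if metadata.get("deletionTimestamp"):
        return False
    # Kubernetes timestamps are RFC 3339 strings such as 2024-01-01T12:00:00Z.
    created = datetime.fromisoformat(
        metadata["creationTimestamp"].replace("Z", "+00:00")
    )
    if now - created <= timeout:
        return False
    # Assumption: only regular containers are checked in this sketch.
    statuses = pod.get("status", {}).get("containerStatuses") or []
    return any(
        (status.get("state", {}).get("waiting") or {}).get("reason")
        == "ContainerCreating"
        for status in statuses
    )
```

Because the comparison is an exact match on ContainerCreating, a pod waiting with ImagePullBackOff or CreateContainerConfigError is never selected, whatever its age.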
It deletes the pod and records an event
When a pod is reaped, the operator deletes it so its controller recreates it, and records a warning
event in the pod’s namespace. Because the reaped pod is gone, the event shows up in the namespace event
list rather than under kubectl describe pod:

    $ kubectl get events -n <namespace> --field-selector reason=StuckPodReaped
    LAST SEEN   TYPE      REASON           OBJECT            MESSAGE
    30s         Warning   StuckPodReaped   pod/<stuck-pod>   Pod <stuck-pod> was stuck in ContainerCreating past the reaper timeout and was deleted so it can reschedule

A structured log line is emitted as well, with the pod name and namespace.
Examples
Tune the timeout
Reap pods that have been stuck for more than ten minutes, scanning every two minutes:

    operator:
      config:
        podReaper:
          enabled: true
          timeout: 10m
          interval: 2m

Opt a backup out of reaping
Annotate a Backup (or SchemaRegistryBackup) with io.kannika/reap: "false" to keep the reaper from
deleting its pods, while leaving it enabled for everything else:

    apiVersion: kannika.io/v1alpha
    kind: Backup
    metadata:
      name: my-backup
      annotations:
        io.kannika/reap: "false"
    spec:
      # ...

Disable the pod reaper
Turn the reaper off entirely if you prefer to handle stuck pods manually:

    operator:
      config:
        podReaper:
          enabled: false
