How to Check Status of Jobs

Status of K8up Objects

The Kubernetes objects of K8up try to convey enough status information to understand common error conditions. All objects maintain the Status property and most of them provide additional information through Conditions. This lets you wait for specific steps in the process with kubectl wait.

This will wait until the backup has started:

kubectl wait --for condition=progressing backup/demo-backup

This will wait until the backup has completed. The Reason of the condition tells you whether it was successful or not:

kubectl wait --for condition=completed backup/demo-backup
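
The two steps above (wait for completion, then inspect the Reason) can be combined in a small helper. This is a minimal sketch, not part of K8up: the function name wait_for_backup and the 10m default timeout are assumptions, and it assumes the condition type is spelled Completed as in the kubectl describe output.

```shell
# Wait for a Backup to finish, then print the Reason of its Completed condition.
# Usage: wait_for_backup <backup-name> [timeout]
# Returns 0 only if the Reason is "Succeeded".
wait_for_backup() {
  name="$1"
  timeout="${2:-10m}"   # assumed default; choose one that fits your backup size

  # Blocks until the Completed condition is True or the timeout expires.
  # The "condition met" message goes to stderr so stdout carries only the Reason.
  kubectl wait --for condition=completed --timeout="$timeout" "backup/$name" >&2 || return 1

  # Read the Reason of the Completed condition from the object status.
  reason=$(kubectl get "backup/$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Completed")].reason}')
  echo "$reason"
  [ "$reason" = "Succeeded" ]
}
```

Calling wait_for_backup demo-backup 5m in a script lets a later step depend on the exit code instead of parsing the output.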

Sample output when querying for backups

$ kubectl get backup
NAME                         SCHEDULE REF    COMPLETION   PREBACKUP              AGE
demo-backup                                  Succeeded    NoPreBackupPodsFound   3m20s
schedule-test-backup-b6bxz   schedule-test                NoPreBackupPodsFound   20s
schedule-test-backup-sj5rt   schedule-test                NoPreBackupPodsFound   80s
schedule-test-backup-whnkl   schedule-test   Failed       NoPreBackupPodsFound   2m20s
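
To pick out only the failed backups from a listing like the one above, the Reason can be read per object instead of parsing the table. The sketch below assumes that the COMPLETION column reflects the Reason of the Completed condition; the function names are hypothetical.

```shell
# Print one "name<TAB>reason" line per Backup in the current namespace.
# The reason is empty while a Backup has no Completed condition yet.
list_backup_reasons() {
  kubectl get backup -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Completed")].reason}{"\n"}{end}'
}

# Read "name<TAB>reason" lines on stdin and print the names with reason "Failed".
failed_backups() {
  awk -F '\t' '$2 == "Failed" { print $1 }'
}
```

Run it as list_backup_reasons | failed_backups. Splitting on tabs keeps the parsing robust when the reason is empty, which a whitespace split of the table output would not be.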

Status of Backups in particular

If the output of kubectl describe backup/<BACKUP NAME> is inconclusive, look at the output of the backup Pod. Every time a Backup starts, it creates a corresponding Job, which in turn creates a Pod in the same namespace. Run kubectl get pods to list the Pods and identify the relevant one, then check that it was instantiated successfully with kubectl describe pod/<POD NAME>. Finally, use kubectl logs pod/<POD NAME> to troubleshoot a failed backup job.
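
The lookup described above can be scripted. Pods created by a Job carry the label job-name=<JOB NAME>, so the Pod can be selected by label rather than by eye. This sketch assumes the Job has the same name as the Backup, as in the example that follows; backup_pod_logs is a hypothetical helper name.

```shell
# Describe the Pod that ran a given Backup and print its logs.
# Usage: backup_pod_logs <backup-name>
backup_pod_logs() {
  # Select the Pod through the job-name label; take the first match.
  pod=$(kubectl get pods -l "job-name=$1" -o name | head -n 1)
  if [ -z "$pod" ]; then
    echo "no pod found for backup $1" >&2
    return 1
  fi
  kubectl describe "$pod"   # scheduling and image pull problems show up here
  kubectl logs "$pod"       # output of the backup itself
}
```

If a Job was retried there may be several Pods; this sketch only looks at the first one returned.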

Example

The following excerpt shows a complete and successful execution of a K8up Backup.

$ kubectl describe backup/demo-backup
Name:         demo-backup
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  k8up.io/v1
Kind:         Backup
Metadata:
  # [clipped for brevity]
Spec:
  # [clipped for brevity]
Status:
  Conditions:
    Last Transition Time:  2020-13-32T25:71:61Z
    Message:               Deleted 1 resources
    Reason:                Succeeded
    Status:                True
    Type:                  Scrubbed
    Last Transition Time:  2020-13-32T25:61:61Z
    Message:               the job 'default/demo-backup' was created
    Reason:                Ready
    Status:                True
    Type:                  Ready
    Last Transition Time:  2020-13-32T25:71:61Z
    Message:               the job 'default/demo-backup' ended
    Reason:                Finished
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2020-13-32T25:71:61Z
    Message:               the job 'default/demo-backup' ended successfully
    Reason:                Succeeded
    Status:                True
    Type:                  Completed
    Last Transition Time:  2020-13-32T25:61:61Z
    Message:               no container definitions found
    Reason:                NoPreBackupPodsFound
    Status:                True
    Type:                  PreBackupPodsReady
  Started:                 true
Events:                    <none>

$ kubectl get jobs
NAME          COMPLETIONS   DURATION   AGE
demo-backup   1/1           3m         6m

$ kubectl describe jobs
Name:           demo-backup
Namespace:      default
Selector:       controller-uid=1ba56104-d1de-45c7-9d29-abf4da6943a5
Labels:         k8upjob=true
Annotations:    <none>
Controlled By:  Backup/demo-backup
Parallelism:    1
Completions:    1
Start Time:     Fri, 32 Jeb 2020 25:61:61 +0100
Completed At:   Fri, 32 Jeb 2020 25:61:61 +0100
Duration:       3m
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:           controller-uid=1ba56104-d1de-45c7-9d29-abf4da6943a5
                    job-name=demo-backup
  Service Account:  pod-executor
  Containers:
   demo-backup:
    Image:      vshn/wrestic:v0.2.0
    Port:       <none>
    Host Port:  <none>
    Environment:
      # [clipped for brevity]
    Mounts:                      <none>
  Volumes:                       <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  6m     job-controller  Created pod: demo-backup-t2mtp
  Normal  Completed         3m     job-controller  Job completed

$ kubectl get pods
NAME                READY   STATUS      RESTARTS   AGE
demo-backup-t2mtp   0/1     Completed   0          3m

$ kubectl describe pod/demo-backup-t2mtp
Name:           demo-backup-t2mtp
Namespace:      default
Priority:       0
Node:           <none>
Labels:         controller-uid=1ba56104-d1de-45c7-9d29-abf4da6943a5
                job-name=demo-backup
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  Job/demo-backup
Containers:
  demo-backup:
    Image:      vshn/wrestic:v0.2.0
    Port:       <none>
    Host Port:  <none>
    Environment:
      # [clipped for brevity]
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pod-executor-token-x5kkk (ro)
Volumes:
  pod-executor-token-x5kkk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pod-executor-token-x5kkk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

$ kubectl logs pod/demo-backup-t2mtp
I1332 25:61:61.000001       1 main.go:42] wrestic "level"=0 "msg"="Wrestic Version: unreleased"
I1332 25:61:61.000002       1 main.go:43] wrestic "level"=0 "msg"="Operator Build Date: now"
I1332 25:61:61.000003       1 main.go:44] wrestic "level"=0 "msg"="Go Version: go1.14.3"
I1332 25:61:61.000004       1 main.go:45] wrestic "level"=0 "msg"="Go OS/Arch: linux/amd64"
I1332 25:61:61.000005       1 main.go:191] wrestic "level"=0 "msg"="setting up a signal handler"
I1332 25:61:61.000006       1 snapshots.go:37] wrestic/snapshots "level"=0 "msg"="getting list of snapshots"
I1332 25:61:61.000007       1 wait.go:31] wrestic/WaitForLocks "level"=0 "msg"="remove old locks"
I1332 25:61:61.000008       1 unlock.go:7] wrestic/unlock "level"=0 "msg"="unlocking repository"  "all"=false
I1332 25:61:61.000009       1 utils.go:51] wrestic/unlock/restic "level"=0 "msg"="successfully removed locks"
I1332 25:61:61.000010       1 wait.go:37] wrestic/WaitForLocks "level"=0 "msg"="checking for any exclusive locks"
I1332 25:61:61.000011       1 wait.go:43] wrestic/WaitForLocks "level"=0 "msg"="getting a list of active locks"
I1332 25:61:61.000012       1 wait.go:62] wrestic/WaitForLocks "level"=0 "msg"="no more exclusive locks found"
I1332 25:61:61.000013       1 pod_list.go:50] wrestic/k8sClient "level"=0 "msg"="listing all pods"  "annotation"="k8up.io/backupcommand" "namespace"="default"
I1332 25:61:61.000014       1 main.go:174] wrestic "level"=0 "msg"="all pod commands have finished successfully"
I1332 25:61:61.000015       1 backup.go:64] wrestic/backup "level"=0 "msg"="starting backup"
I1332 25:61:61.000016       1 backup.go:67] wrestic/backup "level"=0 "msg"="backupdir does not exist, skipping"  "dirname"="/data"
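
The log lines above use a klog-style header, where the first character is the severity (I for info, W for warning, E for error, F for fatal). Assuming failed backups log at W, E or F severity, a simple filter can narrow a long log down to the problem lines. The function name is hypothetical.

```shell
# Keep only warning (W), error (E) and fatal (F) lines from klog-style output.
# A klog header starts with the severity letter followed by a four-digit date.
log_problems() {
  # grep exits non-zero when nothing matches; treat "no problems" as success.
  grep -E '^[WEF][0-9]{4} ' || true
}
```

Use it as kubectl logs pod/<POD NAME> | log_problems. For the successful run shown above, every line has severity I, so the filter prints nothing.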

Metrics

The operator exposes a :8080/metrics endpoint for Prometheus scraping. The metrics it provides can be used to find failed jobs. See the Prometheus examples in our GitHub repository.
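
For a quick look at the endpoint without a Prometheus server, the raw output can be fetched and filtered by hand. This is a sketch under assumptions: the deployment name k8up and the metric prefix k8up_ in the usage comment are placeholders to adjust for your installation, and metrics_with_prefix is a hypothetical helper.

```shell
# Print the samples from Prometheus text-format input whose metric name starts
# with the given prefix. Comment lines (# HELP, # TYPE) are skipped.
# Usage: ... | metrics_with_prefix <prefix>
metrics_with_prefix() {
  awk -v p="$1" 'substr($0, 1, 1) != "#" && index($0, p) == 1'
}

# Example usage (deployment name and prefix are assumptions):
#   kubectl port-forward deployment/k8up 8080:8080 &
#   curl -s http://localhost:8080/metrics | metrics_with_prefix k8up_
```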