VeleroBackupFailures
Meaning
The ratio of completely failed backups to total backup attempts for a Velero schedule has exceeded 25% over the last 15 minutes. Unlike partial failures, these backups produced no usable data at all.
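The threshold arithmetic can be sketched as a small shell helper. This is only an illustration: the alert's real expression is not shown here, so the function name `exceeds_failure_threshold` and the plain failed/total comparison are assumptions.

```shell
# Hypothetical helper: succeeds (exit 0) when failed/total is strictly above 25%.
# Arguments: <failed-count> <total-attempts>, both counted over the same window.
exceeds_failure_threshold() {
  failed=$1
  total=$2
  # No attempts in the window: there is no ratio to evaluate.
  [ "$total" -gt 0 ] || return 1
  # failed/total > 25%  is equivalent to  failed * 100 > total * 25,
  # which avoids floating point in plain shell arithmetic.
  [ $((failed * 100)) -gt $((total * 25)) ]
}
```

For example, 2 failures out of 4 attempts (50%) is above the threshold, while 1 out of 4 (exactly 25%) is not.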
Impact
No new backup data is being created for the affected schedule. The most recent successful backup is aging, and once its TTL expires there will be no restorable backup point for the protected resources.
Diagnosis
Identify the schedule and check its recent backup history:
velero --namespace velero-system backup get --selector velero.io/schedule-name=<schedule>
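To narrow the history down to fully failed backups, the Backup custom resources can be queried directly. The sketch below assumes that backups created by a schedule carry the `velero.io/schedule-name` label and that a complete failure is reported as phase `Failed`; the function name is hypothetical.

```shell
# Hypothetical helper: print the names of backups in phase "Failed" for one schedule.
# Argument: <schedule-name>
failed_backups_for_schedule() {
  # Emit "name<TAB>phase" per backup, then keep only the fully failed ones.
  kubectl -n velero-system get backups.velero.io \
    -l "velero.io/schedule-name=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F'\t' '$2 == "Failed" { print $1 }'
}
```

Calling `failed_backups_for_schedule` with the schedule name prints one backup name per line; backups in phase `PartiallyFailed` are deliberately excluded because this alert covers complete failures only.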
Describe a failed backup for detailed error information:
velero --namespace velero-system backup describe <backup-name> --details
Check the Velero server logs:
kubectl logs -n velero-system deployment/velero
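The server log is usually noisy, so it can help to keep only error-level lines from the alert window. This sketch assumes Velero is running with its default text log format, in which each line carries a `level=...` field; the function name and the 30-minute default are illustrative choices.

```shell
# Hypothetical helper: show only error/fatal lines from recent Velero server logs.
# Optional argument: a duration accepted by `kubectl logs --since` (default 30m).
velero_error_lines() {
  kubectl logs -n velero-system deployment/velero --since="${1:-30m}" |
    grep -E 'level=(error|fatal)' || true  # no matching lines is not an error
}
```

If the server is configured for JSON logging, the `grep` pattern would need to match the JSON `level` field instead.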
Verify the S3 backup storage location status:
velero --namespace velero-system backup-location get default -o yaml
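For a quick yes/no answer instead of reading the full YAML, the location's reported phase can be checked directly. The sketch assumes the BackupStorageLocation resource exposes `.status.phase` with the value `Available` when Velero can reach the bucket; the function name is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) only if the backup storage location
# reports phase "Available".
# Optional argument: location name (default "default").
bsl_is_available() {
  phase=$(kubectl -n velero-system get backupstoragelocations.velero.io "${1:-default}" \
    -o jsonpath='{.status.phase}')
  [ "$phase" = "Available" ]
}
```

A non-zero exit from `bsl_is_available` points at the storage backend or its credentials rather than at the backup contents.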
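Mitigation
Delete the failed backup, then trigger a new backup from the schedule. A backup created with --from-schedule takes its spec, including the TTL, from the schedule template, so the --ttl flag may have no effect: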
velero --namespace velero-system backup delete <failed-backup-name>
velero --namespace velero-system backup create --from-schedule <schedule> --ttl 72h
If the backup location is misconfigured, check the cluster-specific override values for the S3 endpoint, bucket name, and credentials. Ensure the kubernetes-backups bucket exists in the S3-compatible backend and that the access key and secret key are correct.
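Bucket existence and credentials can be checked from outside Velero. The sketch below assumes the AWS CLI is installed and configured with the same access key and secret key that Velero uses; the function name is hypothetical and the endpoint URL must be supplied by the caller.

```shell
# Hypothetical helper: succeeds (exit 0) if the bucket exists and is accessible
# with the currently configured credentials.
# Arguments: <s3-endpoint-url> [bucket-name] (bucket defaults to kubernetes-backups)
bucket_reachable() {
  aws s3api head-bucket \
    --bucket "${2:-kubernetes-backups}" \
    --endpoint-url "$1" >/dev/null 2>&1
}
```

A failure here does not distinguish a missing bucket from rejected credentials; rerunning the `aws s3api head-bucket` command without the output redirection shows which one applies.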