Appendix

Running the Velero CLI

Velero provides a Command-Line Interface (CLI) for taking backups and performing restores. The CLI can be installed locally, or it can be invoked by kubectl exec on the Velero Backup Controller pod.

Local Installation

Refer to the Velero Documentation for installing Velero on your platform.

Velero uses your KUBECONFIG file to connect to the cluster.

$ velero --help
Velero is a tool for managing disaster recovery, specifically for Kubernetes
cluster resources. It provides a simple, configurable, and operationally robust
way to back up your application state and associated data.

If you're familiar with kubectl, Velero supports a similar model, allowing you to
execute commands such as 'velero get backup' and 'velero create schedule'. The same
operations can also be performed as 'velero backup get' and 'velero schedule create'.

Usage:
  velero [command]

Available Commands:
  backup            Work with backups
  backup-location   Work with backup storage locations
  bug               Report a Velero bug
  client            Velero client related commands
  completion        Generate completion script
  create            Create velero resources
  debug             Generate debug bundle
  delete            Delete velero resources
  describe          Describe velero resources
  get               Get velero resources
  help              Help about any command
  install           Install Velero
  plugin            Work with plugins
  restic            Work with restic
  restore           Work with restores
  schedule          Work with schedules
  snapshot-location Work with snapshot locations
  uninstall         Uninstall Velero
  version           Print the velero version and associated image

Flags:
      --add_dir_header                   If true, adds the file directory to the header
      --alsologtostderr                  log to standard error as well as files
      --colorized optionalBool           Show colored output in TTY. Overrides 'colorized' value from $HOME/.config/velero/config.json if present. Enabled by default
      --features stringArray             Comma-separated list of features to enable for this Velero process. Combines with values from $HOME/.config/velero/config.json if present
  -h, --help                             help for velero
      --kubeconfig string                Path to the kubeconfig file to use to talk to the Kubernetes apiserver. If unset, try the environment variable KUBECONFIG, as well as in-cluster configuration
      --kubecontext string               The context to use to talk to the Kubernetes apiserver. If unset defaults to whatever your current-context is (kubectl config current-context)
      --log_backtrace_at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                   If non-empty, write log files in this directory
      --log_file string                  If non-empty, use this log file
      --log_file_max_size uint           Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --logtostderr                      log to standard error instead of files (default true)
  -n, --namespace string                 The namespace in which Velero should operate (default "velero")
      --skip_headers                     If true, avoid header prefixes in the log messages
      --skip_log_headers                 If true, avoid headers when opening log files
      --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
  -v, --v Level                          number for the log level verbosity
      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging

Use "velero [command] --help" for more information about a command.

Executing on the Velero Backup Controller Pod

If it is not possible to install the Velero CLI on the local workstation, you can still run Velero commands directly on the Velero pod as follows:

$ velero_namespace="Put your Velero namespace here"
$ kubectl exec velero-699dc869d4-r24bh -n $velero_namespace -c velero -- /velero help
Velero is a tool for managing disaster recovery, specifically for Kubernetes
cluster resources. It provides a simple, configurable, and operationally robust
way to back up your application state and associated data.

<<<output-truncated-for-brevity>>>

Backups using Velero

Creating a Backup

To take a backup of Arthur, you would invoke the CLI as follows.

$ arthur_namespace="Put your Arthur namespace here"
$ velero_namespace="Put your Velero namespace here"
$ velero backup create <some-unique-name> \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --storage-location=docs-demo-backup-location-velero

Listing all Backups

You can list all backups using the Velero CLI:

$ velero_namespace="Put your Velero namespace here"
$ velero backup get -n $velero_namespace

Describing a Backup

You can get an overview of the backup using the Velero CLI:

$ velero_namespace="Put your Velero namespace here"
$ velero backup describe <insert-backup-name> -n $velero_namespace

Debugging a Backup

For debugging a backup, you can access the backup's logs using the Velero CLI:

$ velero_namespace="Put your Velero namespace here"
$ velero backup logs <insert-backup-name> -n $velero_namespace | head

Restores using Velero

Similar to Backup, Restore happens using the Velero CLI. A restore takes a Backup object and then executes the restore procedure.

Attempting a Restore

You can execute a restore with the following Velero CLI command:

$ velero_namespace="Put your Velero namespace here"
$ velero restore create \
    --from-backup <insert-backup-name> \
    --namespace $velero_namespace \
    --restore-volumes=true

Listing all Restore attempts

Just like with the Backup, Velero will create a Restore Velero Resource, which you can inspect with the Velero CLI:

$ velero_namespace="Put your Velero namespace here"
$ velero restore get -n $velero_namespace

Describing a Restore attempt

You can get an overview of the restore attempt using the Velero CLI:

$ velero_namespace="Put your Velero namespace here"
$ velero restore describe <insert-restore-name> -n $velero_namespace

Debugging a Restore attempt

For debugging a restore attempt, you can access the logs using the Velero CLI:

$ velero_namespace="Put your Velero namespace here"
$ velero restore logs <insert-restore-name> -n $velero_namespace | head

Running Backups on a Schedule

There are two ways the Arthur platform can be backed up on a schedule:

Scripting the entire backup process {ref}(see example) <scripted_solution>, and executing it on a fixed schedule from a job runner (Jenkins, Gitlab-CI etc.)

Leveraging native schedulers to back up the individual components of the platform:

Clickhouse is backed up at midnight (by default) using Kubernetes CronJobs out-of-the-box.

Use Velero Schedules to create Velero Backups:

Messaging infrastructure

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
$ storage_location="Put your storage location here"

$ arthur_namespace="Put your Arthur namespace here"
$ velero_namespace="Put your Velero namespace here"

$ velero schedule create messaging-infra-backup-nightly \
  --namespace=$velero_namespace \
  --include-namespaces=$arthur_namespace \
  --selector='app in (cp-zookeeper,cp-kafka)' \
  --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,services,endpoints,configmaps,poddisruptionbudgets
  --storage-location=$storage_location \
  --schedule "0 0 * * *" \
  --ttl 720h0m0s

Enrichments (infrastructure and workflows)

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
$ storage_location="Put your storage location here"

$ arthur_namespace="Put your Arthur namespace here"
$ velero_namespace="Put your Velero namespace here"

$ velero schedule create enrichments-workflows-backup-nightly \
  --namespace=$velero_namespace \
  --include-namespaces=$arthur_namespace \
  --include-resources=workflows \
  --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
  --storage-location=$storage_location \
  --schedule "0 0 * * *" \
  --ttl 720h0m0s

$ velero schedule create qa-enrichments-infra-backup-nightly \
  --namespace=$velero_namespace \
  --include-namespaces=$arthur_namespace \
  --selector='component in (kafka-mover-init-connector, model_server)' \
  --include-resources=deployments,services \
  --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
  --storage-location=$storage_location \
  --schedule "0 0 * * *" \
  --ttl 720h0m0s

RDS databases can be automatically backed up on a schedule, not at a specific point in time but within a 30-minute window. And during this window, the database is snapshotted at a random time. Due to this limitation from AWS, ensure there are no operations (like model CRUD, etc.) on the Arthur platform during the backup window.
shell $ aws rds modify-db-instance \ --db-instance-identifier RDS_DB_NAME \ --backup-retention-period 14 \ --preferred-backup-window 23:45-00:15 \ --profile AWS_PROFILE_NAME \ --region AWS_REGION \ --apply-immediately

Sample Backup Script (manual)

The following script can be used to run all the backup steps together:

#!/bin/bash

set -euo pipefail
IFS=$'\n\t'

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"

arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"

backup_date=$(date +%Y-%m-%d-%H-%M-%S);
name=arthur-backup-$backup_date

echo "Creating a new backup with name $name"

echo "Taking a backup of CH data"
kubectl create job $name-clickhouse-backup \
    --namespace=$arthur_namespace \
    --from=cronjob/clickhouse-backup-cronjob
ch_backup_jobname=$(kubectl get jobs -o name -n "$arthur_namespace" | grep "$name-clickhouse-backup")
kubectl wait $ch_backup_jobname \
    --namespace=$arthur_namespace \
    --for=condition=complete \
    --timeout=30m

echo "Taking a backup of the enrichments infrastructure"
velero backup create $name-enrichments \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --selector='component in (kafka-mover-init-connector, model_server)' \
    --include-resources=deployments,services \
    --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
    --storage-location=$storage_location \
    --wait

echo "Taking a backup of workflows"
velero backup create $name-workflows \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --include-resources=workflows \
    --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
    --storage-location=$storage_location \
    --wait

echo "Taking a backup of Kafka/Kafka-ZK StatefulSets, their EBS Volumes, and related components"
velero backup create $name-messaging \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --selector='app in (cp-zookeeper,cp-kafka)' \
    --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,services,endpoints,configmaps,poddisruptionbudgets \
    --storage-location=$storage_location \
    --wait

echo "Taking a backup of the RDS database"
aws rds create-db-cluster-snapshot \
    --db-cluster-snapshot-identifier $name-snapshot \
    --db-cluster-identifier RDS_DB_NAME \
    --profile AWS_PROFILE_NAME \
    --region AWS_REGION