Backing Up the Arthur Platform

Once all the Pre-Requisites have been met, the various Arthur Platform components can be backed up. The process for manually backing up each component is detailed below; it can also be {ref}`scripted<scripted_solution>`.

Backing Up ClickHouse Data

By default, the Arthur Platform ships with a Kubernetes CronJob that backs up ClickHouse daily at midnight.

To manually back up ClickHouse data, you can run the following commands:

$ arthur_namespace="Put your Arthur namespace here"
$ kubectl get cronjobs -n $arthur_namespace | grep -i clickhouse
NAME                              SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
clickhouse-backup-cronjob         0 0 * * *   False     0        14h             2d18h
$ kubectl create job clickhouse-backup --from=cronjob/clickhouse-backup-cronjob -n $arthur_namespace
job.batch/clickhouse-backup created
$ kubectl get jobs -n $arthur_namespace
NAME                                       COMPLETIONS   DURATION   AGE
clickhouse-backup-cronjob-27735840         1/1           8m35s      14m
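The manual run above returns as soon as the job is created, so it can help to block until the job actually finishes. The following is a minimal sketch, assuming the CronJob is named `clickhouse-backup-cronjob` as in the listing above. The function name `run_clickhouse_backup`, the timestamped job name, and the 30-minute default timeout are illustrative assumptions, not part of the Arthur Platform.

```shell
#!/usr/bin/env bash
# Sketch: create a one-off ClickHouse backup job and wait for it to finish.
# run_clickhouse_backup and the default timeout are hypothetical.
run_clickhouse_backup() {
  local namespace="$1"
  local timeout="${2:-30m}"
  local job_name
  # A timestamped name avoids an "already exists" error on repeated runs.
  job_name="clickhouse-backup-$(date +%Y%m%d%H%M%S)"

  kubectl create job "$job_name" \
    --from=cronjob/clickhouse-backup-cronjob -n "$namespace" || return 1

  # Blocks until the job reports the Complete condition or the timeout expires.
  kubectl wait --for=condition=complete "job/$job_name" \
    -n "$namespace" --timeout="$timeout"
}
```

It could be called as `run_clickhouse_backup "$arthur_namespace"`. A non-zero exit status means the job could not be created or did not complete within the timeout.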

Backing Up Enrichments

The Arthur Platform uses Velero to back up the Enrichments infrastructure and the Enrichments workflows.
The two are backed up separately, so you need to run two commands.

Backing Up Enrichments Infrastructure

To manually back up the Enrichments infrastructure, run the following commands:

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"

arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ velero backup create $name-enrichments \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --selector='component in (kafka-mover-init-connector, model_server)' \
    --include-resources=deployments,services \
    --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
    --storage-location=$storage_location \
    --wait
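With `--wait`, the command returns once the backup has finished, but it is still worth inspecting the result. The sketch below wraps two Velero subcommands, `velero backup describe` and `velero backup logs`. The helper name `inspect_backup` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect a finished Velero backup. inspect_backup is a hypothetical helper.
inspect_backup() {
  local backup_name="$1"
  local velero_ns="$2"

  # Summary of the backup: phase, included resources, warnings and errors.
  velero backup describe "$backup_name" --namespace "$velero_ns" --details

  # Full log of the backup run, useful when the phase is not Completed.
  velero backup logs "$backup_name" --namespace "$velero_ns"
}
```

For the backup created above, this could be called as `inspect_backup "$name-enrichments" "$velero_namespace"`.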

Backing Up Enrichments Workflows

To manually back up the Enrichments Workflows, run the following commands:

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"

arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ velero backup create $name-workflows \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --include-resources=workflows \
    --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
    --storage-location=$storage_location \
    --wait
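Because the infrastructure and workflows backups are created by separate commands, deriving both names from a single timestamp makes them easy to pair up later. This is a minimal sketch of that naming step only. The function name `enrichments_backup_names` is hypothetical, and pairing by timestamp is an assumption rather than a documented requirement.

```shell
#!/usr/bin/env bash
# Sketch: derive both Enrichments backup names from one timestamp.
# enrichments_backup_names is a hypothetical helper.
enrichments_backup_names() {
  # Use the timestamp passed as $1, or generate one in the same format
  # as the commands above.
  local stamp="${1:-$(date +%Y-%m-%d-%H-%M-%S)}"
  local base="arthur-backup-$stamp"

  echo "$base-enrichments"
  echo "$base-workflows"
}
```

Each printed name can then be passed to the matching `velero backup create` command shown above.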

Backing Up Messaging Infrastructure

The Arthur Platform uses Velero to take a backup of the Kafka (and ZooKeeper) Deployment State and EBS Volumes.
To manually back up Kafka, run the following commands:

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"

arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ velero backup create $name-messaging \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --selector='app in (cp-zookeeper,cp-kafka)' \
    --exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,services,endpoints,configmaps,poddisruptionbudgets \
    --storage-location=$storage_location \
    --wait
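When these commands run from a script, the script can confirm that the backup reached the `Completed` phase before moving on. Velero records the phase in the `status.phase` field of its Backup custom resource, which the sketch below reads with `kubectl`. The helper names `backup_phase` and `require_completed` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail unless a Velero backup finished in the Completed phase.
# backup_phase and require_completed are hypothetical helpers.
backup_phase() {
  local backup_name="$1"
  local velero_ns="$2"
  # Read status.phase from the Backup custom resource.
  kubectl get backups.velero.io "$backup_name" -n "$velero_ns" \
    -o jsonpath='{.status.phase}'
}

require_completed() {
  local phase
  phase="$(backup_phase "$1" "$2")"
  if [ "$phase" != "Completed" ]; then
    echo "backup $1 finished in phase: ${phase:-unknown}" >&2
    return 1
  fi
}
```

For the messaging backup, this could be called as `require_completed "$name-messaging" "$velero_namespace"`.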

Backing Up RDS Postgres

RDS database backups are called snapshots. To manually create a snapshot of an RDS database, run the following commands:

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ aws rds create-db-cluster-snapshot \
    --db-cluster-snapshot-identifier $name-snapshot \
    --db-cluster-identifier RDS_DB_NAME \
    --profile AWS_PROFILE_NAME \
    --region AWS_REGION
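Creating a snapshot is asynchronous: the command returns while the snapshot is still being taken. The sketch below uses the AWS CLI waiter `aws rds wait db-cluster-snapshot-available` to block until the snapshot is usable. The function name `wait_for_cluster_snapshot` is hypothetical, and its arguments stand in for the same placeholder values used above.

```shell
#!/usr/bin/env bash
# Sketch: block until an RDS cluster snapshot becomes available.
# wait_for_cluster_snapshot is a hypothetical helper.
wait_for_cluster_snapshot() {
  local snapshot_id="$1"
  local profile="$2"
  local region="$3"

  # Polls the snapshot status and exits non-zero if the waiter gives up.
  aws rds wait db-cluster-snapshot-available \
    --db-cluster-snapshot-identifier "$snapshot_id" \
    --profile "$profile" \
    --region "$region"
}
```

For the snapshot created above, this could be called as `wait_for_cluster_snapshot "$name-snapshot" AWS_PROFILE_NAME AWS_REGION`.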

📘
RDS Compatibility
The command above is only compatible with a multi-region RDS database cluster. If you are using a single-region RDS database, use aws rds create-db-snapshot instead.
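As a rough illustration of the alternative named in the note, `aws rds create-db-snapshot` takes `--db-snapshot-identifier` and `--db-instance-identifier` in place of the cluster options. The function name `snapshot_db_instance` is hypothetical, and its arguments are placeholders for your own snapshot name, DB instance name, profile, and region.

```shell
#!/usr/bin/env bash
# Sketch: snapshot a single RDS DB instance rather than a DB cluster.
# snapshot_db_instance is a hypothetical helper.
snapshot_db_instance() {
  local snapshot_id="$1"
  local instance_id="$2"
  local profile="$3"
  local region="$4"

  aws rds create-db-snapshot \
    --db-snapshot-identifier "$snapshot_id" \
    --db-instance-identifier "$instance_id" \
    --profile "$profile" \
    --region "$region"
}
```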

For more information, please refer to the AWS documentation.
