Backing Up the Arthur Platform
Once all the Pre-Requisites have been met, the various Arthur platform components can be backed up. The process to manually backup individual components is detailed below, which may also be {ref}scripted<scripted_solution>
.
Backing up Clickhouse Data
By default, the Arthur Platform ships with a Kubernetes CronJob, which backs up Clickhouse daily at midnight.
To manually back up ClickHouse data, you can run the following commands:
arthur_namespace="Put your Arthur namespace here"
$ kubectl get cronjobs -n $arthur_namespace | grep -i clickhouse
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
clickhouse-backup-cronjob 0 0 * * * False 0 14h 2d18h
$ kubectl create job clickhouse-backup --from=cronjob/clickhouse-backup-cronjob -n $arthur_namespace
job.batch/clickhouse-backup created
$ kubectl get jobs -n $arthur_namespace
NAME COMPLETIONS DURATION AGE
clickhouse-backup-cronjob-27735840 1/1 8m35s 14m
Backing Up Enrichments
The Arthur Platform uses Velero to take a backup of the Enrichments Infrastructure and the Enrichments workflows.
The Enrichments infrastructure and Enrichment Workflows are orchestrated as separate backups and will require running 2 separate commands.
Backing Up Enrichments Infrastructure
To manually back up the Enrichments infrastructure, run the following commands:
# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"
arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"
$ backup_date=$(DATE +%Y-%m-%d-%H-%M-%S);
$ name=arthur-backup-$backup_date
arthur_namespace=<insert-arthur-namespace-here>
velero_namespace=<insert-velero-namespace-here>
$ velero backup create $name-enrichments \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--selector='component in (kafka-mover-init-connector, model_server)' \
--include-resources=deployments,services \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
--storage-location=$storage_location \
--wait
Backing Up Enrichments Workflows
To manually back up the Enrichments Workflows, run the following commands:
# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"
arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"
$ backup_date=$(DATE +%Y-%m-%d-%H-%M-%S);
$ name=arthur-backup-$backup_date
velero backup create $name-workflows \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--include-resources=workflows \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
--storage-location=$storage_location \
--wait
Backing Up Messaging Infrastructure
The Arthur Platform uses Velero to take a backup of the Kafka (and ZooKeeper) Deployment State and EBS Volumes.
To manually back up Kafka, run the following commands:
# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"
arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"
$ backup_date=$(DATE +%Y-%m-%d-%H-%M-%S);
$ name=arthur-backup-$backup_date
$ velero backup create $name-messaging \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--selector='app in (cp-zookeeper,cp-kafka)' \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,services,endpoints,configmaps,poddisruptionbudgets \
--storage-location=$storage_location \
--wait
Backing Up RDS Postgres
RDS database backups are called Snapshots. To manually create a snapshot of an RDS database, execute the below script:
$ backup_date=$(DATE +%Y-%m-%d-%H-%M-%S);
$ name=arthur-backup-$backup_date
$ aws rds create-db-cluster-snapshot \
--db-cluster-snapshot-identifier $name-snapshot \
--db-cluster-identifier RDS_DB_NAME \
--profile AWS_PROFILE_NAME \
--region AWS_REGION
RDS Compatibility
The command is only compatible for a multi-region RDS database cluster. If you are using a single-region RDS database, the command to use is
aws rds create-db-snapshot
.
For more information, please refer to the AWS Documentation:
Updated 9 months ago