Setting up for High Availability#

Introduction#

The Arthur Platform is built to run in a High Availability configuration, ensuring that the application can function in the event of a Data Center outage, hardware outages, or other similar infrastructure issues.

In order to take advantage of this, there are a few requirements in how your infrastructure is setup:

1. Installing across 3 Availability Zones

2. Specifying the correct Instance Types

3. Configuring the cluster for Auto-scaling

Note

Note that this document is written using AWS terminology, as this is one of the environments/infrastructure that Arthur uses internally for our environments. However, these setup steps should work across various cloud providers using similar features.

Note

Note that this document is written with the pre-requisite that you are installing Arthur in a High Availability configuration. At the minimum, this means that there should be 3 instances across which Arthur is deployed.

Installing across 3 Availability Zones#

In order to ensure continuous operation during an Availability Zone (AZ) outage, Arthur must be installed on a cluster that has 3 Availability Zones. This ensures that in the event of one AZ outage that the rest of the components can still operate.

To do this in AWS, create 3 separate Auto-Scaling Groups (ASGs) - one for each AZ. You can configure which AZ an ASG provisions instances into when you create the ASG.

When Arthur deploys, the stateful services (eg: databases, messaging queues, etc.) will be balanced across the 3 AZs automatically using kubernetes pod anti-affinity rules (pods will not schedule onto nodes where there already exists another pod that is of the same component).

Specifying the correct Instance Types#

Generally speaking, the best way to ensure you have deployed the correct Instance Types is to monitor resource utilization across the cluster to determine when your services are hitting resource limits.

When initially configuring a cluster for Arthur, we recommend 3 nodes, where each node has at least 16 vCPUs and 64G of RAM (eg: an m5a.4xlarge instance type). This is a good starting point for a general-purpose cluster that will grow with your production usage.

Configuring the cluster for Auto-scaling#

Arthur’s stateless components horizontally auto-scale, but in order to take the maximum advantage of this, you will need to configure and install an additional component that performs node autoscaling (eg: adding more instances).

AWS specifies how to setup cluster autoscaling in their documentation.

Generally speaking, it involves setting up an IAM role and granting permissions to autoscale the cluster, and then installing a third-party component to perform the autoscaling (eg: cluster-autoscaler