Deployment with AWS

Engine Version:
Look up the engine version you want to deploy on the Releases page.
OpenAI GPT Model:
- An OpenAI GPT model with at least one endpoint (both Azure OpenAI and OpenAI are supported).
- A secure network route between AWS and the OpenAI endpoint(s).
- Token limits configured as needed.
DNS:
- A DNS URL for the GenAI Engine, with an SSL/TLS certificate issued for it.
AWS Environment:
- AWS credentials with permissions to manage IAM, security groups, Secrets Manager, load balancers, RDS, ECS, and CloudWatch.
- VPC with 3 private subnets and 2–3 public subnets.
- ARN of the TLS certificate from AWS Certificate Manager for the application DNS.
Container Image Repository:
- Network route to Docker Hub OR access to your private registry.
Arthur Platform Engine Credentials:
- If using Arthur Platform, obtain your Client ID and Client Secret.
- Otherwise, proceed with Engine-only (guardrails) deployment.
GPU Recommendation:
For production deployments, Arthur recommends GPUs for optimal latency and scalability.

Log in to the AWS account that contains your target VPC and subnets.
Navigate to CloudFormation in the AWS Console.
On the "Stacks" page, select Create stack > With new resources (standard).
On the "Create stack" page:
- Choose "Template is ready" and "Amazon S3 URL".
- Paste the S3 URL of the CloudFormation template that matches your deployment (GPU or CPU; GenAI Engine or GenAI Engine-only guardrails). Example:
```
https://arthur-cft.s3.us-east-2.amazonaws.com/arthur-engine/templates/<version_number>/root-arthur-engine-gpu.yml
```
- Replace <version_number> with your desired version from Releases.
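As a minimal sketch of the substitution described above, the helper below fills a version into the example URL. The function name `template_url` and the sample version `1.2.3` are illustrative assumptions, not part of the product; only the GPU template file name shown in the example is taken from this guide, so file names for other deployment variants are left for you to supply.

```python
from urllib.parse import quote

# Base location copied from the example URL in this guide.
TEMPLATE_BASE = "https://arthur-cft.s3.us-east-2.amazonaws.com/arthur-engine/templates"


def template_url(version: str, template_file: str = "root-arthur-engine-gpu.yml") -> str:
    """Build the S3 template URL for a given release version (hypothetical helper)."""
    version = version.strip()
    # Guard against pasting the placeholder unchanged.
    if not version or "<" in version or ">" in version:
        raise ValueError("replace <version_number> with a real release version")
    return f"{TEMPLATE_BASE}/{quote(version, safe='')}/{template_file}"


# "1.2.3" is a made-up version used only for illustration; take the real one from Releases.
print(template_url("1.2.3"))
```

Paste the printed URL into the "Amazon S3 URL" field.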
Populate the stack details and click "Next".
Configure stack options as needed, then click "Next".
Review and create the stack.
- Set "Behavior on provisioning failure" to "Roll back all stack resources".
- Set "Delete newly created resources during a rollback" to "Use deletion policy".
Once the GenaiEngineLBStack stack is complete, create an A record (unless Route 53 already handles this for you) that routes the application DNS URL to the ALB.
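If the DNS zone lives in Route 53 and you are creating the record yourself, the A record is typically an alias that targets the ALB. The sketch below only builds the change request as a plain dictionary; the function name and every value passed to it (`genai-engine.example.com`, the ALB DNS name, and the zone ID) are hypothetical placeholders, and the assumption that you want an alias record rather than another record type is ours, not the guide's.

```python
import json


def alias_change_batch(app_dns_name: str, alb_dns_name: str, alb_hosted_zone_id: str) -> dict:
    """Build a Route 53 ChangeBatch that aliases the application DNS name to an ALB."""
    return {
        "Comment": "Route the GenAI Engine DNS name to the ALB",
        "Changes": [
            {
                # UPSERT creates the record, or updates it if it already exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": app_dns_name,
                    "Type": "A",
                    "AliasTarget": {
                        # Hosted zone ID of the load balancer itself, not of your DNS zone.
                        "HostedZoneId": alb_hosted_zone_id,
                        "DNSName": alb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


# All three arguments are placeholders; substitute the values from your own deployment.
batch = alias_change_batch(
    "genai-engine.example.com",
    "example-alb-1234567890.us-east-2.elb.amazonaws.com",
    "ZEXAMPLE12345",
)
print(json.dumps(batch, indent=2))
```

With boto3 installed, a dictionary of this shape can be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with the `HostedZoneId` of your own DNS zone. If your DNS is hosted elsewhere, create the equivalent record in that provider instead.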

Subnet requirements:
- Private subnets: Host the application and database; they are not accessible from the internet.
- Public subnets: Entry point for client LLM applications (via the ALB), typically reached through an internet gateway (IGW) or VPN.
- Ensure proper routes exist between public and private subnets.
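Before filling in the stack details, it can help to sanity-check the subnet IDs you plan to enter against the counts listed under AWS Environment (3 private, 2–3 public). The following is a small sketch with a hypothetical helper name and made-up subnet IDs; it checks counts and overlap only and does not verify routing between the subnets.

```python
def check_subnet_counts(private_subnets, public_subnets):
    """Return a list of problems; an empty list means the counts match the guide."""
    private = set(private_subnets)
    public = set(public_subnets)
    problems = []
    if len(private) != 3:
        problems.append(f"expected 3 private subnets, got {len(private)}")
    if not 2 <= len(public) <= 3:
        problems.append(f"expected 2-3 public subnets, got {len(public)}")
    # A subnet cannot serve as both a private and a public subnet.
    overlap = private & public
    if overlap:
        problems.append(f"subnets listed as both private and public: {sorted(overlap)}")
    return problems


# Subnet IDs below are placeholders for illustration only.
issues = check_subnet_counts(
    ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
    ["subnet-ddd", "subnet-eee"],
)
print(issues or "subnet counts look fine")
```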
IAM & Security Groups:
- Refer to the IAM and security group CloudFormation templates if you need to customize them.
Azure OpenAI Quota:
- If you run into Azure OpenAI quota limits, use multiple endpoints or request a quota increase from Azure.