Deployment with AWS

This is the recommended option for scalable, production-grade deployments in AWS environments.

Prerequisites

  • Engine Version:
    Choose the engine version to deploy from the Releases page.

  • OpenAI GPT Model:

    • An OpenAI GPT model with at least one endpoint (both Azure OpenAI and OpenAI are supported).
    • A secure network route between AWS and the OpenAI endpoint(s).
    • Token limits configured as needed.
  • DNS:

    • DNS URL for the GenAI Engine with an SSL certificate.
  • AWS Environment:

    • AWS credentials with permissions to manage IAM, security groups, Secrets Manager, load balancers, RDS, ECS, and CloudWatch.
    • VPC with 3 private subnets and 2–3 public subnets.
    • ARN of the TLS certificate from AWS Certificate Manager for the application DNS.
  • Container Image Repository:

    • Network route to Docker Hub OR access to your private registry.
  • Arthur Platform Engine Credentials:

    • If using Arthur Platform, obtain your Client ID and Client Secret.
    • Otherwise, proceed with Engine-only (guardrails) deployment.
  • GPU Recommendation:
    Arthur recommends using GPUs for production deployments for optimal latency and scalability.
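
The subnet prerequisite above can be checked before launching the stack. The following is a minimal sketch, not part of any Arthur tooling: the function name check_subnet_layout and its inputs are hypothetical, it assumes you have already collected the subnet IDs yourself (for example from the VPC console), and it treats "3 private subnets" as an exact count.

```python
def check_subnet_layout(private_subnets, public_subnets):
    """Return a list of problems with the subnet layout (empty if none found).

    Encodes the prerequisite stated above: 3 private subnets and
    2-3 public subnets. The function name and inputs are illustrative only.
    """
    problems = []
    private = set(private_subnets)  # de-duplicate repeated IDs
    public = set(public_subnets)

    if len(private) != 3:
        problems.append(f"expected 3 private subnets, got {len(private)}")
    if not 2 <= len(public) <= 3:
        problems.append(f"expected 2-3 public subnets, got {len(public)}")

    # A subnet cannot be both private and public.
    overlap = private & public
    if overlap:
        problems.append(
            f"subnets listed as both private and public: {sorted(overlap)}"
        )
    return problems
```

An empty list means the counts match the prerequisite. The check does not verify routing or internet exposure, which still need to be confirmed in the VPC itself.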

Installation Steps

  1. Log in to your AWS account (with your target VPC/subnets).

  2. Navigate to CloudFormation in the AWS Console.

  3. On the "Stacks" page, select Create stack > With new resources (standard).

  4. On the "Create stack" page:

    • Choose "Template is ready" and "Amazon S3 URL".

    • Paste the S3 URL for the CloudFormation template for your desired deployment (GPU/CPU, GenAI Engine, or GenAI Engine-only guardrails). Example:

      https://arthur-cft.s3.us-east-2.amazonaws.com/arthur-engine/templates/<version_number>/root-arthur-engine-gpu.yml
    • Replace <version_number> with your desired version from Releases.

  5. Populate the stack details and click "Next".

  6. Configure stack options as needed, then click "Next".

  7. Review and create the stack.

    • Under the stack failure options, set "Behavior on provisioning failure" to "Roll back all stack resources".
    • Set "Delete newly created resources during a rollback" to "Use deletion policy".
  8. Once the GenaiEngineLBStack stack is complete, create a DNS A record that points the application DNS URL at the ALB (skip this step if Route 53 already handles the record).
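
For scripted deployments, the console steps above can be approximated in code. This is a minimal sketch under stated assumptions, not the documented installation path: the helper names build_template_url and build_create_stack_request are hypothetical, the default file name is the GPU template from the example URL in step 4, and the CAPABILITY_NAMED_IAM default is an assumption based on the stack managing IAM resources. Parameter keys are not listed in this guide, so they are passed through unchanged from whatever the template defines.

```python
TEMPLATE_BASE = (
    "https://arthur-cft.s3.us-east-2.amazonaws.com/arthur-engine/templates"
)


def build_template_url(version, template_file="root-arthur-engine-gpu.yml"):
    """Fill <version_number> in the template URL shown in step 4."""
    version = version.strip()
    if not version or "<" in version or ">" in version:
        raise ValueError("replace <version_number> with a real release version")
    return f"{TEMPLATE_BASE}/{version}/{template_file}"


def build_create_stack_request(
    stack_name,
    version,
    parameters,
    capabilities=("CAPABILITY_NAMED_IAM",),  # assumption: template creates IAM resources
):
    """Assemble keyword arguments for CloudFormation's CreateStack call.

    `parameters` is a plain dict of template parameter keys to values;
    the keys must come from the template itself.
    """
    return {
        "StackName": stack_name,
        "TemplateURL": build_template_url(version),
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in parameters.items()
        ],
        # Mirrors "Roll back all stack resources" from step 7.
        "OnFailure": "ROLLBACK",
        "Capabilities": list(capabilities),
    }
```

With boto3 installed and credentials configured, the resulting dict could be passed as boto3.client("cloudformation").create_stack(**request). Keeping the request construction separate from the API call makes it easy to review the exact template URL and parameters before anything is created.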

Architecture Diagram

[Figure: GenAI Engine Architecture]

Frequently Asked Questions (FAQs)

  • Subnet requirements:
    • Private subnets: Host application and database, not accessible from the internet.
    • Public subnets: Entry point for client LLM applications (via the ALB), typically reached through an internet gateway (IGW) or VPN.
    • Ensure proper routes exist between public and private subnets.
  • IAM & Security Groups:
    • Refer to IAM/security group CloudFormation templates for customization.
  • Azure OpenAI Quota:
    • Use multiple endpoints or request a quota increase from Azure.

Continue with Setup: Click Here