Skip to content

OCP on AWS - Interrupt Spot instances with FIS

This guide shows how to create a AWS FIS (Fault Injection Service) experiment to interrupt spot instances in OpenShift clusters running on AWS.

The experiment creates a rule to trigger the spot interruption signal for N nodes matching the kubernetes cluster tags (used by default in OpenShift clusters).

Prerequisites

  • AWS CLI installed
  • AWS Permissions to create CloudFormation stack setting IAM capabilities
  • OpenShift cluster installed on AWS with instance SPOT lifecycle created (machineset)
  • Each group of instances (MachineSet) must be created with EC2 Tag fis:spot-interrupt to match the FIS experiment.

Steps

Steps to run it:

  • Download the CloudFormation template to create the experiment:
curl -Lso ./50_fips_spot_interruption.yaml https://raw.githubusercontent.com/mtulio/mtulio.labs/master/labs/ocp-install-iac/aws-cloudformation-templates/50_fips_spot_interruption.yaml
  • Create the CloudFormation stack targetting the cluster:
export AWS_REGION=us-east-1
# Create the experiment triggering many spot instances (100)
## Time before Interrupt in minutes
BEFORE_INTERRUPTION_TIME=2
SPOT_INSTANCE_COUNT=10
TEMPLATE_PATH=file://./50_fips_spot_interruption.yaml
# TEMPLATE_PATH=file://./labs/ocp-install-iac/aws-cloudformation-templates/50_fips_spot_interruption.yaml

aws cloudformation create-stack \
    --region ${AWS_REGION} \
    --stack-name "${INFRA_ID}-spot-interrupt" \
    --template-body ${TEMPLATE_PATH} \
    --capabilities CAPABILITY_IAM \
    --parameters \
        ParameterKey=InstancesToInterrupt,ParameterValue=${SPOT_INSTANCE_COUNT} \
        ParameterKey=DurationBeforeInterruption,ParameterValue=${BEFORE_INTERRUPTION_TIME}

aws cloudformation wait stack-create-complete \
    --region ${AWS_REGION} \
    --stack-name "${INFRA_ID}-spot-interrupt"

aws cloudformation describe-stacks \
    --region ${AWS_REGION} \
    --stack-name "${INFRA_ID}-spot-interrupt" \
    --query 'Stacks[]'
  • Run the experiment
export EXPERIMENT_TEMPLATE_ID=$(aws cloudformation describe-stacks --region ${AWS_REGION} \
    --stack-name "${INFRA_ID}-spot-interrupt"  \
    --query 'Stacks[].Outputs[?OutputKey==`FISExperimentTemplateID`][].OutputValue' \
    --output text)

aws fis start-experiment --experiment-template-id $EXPERIMENT_TEMPLATE_ID
  • Check the experiments:

    • AWS Console: https://us-east-1.console.aws.amazon.com/fis/home?region=us-east-1#Experiments
    • CLI:
EXPERIMENT_ID=$(aws fis list-experiments \
    | jq -r ".experiments[] \
        | select (.state.status==\"completed\") \
        | select(.experimentTemplateId==\"$EXPERIMENT_TEMPLATE_ID\").id")

# Show the experiment (tip: AWS Console shows nodes affected, the CLI is not returning it.)
aws fis get-experiment --id $EXPERIMENT_ID

References:

  • https://docs.aws.amazon.com/fis/latest/userguide/fis-actions-reference.html#send-spot-instance-interruptions
  • https://ec2spotworkshops.com/karpenter/060_scaling/fis_experiment.html
  • https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-fis-experimenttemplate.html#cfn-fis-experimenttemplate-actions