How to force AWS ECS to migrate containers to another ASG?

Happy的楠姐 2021-01-07 04:45

I'm using user_data to do the initial configuration of the hosts that run containers in ECS. I want to be able to tell AWS ECS to migrate the containers to newly created hosts.

2 Answers
  • 2021-01-07 05:24

    While Yevgeniy's answer is correct in that there's no way to get Terraform to directly migrate containers to new instances when the instances are recreated, there is a much cleaner option available to you using Terraform's resource lifecycle configuration.

    Assuming you are using autoscaling groups to back your ECS hosts, you can do something like this:

    data "aws_ami" "ubuntu" {
      most_recent = true
      filter {
        name = "name"
        values = ["ubuntu/images/ebs/ubuntu-trusty-14.04-amd64-server-*"]
      }
      filter {
        name = "virtualization-type"
        values = ["paravirtual"]
      }
      owners = ["099720109477"] # Canonical
    }
    
    resource "aws_launch_configuration" "as_conf" {
      name_prefix = "terraform-lc-example-"
      image_id = "${data.aws_ami.ubuntu.id}"
      instance_type = "t1.micro"
    
      lifecycle {
        create_before_destroy = true
      }
    }
    
    resource "aws_autoscaling_group" "bar" {
      name = "${aws_launch_configuration.as_conf.name}"
      launch_configuration = "${aws_launch_configuration.as_conf.name}"
    
      lifecycle {
        create_before_destroy = true
      }
    }
    

    (taken from Terraform's launch configuration docs)

    Now, when the launch configuration changes (for example, because the user data or the AMI being used changes), Terraform is forced to create a new launch configuration, which in turn forces a new autoscaling group due to the ASG's dependency on the launch configuration's name.

    Because Terraform is using the create_before_destroy lifecycle configuration, it will create the new launch configuration and ASG before destroying the old ones. In the simple setup above, the ASG is reported as complete as soon as a single instance is deemed healthy by AWS.

    Unfortunately, that only shows that the EC2 instance is healthy, not that it's successfully running tasks. As mentioned in the comments on this answer, ECS won't automatically rebalance tasks onto new instances in the cluster, so Terraform will destroy the instances running ECS tasks in the old ASG before ECS can reschedule them onto the new ASG's instances, causing an outage.

    To work around this (and to also allow for instances to fail and be replaced generally in a nicer way) you can use ASG lifecycle hooks to perform some action when an instance is marked for termination but before it is actually terminated.
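
    In Terraform, such a hook can sit next to the ASG. A minimal sketch; the hook name, heartbeat timeout, and IAM role are assumptions, and the SNS topic is defined in the next sketch:

    # Pause each instance's termination until the hook is completed or the
    # heartbeat times out (CONTINUE then lets the termination proceed).
    resource "aws_autoscaling_lifecycle_hook" "drain" {
      name                    = "ecs-drain"                      # assumed
      autoscaling_group_name  = "${aws_autoscaling_group.bar.name}"
      lifecycle_transition    = "autoscaling:EC2_INSTANCE_TERMINATING"
      default_result          = "CONTINUE"
      heartbeat_timeout       = 900                              # seconds; assumed
      notification_target_arn = "${aws_sns_topic.drain.arn}"
      role_arn                = "${aws_iam_role.asg_notify.arn}" # role allowing the ASG to publish to SNS; assumed to exist
    }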

    There's a nice AWS blog post about doing exactly this; it includes some example Lambda code that responds to the hook by draining the container instances marked for termination and then completing the lifecycle hook, which allows the ASG to terminate the instances. Once the container instances are draining, ECS automatically reschedules the minimum number of healthy tasks onto the non-draining instances (in the new ASG).
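
    The plumbing between the hook and that Lambda function can also be written in Terraform. A sketch, assuming the blog post's drain code is packaged as a function resource named aws_lambda_function.drain_ecs:

    # Topic the lifecycle hook above publishes termination notices to.
    resource "aws_sns_topic" "drain" {
      name = "ecs-drain" # assumed
    }

    # Allow SNS to invoke the (assumed) drain function.
    resource "aws_lambda_permission" "allow_sns" {
      statement_id  = "AllowExecutionFromSNS"
      action        = "lambda:InvokeFunction"
      function_name = "${aws_lambda_function.drain_ecs.function_name}"
      principal     = "sns.amazonaws.com"
      source_arn    = "${aws_sns_topic.drain.arn}"
    }

    # Deliver each termination notice to the function.
    resource "aws_sns_topic_subscription" "drain" {
      topic_arn = "${aws_sns_topic.drain.arn}"
      protocol  = "lambda"
      endpoint  = "${aws_lambda_function.drain_ecs.arn}"
    }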

    If your ECS tasks are registered with a load balancer, ECS will deregister them from the load balancer once a new set of tasks is running, and the old tasks will then remain for the duration of the load balancer's connection drain timeout.
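
    That timeout is configurable. If the tasks sit behind an ALB, it's the target group's deregistration_delay; a minimal sketch, where the name, port, and the 30-second value are assumptions:

    resource "aws_lb_target_group" "svc" {
      name     = "svc"           # assumed
      port     = 80
      protocol = "HTTP"
      vpc_id   = "${var.vpc_id}" # assumed variable

      # Seconds the load balancer waits for in-flight requests to finish
      # before fully deregistering a task (the default is 300).
      deregistration_delay = 30
    }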

  • 2021-01-07 05:32

    I don't think ECS has a built-in way to do this. As a result, it usually requires a fairly tedious & manual process (albeit one that could be scripted). There are a few different ways to do it, but this is typically the simplest one:

    1. Make your change to user_data (see the sketch after this list).
    2. Run terraform apply.
    3. For each EC2 Instance in your ASG that has the old user_data:
      1. Terminate that EC2 Instance. You can do this via the AWS CLI or via the EC2 web console.
      2. After a little while, the ASG will automatically launch a new EC2 Instance, with your new user_data, to replace the terminated EC2 Instance.
      3. After a little while, ECS will automatically launch new copies of any ECS Tasks that happened to be running on the terminated EC2 Instance.
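
    For step 1, the user_data edit happens on the launch configuration. A minimal sketch, assuming the bootstrap script lives in a local file called ecs-init.sh and the AMI data source already exists:

    resource "aws_launch_configuration" "ecs" {
      name_prefix   = "ecs-lc-"                   # assumed
      image_id      = "${data.aws_ami.ubuntu.id}" # assumed data source
      instance_type = "t2.micro"                  # assumed

      # Changing this file's contents is the user_data change step 1 refers to.
      user_data = "${file("ecs-init.sh")}"
    }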

    Once you have gone through this process, all of the Instances in the ASG will be running the new user_data. Note that this can be done with zero downtime for your ECS Tasks as long as:

    1. There are at least 2 copies of each ECS Task, each one on a separate EC2 Instance in your ASG.
    2. You wait enough time between terminating EC2 Instances for the ECS Task(s) to relaunch.

    If you can't meet those requirements, you may have some downtime, or you may need to pursue a messier option: double the size of the ASG, wait for the new EC2 Instances (which will have the new user_data) to deploy, double the number of ECS Tasks, wait for those new Tasks to deploy (they will typically land on the new EC2 Instances), and then halve each again. In theory, the old ECS Tasks and old EC2 Instances will be terminated, leaving just the new ones behind.
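
    If you do script that messier option with Terraform, the doubling can be driven by a single variable. A rough sketch, where the sizes, names, and the cluster/task-definition resources are all assumptions:

    variable "asg_size"     { default = 2 }
    variable "task_count"   { default = 2 }
    variable "surge_factor" { default = 1 } # set to 2 while migrating

    resource "aws_autoscaling_group" "ecs" {
      name                 = "ecs-hosts"                            # assumed
      launch_configuration = "${aws_launch_configuration.ecs.name}" # assumed LC
      availability_zones   = ["us-east-1a"]                         # assumed
      min_size             = "${var.asg_size}"
      max_size             = "${var.asg_size * 2}"
      desired_capacity     = "${var.asg_size * var.surge_factor}"
    }

    resource "aws_ecs_service" "app" {
      name            = "app"                                # assumed
      cluster         = "${aws_ecs_cluster.main.id}"         # assumed to exist
      task_definition = "${aws_ecs_task_definition.app.arn}" # assumed to exist
      desired_count   = "${var.task_count * var.surge_factor}"
    }

    Apply with -var surge_factor=2, wait for the new instances and tasks to come up healthy, then apply again with the default to scale back down.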
