I'm using user_data to do initial configuration of the hosts used in ECS to run containers. I want to be able to tell AWS ECS to migrate containers to newly created hosts.
While Yevgeniy's answer is correct in that there's no way to get Terraform to directly migrate containers to new instances when the instances are recreated, there is a much cleaner option available to you using Terraform's resource lifecycle configuration.
Assuming you are using autoscaling groups to back your ECS hosts, you can do something like this:
data "aws_ami" "ubuntu" {
most_recent = true
filter {
name = "name"
values = ["ubuntu/images/ebs/ubuntu-trusty-14.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["paravirtual"]
}
owners = ["099720109477"] # Canonical
}
resource "aws_launch_configuration" "as_conf" {
name_prefix = "terraform-lc-example-"
image_id = "${data.aws_ami.ubuntu.id}"
instance_type = "t1.micro"
lifecycle {
create_before_destroy = true
}
}
resource "aws_autoscaling_group" "bar" {
name = "${aws_launch_configuration.as_conf.name}"
launch_configuration = "${aws_launch_configuration.as_conf.name}"
lifecycle {
create_before_destroy = true
}
}
(taken from Terraform's launch configuration docs)
Now, whenever the launch configuration changes, for example because the user data or the AMI changes, Terraform will create a new launch configuration, which in turn forces a new autoscaling group because the ASG's name depends on the launch configuration's name.
Because Terraform is using the create_before_destroy lifecycle configuration, it will create the new launch configuration and ASG before destroying the old ones. In the simple setup above, the ASG will be reported as complete as soon as a single instance is deemed healthy by AWS.
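The user_data from the question fits straight into this pattern: put it on the same launch configuration, and any change to it rolls the ASG in exactly the same way. A minimal sketch of the launch configuration above extended with user data, assuming an ECS-optimised AMI and a hypothetical cluster name of my-ecs-cluster:

resource "aws_launch_configuration" "as_conf" {
  name_prefix   = "terraform-lc-example-"
  image_id      = "${data.aws_ami.ubuntu.id}"
  instance_type = "t1.micro"

  # Hypothetical bootstrap script: registers the instance with the ECS cluster.
  # Any edit to this script produces a new launch configuration, which in turn
  # forces a new ASG because the ASG name references the launch config name.
  user_data = <<EOF
#!/bin/bash
echo "ECS_CLUSTER=my-ecs-cluster" >> /etc/ecs/ecs.config
EOF

  lifecycle {
    create_before_destroy = true
  }
}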
Unfortunately that only tells you the EC2 instance is healthy, not that it is successfully running tasks. As mentioned in the comments on this answer, ECS won't automatically rebalance tasks onto new instances in the cluster, so Terraform will destroy the old ASG's instances (which are still running ECS tasks) before ECS has rescheduled those tasks onto the new ASG's instances, causing an outage.
To work around this (and, more generally, to let instances fail and be replaced in a nicer way) you can use ASG lifecycle hooks to perform some action when an instance is marked for termination, but before it is actually terminated.
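The hook itself can live alongside the rest of the Terraform config. A rough sketch against the ASG from above; the SNS topic and IAM role that wire the hook to the draining automation are hypothetical placeholders:

resource "aws_autoscaling_lifecycle_hook" "drain_on_terminate" {
  name                   = "ecs-drain-on-terminate"   # hypothetical name
  autoscaling_group_name = "${aws_autoscaling_group.bar.name}"
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"

  # Hold the instance in Terminating:Wait for up to 15 minutes so the
  # drain automation has time to run before termination proceeds.
  heartbeat_timeout = 900
  default_result    = "CONTINUE"

  # Hypothetical SNS topic and IAM role that trigger the draining Lambda.
  notification_target_arn = "${aws_sns_topic.asg_drain.arn}"
  role_arn                = "${aws_iam_role.asg_drain_hook.arn}"
}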
There's a nice AWS blog post about doing exactly this, with example Lambda code that responds to the hook by draining the container instances marked for termination and then completing the lifecycle hook, which allows the ASG to terminate the instances. Once the container instances are draining, ECS will automatically reschedule the minimum number of healthy tasks onto the non-draining instances (in the new ASG).
If your ECS tasks are registered with a load balancer, ECS will deregister the old tasks from the load balancer once a new set of tasks is running, and the old tasks will then stick around for the duration of the load balancer's connection draining timeout.
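That drain timeout is also controllable from Terraform. A sketch assuming an ALB/NLB target group fronts the service; the names, VPC variable and 30-second value are illustrative:

resource "aws_lb_target_group" "ecs_service" {
  name     = "ecs-service-example"   # hypothetical name
  port     = 80
  protocol = "HTTP"
  vpc_id   = "${var.vpc_id}"         # hypothetical variable

  # How long the load balancer keeps draining in-flight connections to a
  # deregistered task before removing it entirely (the default is 300s).
  deregistration_delay = 30
}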
I don't think ECS has a built-in way to do this. As a result, it usually requires a fairly tedious & manual process (albeit one that could be scripted). There are a few different ways to do it, but this is typically the simplest one:
1. Make the change to your user_data.
2. Run terraform apply. Note that this only creates a new version of your Launch Configuration; it has no effect on the EC2 Instances that are already running with the old user_data.
3. Terminate one of the EC2 Instances in your ECS Cluster.
4. The ASG will automatically launch a new EC2 Instance, with the new user_data, to replace the terminated EC2 Instance.
5. Repeat steps 3 and 4 until every EC2 Instance has been replaced.
Once you have gone through this process, all of the Instances in the ASG will be running the new user_data. Note that this can be done with zero-downtime for your ECS Tasks as long as:
- You have more than one EC2 Instance in the ASG.
- You have more than one copy of each ECS Task running.
- The remaining EC2 Instances have enough spare capacity (CPU and memory) to pick up the ECS Tasks from the Instance that is being replaced.
If you can't meet those requirements, then you may have some downtime, or you may need to pursue a messier option that involves doubling the size of the ASG, waiting for the new EC2 Instances (which will have the new user_data) to deploy in the ASG, doubling the number of ECS Tasks, waiting for those new ECS Tasks to deploy (they will typically deploy onto the new EC2 Instances), and then reducing each by half again (in theory, the old ECS Tasks and old EC2 Instances will be terminated, leaving just the new ones behind).
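For what it's worth, the two values you would be doubling and then halving in that messier approach are the ASG's desired_capacity/max_size and the ECS service's desired_count. A sketch of the service side, with hypothetical names and a placeholder task definition file:

resource "aws_ecs_task_definition" "app" {
  family                = "app"
  container_definitions = "${file("task-definitions/app.json")}"   # hypothetical file
}

resource "aws_ecs_service" "app" {
  name            = "app"              # hypothetical service name
  cluster         = "my-ecs-cluster"   # hypothetical cluster name
  task_definition = "${aws_ecs_task_definition.app.arn}"

  # Double this (e.g. 2 -> 4) so the extra Tasks get scheduled, typically
  # onto the new EC2 Instances, then halve it again once they are healthy.
  desired_count = 2
}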