How do I achieve cross-region load balancing on Google Container Engine?
I will have one Kubernetes cluster per region in several regions and I need to route traffic fro
Google's network load balancing (L3) is specifically per-region (these are the load balancers that are automatically configured if you create a service of type LoadBalancer). As Alex mentioned in his answer, if you use network load balancing you will need to configure one load balancer per region and then use DNS to spread user requests across your load balancers.
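To make the DNS step concrete, here is a minimal sketch of the round-robin idea: hand out each regional load balancer's IP in turn. The region names and IPs are hypothetical placeholders (the addresses come from the reserved documentation ranges). In practice your DNS provider does this for you, possibly with geo- or latency-based policies rather than plain rotation.

```python
import itertools

# Hypothetical regional load balancer IPs (documentation-only address ranges).
REGIONAL_LB_IPS = {
    "us-central1": "192.0.2.10",
    "europe-west1": "198.51.100.10",
    "asia-east1": "203.0.113.10",
}


def round_robin(ips):
    """Return an endless iterator that cycles through the IPs in order."""
    return itertools.cycle(ips)


resolver = round_robin(list(REGIONAL_LB_IPS.values()))

# Four consecutive "lookups": the fourth wraps around to the first region.
first_four = [next(resolver) for _ in range(4)]
```

This only illustrates how requests get spread. It does no health checking, so a region that goes down would keep receiving its share of traffic unless the DNS records are updated.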
Google's HTTP(S) load balancing is cross-region (i.e. global). This means that you get a single IP that will balance across all of your HTTP(S) backends, which can be spread across multiple clusters in multiple regions. For cross-cluster load balancing, you must configure the HTTP(S) load balancer yourself as described in "Is it possible to use 1 Kubernetes ingress object to route traffic to k8s services in different clusters?".
In either case, you will need to create a different service for each URL path that you want to route to a unique backend. The services don't have to use different pods, although you may want separate pods if the services receive different amounts of traffic and you want to scale them independently.
If you use the HTTP(S) load balancer, you can define these services and the URL mapping as part of the load balancer configuration and let the HTTP(S) balancer do the request inspection / routing for you. If you use the network load balancer, then you will need to run an HTTP(S) server yourself that terminates the connection, inspects the request, and routes it to the appropriate service.
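If you go the network load balancer route, the "inspect the request and route it" part boils down to something like the sketch below: map URL path prefixes to services and pick the longest matching prefix. The paths and service names are hypothetical, and this is a simplified illustration, not how the HTTP(S) load balancer's URL map is implemented.

```python
from typing import Dict, Optional

# Hypothetical path-prefix -> service mapping; the names are illustrative only.
ROUTES: Dict[str, str] = {
    "/api/": "api-service",
    "/static/": "static-service",
    "/": "web-service",
}


def pick_service(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the service whose prefix is the longest match for `path`."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None  # no route configured for this path
    return routes[max(matches, key=len)]
```

For example, `pick_service("/api/users")` selects `api-service`, while `pick_service("/index.html")` falls through to `web-service`. A real proxy would then forward the request to that service's cluster IP or DNS name.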
Instead of all this, can I actually get a Kubernetes cluster to span different regions?
Not out of the box. You can configure a multi-zone cluster (within a region), but we don't offer explicit support for configuring a cluster that spans regions. While you could do this manually, we don't recommend it, as there are many parameters baked into the cluster management software that have been tuned with the assumption of low-latency communication between the master and nodes within the cluster.