I have an HTTP service running on a Google Container Engine cluster (behind a Kubernetes service).
My goal is to access that service from a Dataflow job running on Google Cloud Platform.
EDIT: This is now supported on GKE (now known as Kubernetes Engine): https://cloud.google.com/kubernetes-engine/docs/how-to/internal-load-balancing
I have implemented this in a pretty smooth way, IMHO. I will briefly walk through how it works:
First, create a Kubernetes service of type NodePort, which will expose the service at the given port on all nodes, i.e. all GCE instances in your cluster. This is what we want! See this spec for the service:
kind: Service
apiVersion: v1
metadata:
  name: name
  labels:
    app: app
spec:
  selector:
    name: name
    app: app
    tier: backend
  ports:
    - name: health
      protocol: TCP
      port: 8081
      nodePort: 30081
    - name: api
      protocol: TCP
      port: 8080
      nodePort: 30080
  type: NodePort
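Assuming you save the spec above as service.yaml (a placeholder file name), you can create the service and confirm that both node ports are assigned with something like:

```shell
# Create (or update) the service from the spec above
kubectl apply -f service.yaml

# Verify the node ports (expect 8081:30081 and 8080:30080 in the PORT(S) column)
kubectl get service name
```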
This is the script for setting up the internal load balancer, with the health check, forwarding rule, and firewall rule that it needs to work:
_region=<THE_REGION>
_instance_group=<THE_NODE_POOL_INSTANCE_GROUP_NAME>
# Can be different for your case
_healthcheck_path=/liveness
_healthcheck_port=30081
_healthcheck_name=<THE_HEALTHCHECK_NAME>
_port=30080
_tags=<TAGS>
_loadbalancer_name=internal-loadbalancer-$_region
_loadbalancer_ip=10.240.0.200

gcloud compute health-checks create http $_healthcheck_name \
  --port $_healthcheck_port \
  --request-path $_healthcheck_path

gcloud compute backend-services create $_loadbalancer_name \
  --load-balancing-scheme internal \
  --region $_region \
  --health-checks $_healthcheck_name

# Assumes the node pool's instance group lives in zone "a" of the region; adjust if needed
gcloud compute backend-services add-backend $_loadbalancer_name \
  --instance-group $_instance_group \
  --instance-group-zone $_region-a \
  --region $_region

gcloud compute forwarding-rules create $_loadbalancer_name-forwarding-rule \
  --load-balancing-scheme internal \
  --ports $_port \
  --region $_region \
  --backend-service $_loadbalancer_name \
  --address $_loadbalancer_ip

# Allow Google Cloud's health checkers to reach your instances
gcloud compute firewall-rules create allow-$_healthcheck_name \
  --source-ranges 130.211.0.0/22,35.191.0.0/16 \
  --target-tags $_tags \
  --allow tcp
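Once the forwarding rule is up, any VM in the same VPC network and region should be able to reach the service through the load balancer's IP on the forwarded port. A quick smoke test from such a VM might look like this (the API path is a placeholder for whatever your service serves):

```shell
# From a GCE VM in the same network and region as the load balancer:
# the forwarding rule sends $_port (30080) traffic to the healthy nodes,
# which route it to the service via the NodePort.
curl http://10.240.0.200:30080/some/api/path
```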
Lukasz's answer is probably the most straightforward way to expose your service to Dataflow. But if you really don't want a public IP and DNS record, you can use a GCE route to deliver traffic to your cluster's private IP range (something like option 1 in this answer).
This would let you hit your service's stable cluster IP. I'm not sure how to get Kubernetes' internal DNS to resolve from Dataflow, though.
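A minimal sketch of such a route, assuming a hypothetical cluster CIDR of 10.252.0.0/16 and a node named my-cluster-node-1 in us-central1-a acting as the next hop (all three values are placeholders you would replace with your cluster's actual values):

```shell
# Route traffic destined for the cluster's private IP range through one of the nodes
gcloud compute routes create gke-service-route \
  --destination-range 10.252.0.0/16 \
  --next-hop-instance my-cluster-node-1 \
  --next-hop-instance-zone us-central1-a
```

Note that routing through a single node makes that node a single point of failure for this traffic.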
The Dataflow job running on GCP will not be part of the Google Container Engine cluster, so it will not have access to the internal cluster DNS by default.
Try setting up a load balancer for the service you want to expose that knows how to route the "external" traffic to it. This will allow you to connect directly to the IP address from a Dataflow job executing on GCP.