Access HTTP service running in GKE from Google Dataflow

Asked 2021-01-02 16:53

I have an HTTP service running on a Google Container Engine cluster (behind a kubernetes service).

My goal is to access that service from a Dataflow job running on t

3 Answers
  • 2021-01-02 17:31

    EDIT: this is now supported on GKE (now known as Kubernetes Engine): https://cloud.google.com/kubernetes-engine/docs/how-to/internal-load-balancing

    I have implemented this in a pretty smooth way, IMHO. I'll briefly walk through how it works:

    • Remember that when you create a container cluster (or node pool), it will consist of a set of GCE instances in an instance group that is part of the default network. NB: add specific GCE network tag(s) to the nodes so that you can later target only those instances with the firewall rule that lets the load balancer health-check them.
    • This instance group is just a regular instance group.
    • Now, remember that Kubernetes has something called NodePort, which exposes the service at that port on all nodes, i.e. on every GCE instance in your cluster. This is what we want!
    • Since we have a set of GCE instances in an instance group, we can add that instance group to an internal load balancer in the default network, without the load balancer needing to know anything about Kubernetes internals or DNS.
    • The guide you can follow (skipping many of the initial steps) is here: https://cloud.google.com/compute/docs/load-balancing/internal/
    • Remember that internal load balancing is regional, so Dataflow and everything else must be in the same region.
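    As a concrete sketch of the first two bullets, the instance group backing a node pool can be looked up with gcloud. The cluster name, zone, project, and URL below are hypothetical placeholders, not values from the question:

```shell
# Hypothetical cluster name and zone -- substitute your own.
_cluster=my-cluster
_zone=europe-west1-a
# Normally you would fetch the instance-group URL from the cluster itself:
#   _ig_url=$(gcloud container clusters describe $_cluster --zone $_zone \
#     --format "value(instanceGroupUrls)")
# An illustrative example of what that URL looks like:
_ig_url=https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west1-a/instanceGroupManagers/gke-my-cluster-default-pool-12345678-grp
# The name to pass to `gcloud compute backend-services add-backend`
# is the last path segment of that URL:
_instance_group=$(basename "$_ig_url")
echo "$_instance_group"
```

    This is only a lookup sketch; the actual load-balancer setup follows below.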

    See this spec for the service:

    kind: Service
    apiVersion: v1
    metadata:
      name: name
      labels:
        app: app
    spec:
      selector:
        name: name
        app: app
        tier: backend
      ports:
      - name: health
        protocol: TCP
        port: 8081
        nodePort: 30081
      - name: api
        protocol: TCP
        port: 8080
        nodePort: 30080
      type: NodePort
    

    This is the code for setting up the load balancer with health checks, forwarding rules and firewall that it needs to work:

    _region=<THE_REGION>
    _instance_group=<THE_NODE_POOL_INSTANCE_GROUP_NAME>
    # Can be different for your case
    _healthcheck_path=/liveness
    _healthcheck_port=30081
    _healthcheck_name=<THE_HEALTHCHECK_NAME>
    _port=30080
    _tags=<TAGS>
    _loadbalancer_name=internal-loadbalancer-$_region
    _loadbalancer_ip=10.240.0.200
    
    gcloud compute health-checks create http $_healthcheck_name \
      --port $_healthcheck_port \
      --request-path $_healthcheck_path
    
    gcloud compute backend-services create $_loadbalancer_name \
      --load-balancing-scheme internal \
      --region $_region \
      --health-checks $_healthcheck_name
    
    gcloud compute backend-services add-backend $_loadbalancer_name \
      --instance-group $_instance_group \
      --instance-group-zone $_region-a \
      --region $_region
    
    gcloud compute forwarding-rules create $_loadbalancer_name-forwarding-rule \
      --load-balancing-scheme internal \
      --ports $_port \
      --region $_region \
      --backend-service $_loadbalancer_name \
      --address $_loadbalancer_ip
    
    # Allow Google Cloud's health checkers to reach the health-check port
    gcloud compute firewall-rules create allow-$_healthcheck_name \
      --source-ranges 130.211.0.0/22,35.191.0.0/16 \
      --target-tags $_tags \
      --allow tcp:$_healthcheck_port
    
  • 2021-01-02 17:31

    Lukasz's answer is probably the most straightforward way to expose your service to dataflow. But, if you really don't want a public IP and DNS record, you can use a GCE route to deliver traffic to your cluster's private IP range (something like option 1 in this answer).

    This would let you hit your service's stable IP. I'm not sure how to get Kubernetes' internal DNS to resolve from Dataflow.
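    A hedged sketch of that route-based option. Every value here is hypothetical; the real destination range is your cluster's `clusterIpv4Cidr` (visible in `gcloud container clusters describe`), and the next hop is one of your GKE nodes:

```shell
# Hypothetical values -- substitute your cluster's CIDR and one of its nodes.
_cluster_cidr=10.244.0.0/16
_next_hop_node=gke-my-cluster-default-pool-12345678-node
_next_hop_zone=europe-west1-a
# The route itself needs real resources, so it is shown for reference:
#   gcloud compute routes create gke-pod-route \
#     --network default \
#     --destination-range $_cluster_cidr \
#     --next-hop-instance $_next_hop_node \
#     --next-hop-instance-zone $_next_hop_zone
echo "routing $_cluster_cidr via $_next_hop_node"
```

    Note that a single next-hop node is a single point of failure; the internal-load-balancer approach above avoids that.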

  • 2021-01-02 17:37

    The Dataflow job running on GCP will not be part of the Google Container Engine cluster, so it will not have access to the internal cluster DNS by default.

    Try setting up a load balancer for the service that you want to expose, one that knows how to route the "external" traffic to it. This will let you connect to the load balancer's IP address directly from a Dataflow job executing on GCP.
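    Once a load balancer exposes a stable address, a Dataflow worker can reach the service with plain HTTP. A minimal sketch of that call, assuming the hypothetical internal address and port used in the first answer (`10.240.0.200:30080`):

```python
import urllib.request


def call_service(base_url, path, timeout=5):
    """GET base_url + path over plain HTTP and return the body as text."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inside a Beam DoFn this would typically run per element, e.g.:
#   call_service("http://10.240.0.200:30080", "/api/lookup?id=42")
```

    In a real pipeline you would open the connection lazily (e.g. in `DoFn.setup`) rather than per element, but the request itself is just ordinary HTTP against the stable IP.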
