Say I am running my app in GKE, and this is a multi-tenant application.
I create multiple Pods that hosts my application.
Now I want: Customers 1-1000 to us
You can guarantee session affinity with services, but not as you are describing. So, your customers 1-1000 won't use pod-1, but they will use all the pods (as a service makes a simple load balancing), but each customer, when gets back to hit your service, will be redirected to the same pod.
Note: always within time specified in (default 10800):
service.spec.sessionAffinityConfig.clientIP.timeoutSeconds
This would be the yaml file of the service:
kind: Service
apiVersion: v1
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80
sessionAffinity: ClientIP
If you want to specify time, as well, this is what needs to be added:
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10
Note that the example above would work hitting ClusterIP type service directly (which is quite uncommon) or with Loadbalancer type service, but won't with an Ingress behind NodePort type service. This is because with an Ingress, the requests come from many, randomly chosen source IP addresses.
Not with Pods by themselves, but you should be able to with Services.
Pods are intended to be stateless and indistinguishable from one another.
But you should be able to create a Deployment per customer group, and a Service per Deployment. The Ingress nginx should be able to be told to map incoming requests by whatever attributes are relevant to specific customer group Services.