I'm a little confused about Cassandra seed nodes and how clients are meant to connect to the cluster. I can't seem to find this bit of information in the documentation.
Seed nodes serve two purposes: new nodes use them to discover the cluster when they first start up, and gossip treats them (very slightly) preferentially, which helps cluster state propagate at a steady rate.
The Cassandra client contact points simply provide the cluster topology to the client, after which the client may connect to any node in the cluster. As such, they are similar to seed nodes, and it makes sense to use the same nodes for both seeds and client contacts. However, you can safely configure as many client contact points as you like. The only other consideration is that the first node a client contacts sets its data center affinity, so you may wish to order your contact points to prefer a given data center.
For more details about contact points, see this question: Cassandra Java driver: how many contact points is reasonable?
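To make the contact-point part concrete, here is a minimal sketch using the DataStax Java driver 4.x. Note that in the 4.x driver the local data center is pinned explicitly rather than inferred from contact-point order; the addresses and data center name below are placeholders, not values from the question:

```java
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;

public class ContactPointsExample {
    public static void main(String[] args) {
        // The contact points only need to be a few live nodes; the driver
        // discovers the rest of the cluster topology from whichever node
        // it reaches first.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042)) // placeholder
                .addContactPoint(new InetSocketAddress("10.0.0.2", 9042)) // placeholder
                // Driver 4.x sets data center affinity explicitly instead of
                // taking it from the first contact point reached.
                .withLocalDatacenter("dc1") // placeholder DC name
                .build()) {
            System.out.println("Connected to cluster: "
                    + session.getMetadata().getClusterName().orElse("unknown"));
        }
    }
}
```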
Your answer is right. The only thing I would add is that it's recommended to use the same seed list (i.e. in your cassandra.yaml) across the cluster, as a "best practices" sort of thing. It helps gossip traffic propagate at nice, regular rates, since seeds are treated (very minimally) differently by the gossip code (see https://cwiki.apache.org/confluence/display/CASSANDRA2/ArchitectureGossip).
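For reference, the seed list lives in the seed_provider section of cassandra.yaml; a sketch with placeholder addresses (the same list would go on every node in the cluster):

```yaml
# cassandra.yaml (placeholder addresses; keep the same list on every node)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Comma-separated list of seed node IPs; a small number per
      # data center is a common choice.
      - seeds: "10.0.0.1,10.0.0.2"
```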
Answering my own question:
Seeds
From the FAQ:
Seeds are used during startup to discover the cluster.
Also from the DataStax documentation on "Gossip":
The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.
From these details it seems that a seed is nothing special to clients.
Clients
From the DataStax documentation on client requests:
All nodes in Cassandra are peers. A client read or write request can go to any node in the cluster. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that particular client operation.
The job of the coordinator is to act as a proxy between the client application and the nodes (or replicas) that own the data being requested. The coordinator determines which nodes in the ring should get the request based on the cluster configured partitioner and replica placement strategy.
I gather that the contact points a client uses can just be a handful of nodes in the local data center (any nodes will do), enough that the client can still reach the cluster if one or two of them happen to be down when it connects.
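To see the "any node can coordinate" behavior in practice, here is a hedged sketch (again DataStax Java driver 4.x, placeholder addresses and data center name) that prints which node coordinated each request:

```java
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.metadata.Node;

public class CoordinatorExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042)) // placeholder
                .addContactPoint(new InetSocketAddress("10.0.0.2", 9042)) // placeholder
                .withLocalDatacenter("dc1") // placeholder DC name
                .build()) {
            // Run the same query a few times; the driver's load balancing
            // policy spreads requests across nodes, so different nodes act
            // as the coordinator.
            for (int i = 0; i < 5; i++) {
                ResultSet rs = session.execute("SELECT release_version FROM system.local");
                Node coordinator = rs.getExecutionInfo().getCoordinator();
                System.out.println("Coordinated by: " + coordinator.getEndPoint());
            }
        }
    }
}
```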