I want to benchmark my Cassandra Clusters with 1, 2, 3 and 4 instances. So I ran the cassandra-stress tool on one of the nodes. The benchmark shows strange
If you are only running cassandra-stress on one node, then I think this is the expected result. A single client machine cannot generate enough load to saturate a four-node cluster, so the client itself becomes the bottleneck.
Also, if you are running cassandra-stress on one of the Cassandra nodes, that node is doubly loaded because it runs both Cassandra and the stress client. This puts extra strain on that machine's CPU and network connection.
To get a true picture of your cluster throughput, you should run stress from multiple machines outside the cluster (but on the same LAN).
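
To make the bottleneck argument concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the per-node and per-client rates are made-up numbers, the function name is hypothetical, and it assumes the cluster scales perfectly linearly, which a real cluster will not do. The point is that the measured rate is capped by whichever side saturates first.

```python
def measured_throughput(nodes, clients, node_ops=20_000, client_ops=25_000):
    """Return the ops/s a benchmark would report in this toy model.

    node_ops and client_ops are hypothetical capacities, not measurements.
    """
    cluster_capacity = nodes * node_ops      # assumes ideal linear scaling
    client_capacity = clients * client_ops   # total load the stress machines can generate
    # The benchmark can never report more than the slower side can sustain.
    return min(cluster_capacity, client_capacity)


if __name__ == "__main__":
    for nodes in (1, 2, 3, 4):
        one = measured_throughput(nodes, clients=1)
        four = measured_throughput(nodes, clients=4)
        print(f"{nodes} node(s): 1 client -> {one} ops/s, 4 clients -> {four} ops/s")
```

With a single client, the reported rate in this model stops growing as soon as the cluster can absorb more than that one client can send, no matter how many nodes you add. With several clients, the reported rate keeps following the cluster's capacity.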