Question
So I started a simple test:
cbs-pillowfight -h localhost -b default -i 1 -I 10000 -T
Got:
[10717.252368] Run
+---------+---------+---------+---------+
[ 20 - 29]us |## - 257
[ 30 - 39]us |# - 106
[ 40 - 49]us |###################### - 2173
[ 50 - 59]us |################ - 1539
[ 60 - 69]us |######################################## - 3809
[ 70 - 79]us |################ - 1601
[ 80 - 89]us |## - 254
[ 90 - 99]us |# - 101
[100 - 109]us | - 43
[110 - 119]us | - 17
[120 - 129]us | - 48
[130 - 139]us | - 23
[140 - 149]us | - 14
[150 - 159]us | - 5
[160 - 169]us | - 5
[170 - 179]us | - 1
[180 - 189]us | - 3
[210 - 219]us | - 1
[270 - 279]us | - 1
+----------------------------------------
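(Not part of the original post, for reference only: a small helper to turn the -T histogram lines above into per-bucket percentages, which is presumably how the percentage figures below were derived. It assumes exactly the line format shown above, with us and ms buckets, and nothing else.)
import re
import sys

# Parse lines like "[ 60 - 69]us |####### - 3809" into (low, high, unit, count).
LINE = re.compile(r"\[\s*(\d+)\s*-\s*(\d+)\s*\](us|ms)\s*\|#*\s*-\s*(\d+)")

def summarize(lines):
    buckets = []
    for line in lines:
        m = LINE.search(line)
        if m:
            lo, hi, unit, count = int(m[1]), int(m[2]), m[3], int(m[4])
            scale = 1 if unit == "us" else 1000   # normalize everything to microseconds
            buckets.append((lo * scale, hi * scale, count))
    total = sum(c for _, _, c in buckets)
    for lo, hi, c in buckets:
        print(f"{lo:>6}-{hi:<6}us  {c:>6}  {100 * c / total:5.1f}%")

if __name__ == "__main__":
    summarize(sys.stdin)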
Then a cluster was created by adding this node to another i7 node. The 'default' bucket is definitely smaller than 1 GB; it has 1 replica and 2 writers, and flush is not enabled.
Now the same command produces (both hosts used):
- 50% in 100-200 µs, 1% in 200-900 µs, and 49% from 900 µs up into the "1 to 9 ms" bucket(!). WTF.
After adding the -r (ratio) switch set to 90% SETs:
- 25% in 100-200 µs, 74% around 900 µs, and the remainder from 900 µs up into the "1 to 9 ms" range(!)
So it seems that write performance suffers badly in clustered mode; why can the drop be so large, around 10x? The network is clean, and there are no high-load services running.
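(One thing worth checking here; my own illustration, not from the original post: Couchbase clients hash each key to one of 1024 vBuckets, and with two nodes roughly half of those vBuckets live on the remote node, so roughly half of all operations pay a network round trip. A rough Python sketch of that split, using a CRC32-based hash as an approximation of the client's vBucket mapping:)
import zlib

NUM_VBUCKETS = 1024            # Couchbase default
NUM_NODES = 2
VB_PER_NODE = NUM_VBUCKETS // NUM_NODES

def vbucket_of(key: bytes) -> int:
    # Approximation of the client-side CRC32 vBucket hash; exact details may differ.
    return ((zlib.crc32(key) >> 16) & 0x7FFF) % NUM_VBUCKETS

# Assumption: vBuckets 0..511 sit on the local node, 512..1023 on the remote one.
remote = sum(1 for i in range(10000)
             if vbucket_of(f"key-{i}".encode()) // VB_PER_NODE != 0)
print(f"{remote / 100:.1f}% of 10000 keys map to the remote node")
(If roughly half of the operations land on the remote node, a "50% fast / 49% slow" split is exactly what you would expect whenever a remote round trip costs on the order of a millisecond.)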
UPD1.
I forgot to add the ideal case: -r 100.
- 25% in 100-200 µs, 74% around 900 µs.
This makes me think that:
- A) the benchmark code is blocking somewhere (a quick read of the code showed no signs of it)
- B) the server is doing some unlogged magic on SETs that I can't understand or reconfigure. Replication factor? Isn't that nonsense for such a small dataset? That's what I'm trying to ask here.
- C) a network problem. But Wireshark shows nothing (see the RTT sketch after this list).
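(To separate hypothesis C from the others, a quick raw round-trip check against the other node's data port helps. Below is a minimal sketch under my own assumptions: Python 3, the memcached binary-protocol NOOP, the default direct data port 11210 (use 11211 if you go through the moxi proxy), and a placeholder hostname. It measures only network plus server dispatch time, with none of the pillowfight workload on top.)
import socket
import struct
import time

HOST, PORT, SAMPLES = "node2.example.org", 11210, 1000   # hypothetical host; adjust

# 24-byte memcached binary request header: magic=0x80, opcode=0x0a (NOOP), rest zero.
NOOP = struct.pack(">BBHBBHIIQ", 0x80, 0x0A, 0, 0, 0, 0, 0, 0, 0)

rtts = []
with socket.create_connection((HOST, PORT)) as s:
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for _ in range(SAMPLES):
        t0 = time.perf_counter()
        s.sendall(NOOP)
        buf = b""
        while len(buf) < 24:                  # NOOP response is a bare 24-byte header
            chunk = s.recv(24 - len(buf))
            if not chunk:
                raise RuntimeError("connection closed by server")
            buf += chunk
        rtts.append((time.perf_counter() - t0) * 1e6)

rtts.sort()
print(f"min={rtts[0]:.0f}us  p50={rtts[len(rtts) // 2]:.0f}us  "
      f"p99={rtts[int(len(rtts) * 0.99)]:.0f}us  max={rtts[-1]:.0f}us")
(If this shows sub-100 µs round trips to the other node while pillowfight still reports ~1 ms SETs, the network itself is an unlikely culprit.)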
UPD2.
I stopped both nodes and moved their data to tmpfs. For the "normal" responses this gave about a 20 µs improvement, but the slow responses remain slow.
..[cut]
[ 50 - 59]us |## - 164
[ 60 - 69]us |#### - 321
[ 70 - 79]us |######## - 561
[ 80 - 89]us |########## - 701
[ 90 - 99]us |############ - 844
[100 - 109]us |########## - 717
[110 - 119]us |####### - 514
[120 - 129]us |##### - 336
[130 - 139]us |### - 230
[140 - 149]us |## - 175
[150 - 159]us |## - 135
[160 - 169]us |# - 81
..[cut]
[930 - 939]us | - 24
[940 - 949]us |## - 139
[950 - 959]us |##### - 339
[960 - 969]us |####### - 474
[970 - 979]us |####### - 534
[980 - 989]us |###### - 467
[990 - 999]us |##### - 342
[ 1 - 9]ms |######################################## - 2681
[ 10 - 19]ms | - 1
..[cut]
UPD3: screenshot.
Answer 1:
The problem is "solved" by switching to a three-node configuration on a gigabit network.
Source: https://stackoverflow.com/questions/23013429/couchbase-possible-reasons-for-10x-difference-in-cbs-pillowfight-latency-test