I'm trying to pipe extremely high-speed data from one application to another on 64-bit CentOS 6. I have done some benchmarking with dd to figure out what is holding things back, and this is what I found:
It seems that Linux pipes only yield up 4096 bytes at a time to the reader, regardless of how large the writer's writes were.
So trying to stuff more than 4096 bytes into an already-full pipe in a single write(2) system call just makes the writer stall until the reader has issued the multiple reads needed to pull that much data out of the pipe and done whatever processing it has in mind to do.
This tells me that on multi-core or multi-threaded CPUs (does anyone still make a single-core, single-thread CPU?), one can get more parallelism, and hence a shorter elapsed wall-clock time, by having each writer in a pipeline write only 4096 bytes at a time before going back to whatever data processing or production it can do toward the next 4096-byte block.
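One way to check what the reader is actually handed per read(2) is to trace the reading side. Here is a minimal sketch, assuming strace is installed; the log file name is arbitrary, and the sizes you see depend on your kernel (many kernels use a 64 KiB pipe buffer, so you may see 65536 rather than 4096):

# Trace the read(2) calls made by the reading dd; the number after "=" in each
# logged line is how many bytes a single read pulled out of the pipe.
dd if=/dev/zero bs=8M count=100 2>/dev/null |
    strace -e trace=read -o /tmp/pipe-reads.log dd of=/dev/null bs=8M
grep 'read(0,' /tmp/pipe-reads.log | head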
Have you tried with smaller blocks?
When I try on my own workstation, I see a steady improvement as I lower the block size. It is only on the order of 10% in my test, but it is still an improvement. You are looking for 100%.
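If you want to see the trend on your own machine, a quick sweep like the one below works; this is just a sketch, with the block sizes picked arbitrarily and the count computed so every run pushes the same 8,388,608,000 bytes through the pipe as in the tests below:

# Sweep block sizes while keeping the total amount of data constant; the
# count shrinks as the block size grows (e.g. 32k -> 256000, 8192k -> 1000).
for bs_kib in 4 32 256 2048 8192; do
    count=$(( 8192000 / bs_kib ))
    echo "=== bs=${bs_kib}k count=${count} ==="
    dd if=/dev/zero bs=${bs_kib}k count=${count} | dd of=/dev/null bs=${bs_kib}k
done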
As it turns out, with further testing, smaller block sizes do seem to do the trick. I tried
dd if=/dev/zero bs=32k count=256000 | dd of=/dev/null bs=32k
256000+0 records in
256000+0 records out
256000+0 records in
256000+0 records out
8388608000 bytes (8.4 GB) copied, 1.67965 s, 5.0 GB/s
8388608000 bytes (8.4 GB) copied, 1.68052 s, 5.0 GB/s
And with your original
dd if=/dev/zero bs=8M count=1000 | dd of=/dev/null bs=8M
1000+0 records in
1000+0 records out
1000+0 records in
1000+0 records out
8388608000 bytes (8.4 GB) copied, 6.25782 s, 1.3 GB/s
8388608000 bytes (8.4 GB) copied, 6.25203 s, 1.3 GB/s
5.0 / 1.3 ≈ 3.8, so that is a sizable factor.