fastest possible way to pass data from one thread to another

Martin Ba

Since you have ints, what you (ideally) measure above is the overall latency between a call to push() and the time pop() returns true.

This doesn't make sense: the consumer thread is busy-polling the queue, that is, it loops and repeatedly checks whether pop has fetched a value.

  • This is wasteful, and
  • if you want to minimize latency, polling is certainly not the way to go

If (IFF) you want to minimize latency (for a single item), my guess would be to use a signaling synchronization mechanism; spsc_queue, as far as I can tell, does not provide for this. (You'd need a container or a custom solution that employs some kind of condition variable / Event, ...)

If (IFF), however, you want to maximize throughput (items per unit of time), then measuring the latency of a "wakeup" for a (single) item makes even less sense. In that case you want to make the best use of the parallelism you have, as mentioned in a comment:

Often the fastest way to pass data is to use a single thread for each chunk of data. That is to say, use only the parallelism present in the data.


Addressing your bullet points:

  • How good is the test app: I do not think it makes much sense.

    • Having scheduledAt in an atomic is required, as you write it from one thread and read it from another. Otherwise you have UB.
    • Obviously any measurement difference wrt. this is purely a measurement error and doesn't say anything about the inherent latency. (You could try putting an aggregate struct {int val; int64_t time; }; into the queue, thereby avoiding the atomic fence.)
  • Current industry best time : no clue. Not sure anyone cares about this. (Maybe inside some kernel stuff?)

  • Choice of spsc_queue : I don't think it is a good choice because it requires polling.

  • faster than spsc_queue? : See above. Use non-polling notification.

  • write a code which do same work significantly faster? : No. Or rather, I won't. =>
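To illustrate the aggregate-struct suggestion above: bundling the timestamp with the payload works because a single release/acquire publication covers both fields, so the timestamp itself need not be atomic. This sketch uses a hand-rolled single-slot handoff purely for illustration (spsc_queue's own push/pop provide the equivalent synchronization internally):

```cpp
#include <atomic>
#include <cstdint>

// Payload and timestamp travel together in one plain struct.
struct Item {
    int     val;
    int64_t time;  // timestamp taken at push time, e.g. nanoseconds
};

Item slot;                       // plain, non-atomic data
std::atomic<bool> ready{false};  // single publication flag

void produce(Item item) {
    slot = item;                                   // plain writes
    ready.store(true, std::memory_order_release);  // publish both fields
}

bool try_consume(Item& out) {
    if (!ready.load(std::memory_order_acquire))    // observe publication
        return false;
    out = slot;  // safe: release/acquire orders the plain writes before this read
    ready.store(false, std::memory_order_relaxed);
    return true;
}
```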

To quote "man"'s answer:

  1. you define the problem and select an appropriate synchronization mechanism

The problem with your question is that there is no problem definition.

As far as I am concerned so far, in the context of a user-land process on a regular OS, cross thread notification latency seems utterly irrelevant. What is your use case?

man

First of all, writing such a test program is completely useless. You don't do any work with the data so the results are skewed. Second, your test is using usleep() between pushes - at this rate you can use any kind of synchronization primitive. It also seems that your Consumer() never exits...

The way you implement such a thing is the following:

  1. you define the problem and select an appropriate synchronization mechanism
  2. you implement the software
  3. you profile the software to identify potential hotspots
  4. you optimize based on the results from the previous step and repeat.

You need some prior experience for the first step, or you can try implementing different approaches and see what works best.

It depends on the semantics of the application and how many threads are involved. So far you're looking at raw latency. With more threads, scaling might also start to be an interesting metric.

For the two-threaded case, atomic updates to a single location, preferably in a cache line that's not being touched by any other operations, could be faster if what you're doing with the retrieved data allows it.
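A sketch of that last idea, assuming a 64-byte cache line (a common but not universal size; the names here are illustrative):

```cpp
#include <atomic>
#include <cstdint>

// A single atomic slot padded to its own cache line, so producer/consumer
// traffic on it does not falsely share a line with unrelated data.
struct alignas(64) PaddedSlot {
    std::atomic<int64_t> value{0};
};

PaddedSlot slot;

void publish(int64_t v) {
    slot.value.store(v, std::memory_order_release);
}

int64_t read_latest() {
    return slot.value.load(std::memory_order_acquire);
}
```

The caveat in the text applies: this only "passes" the latest value, so it is suitable when the consumer may legitimately skip intermediate updates, not when every item must be observed.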
