pthreads v. SSE weak memory ordering

☆樱花仙子☆ 提交于 2019-12-04 18:50:59

问题


Do the Linux glibc pthread functions on x86_64 act as fences for weakly-ordered memory accesses? (pthread_mutex_lock/unlock are the exact functions I'm interested in).

SSE2 provides some instructions with weak memory ordering (non-temporal stores such as movntps in particular). If you are using these instructions and want to guarantee that another thread/core sees an ordering, then I understand you need an explicit fence for this, e.g., a sfence instruction.

Normally you do expect the pthread API to act as a fence appropriately. However, I suspect normal C code on x86 will not generate weakly-ordered memory accesses, so I'm not confident that pthreads needs to act as a fence for weakly-ordered accesses.

Reading through the glibc pthread source code, a mutex is in the end implemented using "lock cmpxchgl", at least on the uncontended path. So I'm guessing that what I need to know is does that instruction act as a fence for SSE2 weakly-ordered accesses?


回答1:


Non-temporal stores need sfence instruction to be ordered properly.

However, the efficient user-level implementation of a simple mutex supposes that it is released by a simple write which does not imply write-buffers flush, in contrast to atomic read-modify-write operations like lock cmpxchg which imply full memory fence.

So you have a situation when the unlock has no effect of store-with-release semantic applied for non-temporal stores. Thus, these SSE stores can be reordered after the unlock and after another thread acquires the mutex.



来源:https://stackoverflow.com/questions/24233959/pthreads-v-sse-weak-memory-ordering

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!