OK, so I am reading about synchronization, and I have read through various mechanisms such as spinlocks, semaphores, and mutexes for avoiding race conditions.
However, it seems these algorithms can't prevent a race condition on SMP when multiple processes access the data at exactly the same time.
For example, suppose thread 1 in processor A runs lock(mutex1); withdraw(1000); unlock(mutex1);
and thread 2 in processor B runs lock(mutex1); deposit(1000); deposit(1000); unlock(mutex1);
When both threads run EXACTLY AT THE SAME TIME, both threads will be in the critical section simultaneously.
The only solution I can see (it would have to be at the hardware level) would be to make each processor run slightly offset from the others, but that defeats the purpose of parallelism.
Is there any hardware-level support to avoid these situations where multiple processors try to acquire the lock at exactly the same time?
(This is not a problem of atomicity, but rather a problem of exact parallelism, and I wonder how SMP deals with it.)
The whole point of a mutex is that even if two cores try to acquire it at the same time, one of them will be blocked until the other one releases it. A mutex that permitted two cores to hold that mutex at the same time would be completely broken and useless for its sole, intended purpose.
Somewhere in hardware there is a bus arbitrator that permits only one core to control the bus that links those two cores. If either core already has the memory holding the mutex in a private cache, that core will win. Otherwise, whichever one gets the bus first will win.
The bus arbitrator may work in many ways, but typically it will rotate. So if the cores are 0, 1, 2, and 3 and core 2 had the bus last, the bus will next go to core 3 if it wants it, otherwise core 0 if it wants it, otherwise core 1 if it wants it, otherwise back to core 2. Depending on exactly which bus is involved (whether it's a fight between the two cores' L2 caches or over the memory itself or whatever), some of the cores may contend as a unit against other core groups and then sub-contend for which specific core gets it first.
It may be that one core already has control over the bus, in which case it will win outright. Typically, the arbitrator allows a core to hold the bus for a few consecutive transactions as long as it continues to use it, to avoid wasteful handoffs that don't let any core make forward progress.
The exact details can vary depending on a large number of factors including how the cores are arranged, which cores have the lock in their caches in what states, who had the bus last and whether the bus arbitrator uses timeslices, round-robin, or some other mechanism, and so on. But any implementation that didn't guarantee that one and only one core winds up getting the lock would be considered horribly broken.
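The hardware guarantee described above is what C11's atomic compare-and-swap exposes to programmers. A minimal sketch (assuming C11 `<stdatomic.h>`; the names `lock_word` and `try_acquire` are illustrative, not from any real library): even if many cores execute the exchange at the same instant, cache coherence and bus arbitration ensure exactly one contender sees it succeed.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int lock_word = 0;   /* 0 = free, 1 = held */

/* Returns true for exactly one of any set of simultaneous callers. */
bool try_acquire(void) {
    int expected = 0;
    /* Atomically: if lock_word == 0, set it to 1 and return true;
     * otherwise leave it alone and return false. */
    return atomic_compare_exchange_strong(&lock_word, &expected, 1);
}

void release(void) {
    atomic_store(&lock_word, 0);
}
```

The losers simply observe that the word was already 1; at no point can two callers both believe they hold the lock.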
You might want to look into memory barriers.
http://en.wikipedia.org/wiki/Memory_barrier
In this case the lock implementation uses memory barriers so that updates to the lock's internal value become visible to all processors in a well-defined order.
Some architectures also allow locking the memory bus against all other cores for the duration of a single instruction. For example, x86 sports a LOCK prefix that, when added to a read-modify-write instruction with a memory operand, locks access to that memory during that instruction (e.g. LOCK ADD [counter], 1 for an atomic increment of a memory location; the prefix is not valid on register-only operands).
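A sketch of how this surfaces in C (assuming GCC/Clang: the `__atomic_fetch_add` builtin compiles to a LOCK-prefixed instruction such as `lock addq` on x86; the `counter` name is illustrative):

```c
/* Atomically increments the memory location *p. On x86 this compiles
 * to a LOCK-prefixed read-modify-write on the memory operand; the
 * compiler inserts no lock, the hardware does the exclusion. */
long counter = 0;

void atomic_increment(long *p) {
    __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
}
```

Two cores calling `atomic_increment(&counter)` at the same instant cannot lose an update, because the bus/cache-line lock serializes the two read-modify-write cycles.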
Architectures without a LOCK prefix still provide atomic primitives such as compare-and-exchange or test-and-set/swap. Acquiring a lock is usually a small instruction loop, which at a high level looks like
while (test_and_set(&lock)) ;  /* spin until the old value was 0 */
The test-and-set executes atomically, so the value cannot change underneath it between the test and the set. The downside to this approach is that under high contention on the lock, the loop may eat up more clock cycles than you would like, but in practice it tends to complete fast enough that it's never really noticeable.
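That loop can be written directly with C11's `atomic_flag`, whose `test_and_set` operation maps to the hardware's test-and-set or exchange instruction (a sketch; the `spin` name is illustrative):

```c
#include <stdatomic.h>

static atomic_flag spin = ATOMIC_FLAG_INIT;

void spin_lock(void) {
    /* test_and_set returns the OLD value: keep looping while it was
     * already set, i.e. while someone else holds the lock. */
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
        ;  /* busy-wait; under high contention this burns cycles */
}

void spin_unlock(void) {
    atomic_flag_clear_explicit(&spin, memory_order_release);
}
```

The acquire/release orderings double as the memory barriers mentioned earlier, so data written inside the critical section is visible to the next core that takes the lock.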
I strongly recommend Curt Schimmel's UNIX® Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers. Different hardware architectures provide different low-level tools for synchronizing access to data, including some architectures with very nearly no help. Schimmel's book provides algorithms that can work even on those architectures.
I wish I could easily locate my copy to summarize the contents.
You can prevent this using atomic instructions like TSL (Test and Set Lock) and XCHG.
How do you ensure atomicity for an instruction?
You can disable all interrupts before executing the instruction, then re-enable them after the instruction is done. That doesn't help on multicore systems, because disabling interrupts on processor 1 has no effect on processor 2. On multicore systems the atomicity of an instruction is instead ensured by preventing the other CPUs from accessing the memory bus during the instruction (bus locking).
So, if you implement semaphores using these instructions, you'll have no problems on SMP.
Implementation of mutex_lock and mutex_unlock using TSL:
mutex_lock:
    TSL REGISTER, MUTEX   ; copy mutex to register and set mutex
    CMP REGISTER, #0      ; was mutex zero (free)?
    JZE ok                ; if it was zero, the lock is ours
    CALL thread_yield     ; else: mutex is busy, schedule another thread
    JMP mutex_lock        ; try again later
ok: RET                   ; return to caller; critical section entered
mutex_unlock:
    MOVE MUTEX, #0        ; store a 0 in mutex: free it
    RET                   ; return to caller
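A line-by-line rendering of that assembly in C (a sketch, assuming C11 atomics and POSIX `sched_yield`; `mutex1` is an illustrative name): the TSL becomes `atomic_flag_test_and_set`, and `thread_yield` becomes `sched_yield` so a busy mutex surrenders the CPU instead of spinning.

```c
#include <stdatomic.h>
#include <sched.h>

static atomic_flag mutex1 = ATOMIC_FLAG_INIT;

void mutex_lock(void) {
    while (atomic_flag_test_and_set(&mutex1))  /* TSL REGISTER, MUTEX  */
        sched_yield();                         /* CALL thread_yield    */
}                                              /* JZE ok: fall through */

void mutex_unlock(void) {
    atomic_flag_clear(&mutex1);                /* MOVE MUTEX, #0       */
}
```

Unlike the pure spinlock, this version plays well with more threads than cores, since a waiter lets the scheduler run the lock holder.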
You can find some information about TSL here: http://en.wikipedia.org/wiki/Test-and-set
A good book that can help you with this: http://www.amazon.com/Modern-Operating-Systems-Andrew-Tanenbaum/dp/0136006639
This is a classical deadlock problem. I'm not sure about the hardware support (though I'm almost sure it is supported at the hardware level); however, I can give you an example of a solution to the deadlock problem in databases. If you know all the dependencies, you know which dependency should be "killed"; the commands of that node will fail, but the system will escape the deadlock and the other nodes won't fail. I think the same approach could be taken at the hardware level.
Source: https://stackoverflow.com/questions/7863657/what-happens-if-two-process-in-different-processors-try-to-acquire-the-lock-at-e