Thanks for your answer Anthony.
There is still one thing I don't get. To do not ask wrong question based on possible wrong understanding let me at first state what I understand and please fix me if I'm wrong.
We have 4 type of fences in C++1x, they are
memory_order_release - which is no-op on x86
Prevents compiler and hardware from reordering stores, i.e. any store that is done before this should be completed before the fence.
Store to variable with memory_order_release flag means
atomic_thread_fence(memory_order_release);
store();memory_order_acquire - which is no-op on x86
Prevents compiler and hardware from reordering loads, i.e. any load that is done after this should happen after the fence.
Load from variable with memory_order_acquire flag means
load();
atomic_thread_fence(memory_order_acquire);memory_order_acq_rel - which is again no-op on x86
Prevents compiler and hardware from reordering loads with load part of operation and stores with store part of operation.
Operation with memory_order_acq_rel flag means
atomic_thread_fence(memory_order_release);
operation();
atomic_thread_fence(memory_order_acquire);memory_order_seq_cst - which is mfence on x86
Full memory fence, prevents compiler from reordering both loads and stores.
Now, on page 18 of above mentioned article Bartosz says that Peterson Lock is an example of algorithm, which will not work without fences. However, Peterson's lock uses only acquire, release and acq-rel.
http://www.justsoftwaresolutions.co.uk/threading/petersons_lock_with_C++0x_atomics.html
If those operations are no-op on x86 then Peterson Lock has to work on x86 without any fence. Am I wrong somewhere?
For Dekker's algorithm it's clear, it uses sequentially consistent memory fence, so it requires fencing on x86 too, but what about Peterson lock?
though such a fence can also be achieved with a LOCKed RMW instruction, such as XCHG.
On Visual Studio MemoryBarrier() function is implemented as a not locked RMW instruction.
http://msdn.microsoft.com/en-us/library/ms684208(v=vs.85).aspx
LONG Barrier;
__asm {
xchg Barrier, eax
}
Is it a typeo in MSDN?
Plain loads and stores on x86 give you acquire (for loads) and release (for stores) ordering. This is sufficient for many algorithms, but not for others. LFENCE and SFENCE are primarily of use with non-temporal stores such as MOVNTI, which don't obey the normal cache coherency rules (that's what "non-temporal" means, in effect --- they don't occur at a particular time).
So it means that there is no analogs for sfence and lfence in C++1x, is it right?
Thanks a lot.