Author Topic: std::atomic<int>::load implementation on Intel-x64 (Read 63852 times)

oleg@completedb.com · « **on:** April 29, 2013, 06:22:54 PM »

We upgraded to Visual Studio 2012 and decided to look into using std::atomic for inter-thread synchronization.
Strangely enough, Microsoft's implementation uses _InterlockedOr (which internally generates lock cmpxchg DWORD PTR [rcx], edx) for std::atomic<int>::load across all memory orderings (relaxed, consume, acquire and sequentially consistent).
I decided to buy the just::thread library and try it instead. just::thread uses a read of a volatile variable, which is what we have been doing explicitly in our code up to now to implement acquire semantics.
Microsoft states explicitly that as long as the /volatile:ms compiler option is used (which is the default), volatile objects can be used for memory locks and releases.
http://msdn.microsoft.com/en-us/library/vstudio/12a04hfd.aspx
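
To make the comparison concrete, here is a minimal sketch of the two ways of reading an atomic described above: through a read-modify-write operation that leaves the value unchanged, and through an ordinary load. The helper names `load_via_rmw` and `load_plain` are hypothetical and do not come from either library. The exact instructions emitted depend on the compiler and its optimizer.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: reads the value by OR-ing in zero. This is a
// read-modify-write, so on x86-64 it is typically compiled to a locked
// instruction even though the stored value never changes.
int load_via_rmw(std::atomic<int>& a) {
    return a.fetch_or(0, std::memory_order_seq_cst);
}

// Hypothetical helper: an ordinary atomic load. On x86-64 this is
// typically compiled to a plain MOV.
int load_plain(const std::atomic<int>& a) {
    return a.load(std::memory_order_relaxed);
}
```

Both helpers return the same value; the difference is only in the cost of the instruction sequence behind them.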

I would love to know why _InterlockedOr is used, since it can degrade performance with an unnecessary memory barrier for relaxed (!) and acquire semantics on Intel x64.
Am I missing something? Or is it a bug in our code and in the just::thread std::atomic<int>::load implementation?

Thank you in advance.

Anthony Williams · « **Reply #1 on:** April 29, 2013, 06:43:02 PM »

I believe that the VS2012 implementation is being overly conservative. If the appropriate synchronization is used on the store, then an atomic<int>::load need only be a MOV on x86.
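
As a minimal sketch of that pairing (the names `ready`, `payload`, `producer`, `consumer` and `run_handoff` are made up for illustration): the store side publishes with release ordering, and the load side reads with acquire ordering. On x86-64, mainstream compilers typically emit a plain MOV for both the release store and the acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> ready{0};
int payload = 0;  // ordinary data published through the flag

void producer() {
    payload = 42;                               // plain write
    ready.store(1, std::memory_order_release);  // publish the write above
}

int consumer() {
    // Spin until the flag is set; the acquire load pairs with the
    // release store, so payload is guaranteed to be visible afterwards.
    while (ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one hand-off between two threads and returns what the consumer saw.
int run_handoff() {
    ready.store(0, std::memory_order_relaxed);
    payload = 0;
    int seen = -1;
    std::thread t([&] { seen = consumer(); });
    producer();
    t.join();
    assert(ready.load() == 1);
    return seen;
}
```

Calling `run_handoff()` returns 42, because the consumer cannot leave its loop until the release store has made the payload write visible.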

Microsoft is trying to phase out the use of volatile for synchronization, and is encouraging people to use std::atomic instead, hence the compiler switch. It is a shame if their library generates suboptimal code in this case.

Just::Thread does not rely on the special semantics of volatile. Where our source code uses volatile, it is just to force the compiler to issue the load; the _ReadWriteBarrier() intrinsic is used to restrict reordering by the compiler.
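
The general shape of that technique can be sketched as follows. This is not the Just::Thread source: `load_acquire_x86`, `COMPILER_BARRIER` and `demo_single_threaded` are hypothetical names, and the non-MSVC branch substitutes the usual GCC/Clang empty-asm compiler barrier. The sketch relies on x86 hardware ordering and on compiler behaviour, not on guarantees from the C++ standard, which treats concurrent access to a plain volatile int as a data race.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
// MSVC intrinsic: stops the compiler from reordering memory accesses
// across this point; it emits no CPU instruction.
#define COMPILER_BARRIER() _ReadWriteBarrier()
#else
// Assumed GCC/Clang equivalent: an empty asm statement with a memory clobber.
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#endif

// Hypothetical helper: an acquire-style load for x86 only.
inline int load_acquire_x86(const volatile int* p) {
    int v = *p;          // volatile forces the compiler to issue the load
    COMPILER_BARRIER();  // later accesses are not moved above the load by the compiler
    return v;
}

// Single-threaded usage, which only shows that the helper returns the value.
inline int demo_single_threaded() {
    int x = 7;
    int v = load_acquire_x86(&x);
    assert(v == 7);
    return v;
}
```

In portable code, `std::atomic<int>::load(std::memory_order_acquire)` expresses the same intent without depending on a particular compiler or architecture.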

oleg@completedb.com · « **Reply #2 on:** April 29, 2013, 06:53:24 PM »

Thank you.
We should probably warn the C++ world, especially those who are implementing lock-free algorithms, where suboptimal code is not an option.
