We upgraded to Visual Studio 2012 and decided to look into using std::atomic for inter-thread synchronization.
Strangely enough, Microsoft's implementation uses _InterlockedOr (which internally generates lock cmpxchg DWORD PTR [rcx], edx) for std::atomic<int>::load across all memory orders (relaxed, consume, acquire and sequentially consistent).
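For reference, here is a minimal sketch of the loads I am comparing. The names `g_value` and `load_*` are made up for this illustration and do not come from either library; compiling each function separately with assembly output enabled should show what a given implementation emits for each memory order.

```cpp
#include <atomic>

// Hypothetical shared variable, named only for this illustration.
std::atomic<int> g_value(0);

// One wrapper per memory order, so the generated code for each
// load can be inspected in isolation.
int load_relaxed() { return g_value.load(std::memory_order_relaxed); }
int load_consume() { return g_value.load(std::memory_order_consume); }
int load_acquire() { return g_value.load(std::memory_order_acquire); }
int load_seq_cst() { return g_value.load(std::memory_order_seq_cst); }
```

All four functions return the same value in a single-threaded run; they differ only in the ordering guarantees requested, and therefore possibly in the instructions generated.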
I decided to buy the just::thread library and try it instead. just::thread uses a plain read of a volatile variable, which is what we have been doing explicitly in our code up to now to implement acquire semantics.
Microsoft states explicitly that as long as the /volatile:ms compiler option is in effect (it is the default), volatile objects can be used for memory locks and releases.
See http://msdn.microsoft.com/en-us/library/vstudio/12a04hfd.aspx. I would love to know why _InterlockedOr is used, since it can potentially degrade performance by adding an unnecessary memory barrier for relaxed and acquire loads on Intel x64.
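The volatile-based pattern we have been using looks roughly like the sketch below. The names `g_ready`, `g_data`, `publish` and `try_consume` are hypothetical, and the ordering relied on here is only the one Microsoft documents for /volatile:ms; ISO C++ itself gives volatile no inter-thread ordering guarantees, so this is not portable.

```cpp
// Hypothetical flag and payload, named only for this illustration.
volatile int g_ready = 0;   // flag read and written through volatile
int g_data = 0;             // plain data guarded by the flag

// Writer: store the payload, then set the flag.
// Assumption: under MSVC /volatile:ms the volatile write acts as a release.
void publish(int value) {
    g_data = value;
    g_ready = 1;
}

// Reader: check the flag, then read the payload if it is set.
// Assumption: under MSVC /volatile:ms the volatile read acts as an acquire.
bool try_consume(int& out) {
    if (g_ready == 0) {
        return false;       // nothing published yet
    }
    out = g_data;
    return true;
}
```

On x64 the volatile read compiles to an ordinary mov, which is why the locked instruction in the Microsoft std::atomic<int>::load looks surprising to me.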
Am I missing something? Or is it a bug in our code and in the just::thread std::atomic<int>::load implementation?
Thank you in advance.