A Brief Look at Interlocked Operations

Latest recommended article published 2019-08-12 20:12:32


License: CC 4.0 BY-SA

Tags: #c++ #multithreading


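This article introduces the concept of atomic operations, and in particular why they matter for preventing memory reordering and data races in multithreaded code. By walking through the Interlocked family of functions, such as InterlockedIncrement, it discusses Acquire and Release semantics and their role in memory barriers, and highlights the value of Interlocked operations in multi-core, multithreaded programming.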


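How well do you really know the Interlocked family of atomic functions?

What is an atomic operation

An atom originally means "the smallest particle that cannot be divided any further", and an atomic operation is "an operation, or a series of operations, that cannot be interrupted".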

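Problems caused by non-atomic operations

1. Take nFood from the earlier producer-consumer example. If several threads increment and decrement it without any mutual exclusion, it can end up at nFood = -1.

The check in our code looks like this: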

if (nFood > 0)
    std::cout << "xxxxx, food remaining: " << --nFood << std::endl;

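In a single thread this code is safe and nFood never reaches -1, but with multiple threads on a multi-core CPU it breaks. Let's look at why.

Disassembly of --nFood: [screenshot not available]

--nFood is actually three steps: a Load (the CPU reads the value from memory into a register), the decrement, and a Store (the CPU writes the modified value back to main memory). With multiple threads, another thread can run between the Load and the Store, so the value in eax that finally gets stored is not one less than the current value of nFood. The nFood > 0 check is yet another separate read, so two threads can both pass the check when only one unit is left.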
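To make the fix concrete, here is a minimal sketch of a "check and decrement" that cannot be torn apart. It uses the portable std::atomic facilities instead of the Win32 Interlocked functions (a deliberate substitution so that it compiles anywhere); the helper names try_take_food and consume_all are made up for this illustration, and nFood is only a stand-in for the counter discussed above.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the article's nFood counter.
std::atomic<long> nFood{0};

// Atomically take one unit of food if any is left.
// Returns true on success; never drives nFood below zero.
bool try_take_food()
{
    long current = nFood.load();
    while (current > 0)
    {
        // Succeeds only if nFood still equals `current`. On failure,
        // `current` is refreshed with the latest value and we re-check.
        if (nFood.compare_exchange_weak(current, current - 1))
            return true;
    }
    return false;
}

// Start `threads` consumers that eat until the food runs out.
// Returns the total number of units taken.
long consume_all(long initial_food, int threads)
{
    nFood.store(initial_food);
    std::atomic<long> taken{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&taken] {
            while (try_take_food())
                taken.fetch_add(1);
        });
    for (auto& t : pool)
        t.join();
    return taken.load();
}
```

Because the comparison and the write happen as one indivisible step, the number of units taken always equals the initial amount and the counter stops at exactly 0, no matter how the threads interleave.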

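2. Beyond that, on a multi-core CPU running multiple threads, memory operations may be reordered. The reason is simple: reordering improves CPU and bus throughput. For example, take instructions written in this order:

[figure: original instruction order, not available]

During execution the order may be rearranged to:

[figure: reordered instructions, not available]

The following program demonstrates this: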

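#include <windows.h>
#include <process.h>
#include <tchar.h>   // _tmain, _TCHAR
#include <stdio.h>   // printf
#include <stdlib.h>  // rand, system
#include <iostream>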
long X, Y;
long r1, r2;
HANDLE beg_Semaphore1;
HANDLE beg_Semaphore2;
HANDLE end_Semaphore1;
HANDLE end_Semaphore2;

unsigned int __stdcall Thread1(void* lpParameter)
{
	for (;;)                                 
	{
		WaitForSingleObject(beg_Semaphore1, INFINITE);
		while ((rand() % 1000) % 8 != 0) {}
		X = 1;
		r1 = Y;
		ReleaseSemaphore(end_Semaphore1, 1, NULL);
	}
	return 0;
}

unsigned int __stdcall Thread2(void* lpParameter)
{
	for (;;)                                 
	{
		WaitForSingleObject(beg_Semaphore2, INFINITE);
		while ((rand() % 1000) % 8 != 0) {}
		Y = 1;
		r2 = X;
		ReleaseSemaphore(end_Semaphore2, 1, NULL);
	}
	return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
	beg_Semaphore1 = CreateSemaphore(NULL, 0, 1, NULL);
	beg_Semaphore2 = CreateSemaphore(NULL, 0, 1, NULL);
	end_Semaphore1 = CreateSemaphore(NULL, 0, 1, NULL);
	end_Semaphore2 = CreateSemaphore(NULL, 0, 1, NULL);
	_beginthreadex(NULL, 0, Thread1, (void *)NULL, NULL, NULL);
	_beginthreadex(NULL, 0, Thread2, (void *)NULL, NULL, NULL);
	int detected = 0;
	for (int iterations = 1; iterations < 10000; iterations++)
	{
		// Reset X and Y
		X = 0;
		Y = 0;
		// set r1 and r2 
		r1 = 1;
		r2 = 1;
		// Signal both threads
		ReleaseSemaphore(beg_Semaphore1, 1, NULL);
		ReleaseSemaphore(beg_Semaphore2, 1, NULL);
		// Wait for both threads
		WaitForSingleObject(end_Semaphore1, INFINITE);
		WaitForSingleObject(end_Semaphore2, INFINITE);
		// Check if there was a simultaneous reorder
		if (r1 == 0 && r2 == 0)
		{
			detected++;
			printf("%d reorders detected after %d iterations\n", detected, iterations);
		}
	}
	system("pause");
	return 0;
}

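Result: [screenshot not available]

r1 == 0 && r2 == 0 shows that memory operations were reordered during execution (memory reordering). If each thread's two statements took effect in program order, at least one thread would have to read the other's 1, so both reads returning 0 means a store became visible only after the load that follows it.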

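Reference: "Memory Reordering/Memory Model 及其对.NET的影响" (Memory Reordering/Memory Model and their impact on .NET)

Memory reordering always has limits. Those limits are called the memory model, and they strike a balance between two extremes:

A. Memory access instructions execute strictly in program order, that is, no reordering at all. The CPU gains nothing from memory ordering, but programs are easier to write because the program itself defines the order of memory accesses.
B. Memory access instructions are reordered freely. The CPU reorders accesses for maximum performance, and CPU and bus throughput are fully exploited. But you simply cannot write programs for such a CPU, because it guarantees nothing. For example, after i = 1; i++; the value of i might be 0, 1, or 2.

Real memory models fall somewhere between the two. An implementation closer to extreme A is called a strong model; one that leans further toward extreme B (relative to the former) is called a weak model.

Under a weak model we badly need operations that cannot be subdivided any further, which is where atomic operations come in.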

Atomic operation functions

LONG __cdecl InterlockedIncrement(
  __inout  LONG volatile* Addend
);

LONG __cdecl InterlockedDecrement(
  __inout  LONG volatile* Addend
);

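Atomic increment and decrement functions.

Addend: pointer to the LONG variable to be modified.

Return value: the value after the atomic operation has been applied.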
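The return convention is easy to get wrong when moving between APIs, so here is a small portable sketch using std::atomic rather than the Win32 calls. The helpers increment_and_get, decrement_and_get and count_with_threads are names invented for this example. Note that std::atomic's fetch_add returns the old value, so the helpers add or subtract 1 to match the "value after modification" behaviour described above.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Mirrors the return convention described above for InterlockedIncrement:
// the result is the value *after* the increment.
long increment_and_get(std::atomic<long>& counter)
{
    return counter.fetch_add(1) + 1;  // fetch_add itself returns the old value
}

// Same idea for the decrement: returns the value *after* the decrement.
long decrement_and_get(std::atomic<long>& counter)
{
    return counter.fetch_sub(1) - 1;
}

// Several threads bump one shared counter; no increment is ever lost.
long count_with_threads(int threads, int per_thread)
{
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&counter, per_thread] {
            for (int n = 0; n < per_thread; ++n)
                increment_and_get(counter);
        });
    for (auto& t : pool)
        t.join();
    return counter.load();
}
```

With a plain long and ++ the final count would typically come up short; with the atomic version it is always threads * per_thread.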

LONG __cdecl InterlockedExchange(
  __inout  LONG volatile* Target,
  __in     LONG Value
);

PVOID __cdecl InterlockedExchangePointer(
  __inout  PVOID volatile* Target,
  __in     PVOID Value
);

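Atomically stores Value into the object that Target points to.

Target: pointer to the object to be modified.

Value: the new value for the Target object.

Return value: the value the Target object held before the modification.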
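Returning the previous value is what makes an exchange useful for synchronization. As a rough illustration (portable std::atomic, not the Win32 API; count_winners is a hypothetical name), the sketch below lets many threads race to claim a flag and counts how many of them saw it unclaimed.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Many threads race to claim a one-shot flag.
// Returns how many of them observed the flag as unclaimed (0).
int count_winners(int threads)
{
    std::atomic<long> claimed{0};
    std::atomic<int> winners{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&claimed, &winners] {
            // exchange writes 1 and hands back the previous value in one
            // indivisible step, so only one thread can ever get 0 back.
            if (claimed.exchange(1) == 0)
                winners.fetch_add(1);
        });
    for (auto& t : pool)
        t.join();
    return winners.load();
}
```

Exactly one thread wins regardless of timing, which is the basic building block behind simple spin locks and run-once initialization.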

LONG __cdecl InterlockedCompareExchange(
  __inout  LONG volatile* Destination,
  __in     LONG Exchange,
  __in     LONG Comparand
);

PVOID __cdecl InterlockedCompareExchangePointer(
  __inout  PVOID volatile* Destination,
  __in     PVOID Exchange,
  __in     PVOID Comparand
);

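These two functions can be seen as enhanced versions of the previous ones: if *Destination == Comparand, then *Destination = Exchange; otherwise nothing is written. The comparison and the write happen as a single atomic step.
Return value: the value of the Destination object before the call. If it equals Comparand, the exchange took place.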
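Compare-and-exchange is normally used in a retry loop: read the value, compute the new one, and try to swap it in only if nobody changed it in the meantime. The sketch below shows that pattern with std::atomic (a portable substitute for the Win32 call). compare_exchange_returning_old is a made-up wrapper that imitates the "returns the previous value" convention described above, and atomic_store_max is a hypothetical use of it.

```cpp
#include <atomic>

// Imitates the convention described above: returns the value the
// destination held before the call, whether or not the swap happened.
long compare_exchange_returning_old(std::atomic<long>& destination,
                                    long exchange, long comparand)
{
    long expected = comparand;
    // On success `expected` is left untouched (it equals the old value).
    // On failure it is overwritten with the value actually found.
    destination.compare_exchange_strong(expected, exchange);
    return expected;
}

// Raise `target` to at least `candidate`, safely against concurrent updates.
void atomic_store_max(std::atomic<long>& target, long candidate)
{
    long seen = target.load();
    while (seen < candidate)
    {
        long previous = compare_exchange_returning_old(target, candidate, seen);
        if (previous == seen)
            break;        // our swap took effect
        seen = previous;  // someone else changed it; re-check the condition
    }
}
```

The caller detects success by comparing the returned value with the comparand it passed in, which is the same check you would write after InterlockedCompareExchange.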

LONG __cdecl InterlockedAnd(
  __inout  LONG volatile* Destination,
  __in     LONG Value
);

LONG __cdecl InterlockedOr(
  __inout  LONG volatile* Destination,
  __in     LONG Value
);

LONG __cdecl InterlockedXor(
  __inout  LONG volatile* Destination,
  __in     LONG Value
);

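These behave like the bitwise operators &, | and ^ applied atomically to *Destination; the return value is the value of Destination before the modification.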
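A typical use of the bitwise variants is a shared set of flag bits, where the returned old value tells you whether your call was the one that changed the bit. Here is a small sketch with the std::atomic equivalents (fetch_or, fetch_and, fetch_xor); the flag constants and helper names are invented for the example.

```cpp
#include <atomic>

// Hypothetical flag bits, for illustration only.
constexpr long FLAG_READY = 0x1;
constexpr long FLAG_DIRTY = 0x2;

// Sets `bit`; returns true if this call is the one that turned it on.
bool set_flag(std::atomic<long>& flags, long bit)
{
    long before = flags.fetch_or(bit);   // returns the value before the OR
    return (before & bit) == 0;
}

// Clears `bit`; returns true if it was set before this call.
bool clear_flag(std::atomic<long>& flags, long bit)
{
    long before = flags.fetch_and(~bit); // returns the value before the AND
    return (before & bit) != 0;
}

// Flips `bit`; returns the whole flag word as it was before the flip.
long toggle_flag(std::atomic<long>& flags, long bit)
{
    return flags.fetch_xor(bit);
}
```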

LONG __cdecl InterlockedDecrementAcquire(
  __inout  LONG volatile* Addend
);

LONG __cdecl InterlockedDecrementRelease(
  __inout  LONG volatile* Addend
);

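InterlockedXXXAcquire and InterlockedXXXRelease involve Acquire and Release semantics, which are introduced below.
How do these two functions differ from the corresponding functions without the suffix? That should become clear after reading the next part.

What are Acquire and Release Semantics

MSDN's InterlockedXXXAcquire pages do not really explain what the Acquire in the function name means, and the documentation is rather murky.

After digging through some material I think I roughly understand it.

Stack Overflow: InterlockedIncrement vs InterlockedIncrementAcquire vs InterlockedIncrementNoFence

MSDN: Acquire and Release Semantics

First, the MSDN explanation: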

An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself.

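Acquire semantics tell us that later operations always take effect after the acquire operation and are never reordered before it. Release semantics tell us that earlier operations always take effect before the release operation and are never reordered after it.

Now Preshing's explanation, from "Acquire and Release Semantics":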

Acquire semantics is a property which can only apply to operations which read from shared memory, whether they are read-modify-write operations or plain loads. The operation is then considered a read-acquire. Acquire semantics prevent memory reordering of the read-acquire with any read or write operation which follows it in program order.

Release semantics is a property which can only apply to operations which write to shared memory, whether they are read-modify-write operations or plain stores. The operation is then considered a write-release. Release semantics prevent memory reordering of the write-release with any read or write operation which precedes it in program order.

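A read with Acquire semantics acts as a one-way barrier: earlier ordinary reads and writes may move down past it, but reads and writes that come after it may not move up ahead of it.
A write with Release semantics acts as a one-way barrier in the other direction: later ordinary reads and writes may move up ahead of it, but reads and writes that come before it may not move down past it.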
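The classic use of this pair is handing data from one thread to another. The sketch below is an illustration with std::atomic and explicit memory orders, not with the Win32 Acquire/Release functions; payload, ready and pass_message are names made up for the example. The producer publishes with a write-release and the consumer picks the data up with a read-acquire.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // the flag that publishes it

// Sends `value` from a producer thread to a consumer thread and
// returns what the consumer actually read.
int pass_message(int value)
{
    payload = 0;
    ready.store(false);
    int received = -1;

    std::thread producer([value] {
        payload = value;                               // ordinary write
        ready.store(true, std::memory_order_release);  // write-release: the
                                                       // write above cannot
                                                       // sink below this line
    });
    std::thread consumer([&received] {
        while (!ready.load(std::memory_order_acquire)) // read-acquire: the
            std::this_thread::yield();                 // read below cannot
        received = payload;                            // rise above this loop
    });

    producer.join();
    consumer.join();
    return received;
}
```

Once the consumer sees ready == true, the release/acquire pairing guarantees it also sees the payload written before the flag, so the function always returns the value that was sent.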

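Put together, these two semantics form a full memory barrier, which keeps reordering under control across cores and threads.

For the Interlocked operations discussed above, the variants without an Acquire or Release suffix carry both Acquire and Release semantics at once, forming a full barrier that no read or write may cross in either direction.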

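Let's modify the code above that exhibited the reordering problem.

Replace the plain stores to X and Y with interlocked exchanges that carry Acquire semantics, so the loads into r1 and r2 cannot be reordered ahead of the stores to X and Y. On x86/x64 an interlocked exchange is a locked instruction that acts as a full barrier, which is why this works there. On weaker architectures Acquire semantics alone do not necessarily forbid a store from being reordered after a later load, so the unsuffixed InterlockedExchange (a full barrier) is the safer choice for portable code.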

unsigned int __stdcall Thread1(void* lpParameter)
{
	//InterlockedIncrement(&nCount);
	for (;;)                                 
	{
		WaitForSingleObject(beg_Semaphore1, INFINITE);
		while ((rand() % 1000) % 8 != 0) {}
		InterlockedExchangeAcquire(&X, 1);
//		X = 1;
		r1 = Y;
		ReleaseSemaphore(end_Semaphore1, 1, NULL);
	}
	return 0;
}

unsigned int __stdcall Thread2(void* lpParameter)
{
	//InterlockedIncrement(&nCount);
	for (;;)                                 
	{
		WaitForSingleObject(beg_Semaphore2, INFINITE);
		while ((rand() % 1000) % 8 != 0) {}
		InterlockedExchangeAcquire(&Y, 1);
//		Y = 1;
		r2 = X;
		ReleaseSemaphore(end_Semaphore2, 1, NULL);
	}
	return 0;
}

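Result:

No screenshot is needed here: with this change you will no longer see any of the "reorders detected" messages that were printed before.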
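For readers without a Windows toolchain, the same experiment can be sketched in portable C++. This is an analogue, not the code above: it uses std::atomic's exchange with the default sequentially consistent ordering (a full barrier, comparable to the unsuffixed InterlockedExchange) and creates fresh threads on every iteration instead of using semaphores. count_reorders is a hypothetical helper name.

```cpp
#include <atomic>
#include <thread>

// Runs the X/Y experiment `iterations` times with sequentially consistent
// operations and returns how often r1 == 0 && r2 == 0 was observed.
int count_reorders(int iterations)
{
    int detected = 0;
    for (int i = 0; i < iterations; ++i)
    {
        std::atomic<long> X{0}, Y{0};
        long r1 = 1, r2 = 1;

        std::thread t1([&X, &Y, &r1] {
            X.exchange(1);   // full barrier: the load below cannot pass it
            r1 = Y.load();
        });
        std::thread t2([&X, &Y, &r2] {
            Y.exchange(1);
            r2 = X.load();
        });
        t1.join();
        t2.join();

        // With sequentially consistent operations, at least one thread
        // must observe the other's store, so this can never be true.
        if (r1 == 0 && r2 == 0)
            ++detected;
    }
    return detected;
}
```

Starting threads inside the loop makes the two bodies overlap less often than the semaphore version does, but the guarantee does not depend on timing: the count stays at 0 for any number of iterations.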

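Interlocked operations ensure that memory accesses behave correctly on multi-core, multithreaded systems, and they are a simpler synchronization mechanism than kernel objects.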