(Chapter 35) Linux System Programming: Threads

Contents

I. Thread concepts
II. Thread control
III. Synchronization between threads

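I. Thread concepts

(1) Differences between threads and processes

Each process runs in its own address space, so sharing data between processes requires mmap or some other inter-process communication mechanism.
This section covers how to run multiple threads inside the address space of a single process.
Sometimes one process needs several flows of control running at the same time, and that is where threads come in. Take a download manager with a graphical interface: it has to interact with the user, waiting for and handling mouse and keyboard events, and it also has to download several files at once, waiting for and handling data from several network hosts. Each of these tasks needs its own "wait, then handle" loop. With threads, one thread is dedicated to user interaction and each of the other threads talks to one network host.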

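(2) How the control flow of multiple threads differs from that of a signal handler

The main function and a signal handler are multiple flows of control within the same process address space, and so are multiple threads, but threads are more flexible than signal handlers.
A signal handler's flow of control exists only from the moment the signal is delivered until the handler finishes, whereas the flows of control of multiple threads can coexist for a long time. The operating system schedules and switches between threads just as it does between processes.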

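(3) Process resources that threads share and resources private to each thread

(a) Shared resources

Because all threads of a process share one address space, the Text Segment and Data Segment are shared: a function defined in the program can be called from any thread, and a global variable can be accessed from any thread.
In addition, the threads share the following process resources and environment: the file descriptor table, the way each signal is handled (SIG_IGN, SIG_DFL, or a user-defined handler), the current working directory, and the user ID and group ID.

(b) Private resources

Each thread has its own thread ID, context (register values, program counter, and stack pointer), stack, errno variable, signal mask, and scheduling priority.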
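The difference between shared and private data can be sketched in a few lines. In this minimal example, which assumes a POSIX system, shared_total lives in the data segment and is visible to every thread, while local lives on each thread's own stack. The names shared_total, add_local and shared_vs_private_demo are made up for illustration. The threads run one after another, so no locking is needed here.

```c
#include <assert.h>
#include <pthread.h>

static int shared_total;	/* data segment: one copy for the whole process */

static void *add_local(void *arg)
{
	int local = *(int *)arg;	/* stack variable: private to this thread */

	local *= 2;
	shared_total += local;	/* safe only because the threads never overlap */
	return NULL;
}

/* Runs two threads one after the other and returns the shared total. */
int shared_vs_private_demo(void)
{
	pthread_t tid;
	int a = 10, b = 11;
	int err;

	shared_total = 0;
	err = pthread_create(&tid, NULL, add_local, &a);
	assert(err == 0);
	pthread_join(tid, NULL);	/* first thread is finished before the second starts */

	err = pthread_create(&tid, NULL, add_local, &b);
	assert(err == 0);
	pthread_join(tid, NULL);

	return shared_total;	/* 2*10 + 2*11 */
}
```

Both threads update the same global, while each doubling happens on a private copy.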

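II. Thread control

1. Functions for creating a thread

pthread_create is declared in <pthread.h> as: int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
(1) Return value: 0 on success, an error number on failure

The system functions covered so far return 0 on success and -1 on failure, with the error number stored in the global variable errno.
The pthread functions instead return the error number as their return value. Each thread does have its own errno, but it is provided for compatibility with other interfaces and the pthread library itself does not use it. Returning the error code directly is clearer.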

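(2) The function pointer start_routine

After a thread calls pthread_create() to create a new thread, the calling thread returns from pthread_create() and continues, while the code run by the new thread is determined by the function pointer start_routine passed to pthread_create.
start_routine takes one parameter, which is passed to it through the arg parameter of pthread_create. Its type is void *, and the caller decides what type the pointer really refers to. The return type of start_routine is also void *, and the meaning of that pointer is likewise defined by the caller.
When start_routine returns, the thread exits.
Other threads can call pthread_join to obtain the return value of start_routine, much as a parent process calls wait(2) to obtain a child's exit status.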
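The round trip through arg and the return value can be sketched as follows. This is a minimal illustration, not a prescribed pattern: struct range, sum_range and sum_in_thread are hypothetical names, and the result is returned in heap memory so that it outlives the thread's stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct range { int lo, hi; };	/* hypothetical argument type */

/* Thread function: treats arg as a struct range, returns a heap-allocated sum. */
static void *sum_range(void *arg)
{
	const struct range *r = arg;
	long *sum;

	assert(r != NULL);
	sum = malloc(sizeof *sum);
	if (sum == NULL)
		return NULL;
	*sum = 0;
	for (int i = r->lo; i <= r->hi; i++)
		*sum += i;
	return sum;	/* collected by pthread_join in the creator */
}

/* Returns lo + ... + hi computed in a new thread, or -1 on failure. */
long sum_in_thread(int lo, int hi)
{
	struct range r = { lo, hi };	/* stays valid: we join before returning */
	pthread_t tid;
	void *ret;
	long result;

	if (pthread_create(&tid, NULL, sum_range, &r) != 0)
		return -1;
	if (pthread_join(tid, &ret) != 0 || ret == NULL)
		return -1;
	result = *(long *)ret;
	free(ret);
	return result;
}
```

Passing the address of a local variable as arg is safe here only because the creator joins the thread before the variable goes out of scope.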

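(3) After pthread_create returns successfully, the ID of the new thread is written to the memory that the thread parameter points to

A process ID has type pid_t and is unique across the whole system. Calling getpid(2) returns the current process ID, which is a positive integer.
A thread ID has type pthread_t and is only guaranteed to be unique within the current process. Different systems implement pthread_t differently: it may be an integer, a structure, or an address, so it cannot simply be printed as an integer with printf. Calling pthread_self(3) returns the ID of the current thread.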
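Because the ID type is opaque, the portable way to compare two thread IDs is pthread_equal rather than ==. The sketch below assumes a POSIX system. The names is_creator and child_has_own_id are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Child thread: reports whether its own ID equals the creator's ID. */
static void *is_creator(void *arg)
{
	const pthread_t *creator = arg;

	/* pthread_equal returns nonzero when the two IDs name the same thread */
	return (void *)(intptr_t)pthread_equal(pthread_self(), *creator);
}

/* Returns 1 if a thread equals itself and a new thread has a different ID. */
int child_has_own_id(void)
{
	pthread_t self = pthread_self();
	pthread_t child;
	void *same;
	int err;

	err = pthread_create(&child, NULL, is_creator, &self);
	assert(err == 0);
	pthread_join(child, &same);

	return pthread_equal(self, pthread_self()) != 0 && (intptr_t)same == 0;
}
```

The comparison is made while both threads are alive, so neither ID has expired.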

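(4) The attr parameter specifies thread attributes. This chapter does not go into thread attributes; every code example passes NULL for attr, which selects the default attributes.

2. Example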

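#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

pthread_t ntid;	/* ID of the new thread, filled in by pthread_create */

/* Print the process ID and thread ID of the calling thread. */
void printids(const char *s)
{
	pid_t pid;
	pthread_t tid;

	pid = getpid();
	tid = pthread_self();
	/* pthread_t is opaque; the integer casts are for display on Linux only */
	printf("%s pid %u tid %u (0x%x)\n", s, (unsigned int)pid,
			(unsigned int)tid, (unsigned int)tid);
}

/* Start routine of the new thread; arg is the label passed to pthread_create. */
void *thr_fn(void *arg)
{
	printids(arg);
	return NULL;
}

int main(void)
{
	int err;

	err = pthread_create(&ntid, NULL, thr_fn, "new thread: ");
	if (err != 0) {
		/* pthread functions return the error number instead of setting errno */
		fprintf(stderr, "can't create thread: %s\n", strerror(err));
		exit(1);
	}
	printids("main thread:");
	sleep(1);	/* crude way to let the new thread run before main returns */

	return 0;
}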

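Explanation:
Every thread of a process gets the same process ID from getpid, while pthread_self identifies the calling thread. Returning from main is equivalent to calling exit, which terminates all threads of the process, so main sleeps for one second before returning to give the new thread a chance to run. This is only a stopgap: even a one-second delay does not guarantee that the new thread has been scheduled.

Question: the main thread stores the ID of the new thread in the global variable ntid. If the new thread printed ntid directly instead of calling pthread_self, would that have the same effect?
The output is as follows: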
main thread: pid 22184 tid 1379190528 (0x5234c700)
new thread: pid 22184 tid 1379190528 (0x5234c700)

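3. Terminating a thread

(1) Three ways to terminate one thread without terminating the whole process

Return from the thread function. This does not apply to the main thread, because returning from main is equivalent to calling exit.
A thread can call pthread_cancel to terminate another thread in the same process.
A thread can call pthread_exit to terminate itself.

(2) pthread_exit

Prototype: void pthread_exit(void *value_ptr);

(3) pthread_join

Prototype: int pthread_join(pthread_t thread, void **value_ptr);

(4) Example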

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
#include <unistd.h>

/* Terminates by returning from the thread function. */
void *thr_fn1(void *arg)
{
	printf("thread 1 returning\n");
	return (void *)1;
}

/* Terminates itself by calling pthread_exit. */
void *thr_fn2(void *arg)
{
	printf("thread 2 exiting\n");
	pthread_exit((void *)2);
}

/* Runs until another thread cancels it; sleep is a cancellation point. */
void *thr_fn3(void *arg)
{
	while (1) {
		printf("thread 3 writing\n");
		sleep(1);
	}
}

int main(void)
{
	pthread_t tid;
	void *tret;

	pthread_create(&tid, NULL, thr_fn1, NULL);
	pthread_join(tid, &tret);
	/* the exit value is a small integer stored in a pointer; convert via intptr_t */
	printf("thread 1 exit code %ld\n", (long)(intptr_t)tret);

	pthread_create(&tid, NULL, thr_fn2, NULL);
	pthread_join(tid, &tret);
	printf("thread 2 exit code %ld\n", (long)(intptr_t)tret);

	pthread_create(&tid, NULL, thr_fn3, NULL);
	sleep(3);
	pthread_cancel(tid);
	pthread_join(tid, &tret);
	printf("thread 3 exit code %ld\n", (long)(intptr_t)tret);
	return 0;
}

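Explanation:

A thread terminated by pthread_cancel yields PTHREAD_CANCELED as its exit value, which Linux defines as (void *)-1.
Normally, after a thread terminates, its termination status is kept until another thread calls pthread_join to collect it.
A thread can also be put into the detached state. Such a thread has all its resources reclaimed as soon as it terminates, and no termination status is kept. pthread_join must not be called on a thread that is already detached; such a call returns EINVAL.
Calling pthread_join or pthread_detach on a thread that is not yet detached uses up its joinable state. In other words, pthread_join cannot be called twice on the same thread, and once pthread_detach has been called on a thread, pthread_join can no longer be called on it.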

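III. Synchronization between threads

1. Mutex

(1) When several threads access shared data at the same time they can conflict, which is the same problem as the reentrancy issue discussed earlier for signals

Example: suppose two threads each want to add 1 to a global variable, and on some platform this takes three instructions:
load the variable's value from memory into a register, add 1 to the register, then store the register's value back to memory.

If the two threads execute these three instructions at the same time on a multiprocessor platform, the variable may end up incremented once rather than twice, as Figure 35.1 illustrates.
Figure 35.1. Conflict on concurrent access
Think about it: if the two threads ran on a uniprocessor platform, would that avoid the problem?

Example: repeat the operation a few thousand times in a loop and the access conflict becomes observable.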

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NLOOP 5000
int counter; /* incremented by threads */
void *doit(void *);

int main(int argc, char **argv)
{
	pthread_t tidA, tidB;
	pthread_create(&tidA, NULL, &doit, NULL);
	pthread_create(&tidB, NULL, &doit, NULL);
	/* wait for both threads to terminate */
	pthread_join(tidA, NULL);
	pthread_join(tidB, NULL);
	return 0;
} 
void *doit(void *vptr)
{
	int i, val;
	/*
	* Each thread fetches, prints, and increments the counter NLOOP times.
	* The value of the counter should increase monotonically.
	*/
	for (i = 0; i < NLOOP; i++) {
		val = counter;
		printf("%x: %d\n", (unsigned int)pthread_self(), val + 1);
		counter = val + 1;
	} 

	return NULL;
}

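We create two threads, each of which increments counter 5000 times. Normally counter should end at 10000, but in practice the result differs from run to run: sometimes the count reaches a little over 5000, sometimes a little over 6000.

(2) Using a mutex to resolve access conflicts

A thread that holds the mutex can finish its read-modify-write sequence before releasing the lock. Threads that do not hold it have to wait and cannot touch the shared data, so the three steps behave as one atomic operation.
(a) A Mutex is represented by a variable of type pthread_mutex_t and is initialized and destroyed with: int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr); int pthread_mutex_destroy(pthread_mutex_t *mutex);

Return value: 0 on success, an error number on failure.
pthread_mutex_init initializes the Mutex. The attr parameter sets the Mutex attributes; if attr is NULL the default attributes are used.
A Mutex initialized with pthread_mutex_init can be destroyed with pthread_mutex_destroy.
If the Mutex variable is statically allocated (a global or static variable), it can also be initialized with the macro PTHREAD_MUTEX_INITIALIZER, which is equivalent to calling pthread_mutex_init with attr set to NULL.

(b) A Mutex is locked and unlocked with the following functions
int pthread_mutex_lock(pthread_mutex_t *mutex); int pthread_mutex_trylock(pthread_mutex_t *mutex); int pthread_mutex_unlock(pthread_mutex_t *mutex);

Return value: 0 on success, an error number on failure.
A thread calls pthread_mutex_lock to acquire the Mutex. If another thread has already acquired it with pthread_mutex_lock, the calling thread is suspended until the other thread releases the Mutex with pthread_mutex_unlock; only then is the caller woken so that it can acquire the Mutex and continue.
If a thread wants the lock but does not want to be suspended, it can call pthread_mutex_trylock. If the Mutex is already held by another thread, this function fails and returns EBUSY instead of suspending the caller.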
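The non-blocking behaviour of pthread_mutex_trylock can be sketched as below. The helper names try_mutex, try_it and trylock_from_other_thread are invented for this example. Note that the error number arrives as the return value, not through errno.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t try_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Second thread: attempts the lock once and records the return value. */
static void *try_it(void *arg)
{
	int *result = arg;

	*result = pthread_mutex_trylock(&try_mutex);	/* never blocks */
	if (*result == 0)
		pthread_mutex_unlock(&try_mutex);
	return NULL;
}

/* Returns what pthread_mutex_trylock reports in a second thread,
 * with the caller holding the mutex (hold != 0) or not (hold == 0). */
int trylock_from_other_thread(int hold)
{
	pthread_t tid;
	int result = -1;
	int err;

	if (hold)
		pthread_mutex_lock(&try_mutex);
	err = pthread_create(&tid, NULL, try_it, &result);
	assert(err == 0);
	pthread_join(tid, NULL);
	if (hold)
		pthread_mutex_unlock(&try_mutex);
	return result;
}
```

With the mutex held by the caller the second thread gets EBUSY immediately; with the mutex free it gets 0.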
Now let us use a Mutex to fix the earlier problem. The code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NLOOP 5000
int counter; /* incremented by threads */
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
void *doit(void *);

int main(int argc, char **argv)
{
	pthread_t tidA, tidB;
	pthread_create(&tidA, NULL, doit, NULL);
	pthread_create(&tidB, NULL, doit, NULL);
	/* wait for both threads to terminate */
	pthread_join(tidA, NULL);
	pthread_join(tidB, NULL);
	return 0;
} 

void *doit(void *vptr)
{
	int i, val;
	/*
	* Each thread fetches, prints, and increments the counter NLOOP times.
	* The value of the counter should increase monotonically.
	*/
	for (i = 0; i < NLOOP; i++) {
		pthread_mutex_lock(&counter_mutex);
		
		val = counter;
		printf("%x: %d\n", (unsigned int)pthread_self(), val + 1);
		counter = val + 1;
		
		pthread_mutex_unlock(&counter_mutex);
	} 

	return NULL;
}

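With this change the program behaves correctly and counts to 10000 on every run.

At this point you may wonder how the two basic Mutex operations, lock and unlock, are implemented.
Suppose the Mutex variable holds 1 when the lock is free, so that a thread calling lock can acquire it, and 0 when the lock is held by some thread, so that any other thread calling lock has to suspend and wait. In pseudocode terms, lock tests whether mutex > 0; if so it sets mutex to 0 and returns, otherwise it suspends the caller and tries again after being woken. unlock sets mutex to 1 and wakes the threads waiting on the Mutex.
The wake-up step in unlock can be implemented in different ways: it can wake just one waiting thread, or it can wake all threads waiting on the Mutex and let them compete for it, with the losers going back to waiting.
The attentive reader will have spotted the problem: reading, testing and modifying the Mutex variable is not an atomic operation.
If two threads call lock at the same time while the Mutex is 1, both see mutex > 0. One of them sets mutex = 0, the other is unaware of this and also sets mutex = 0, and both threads believe they hold the lock.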
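One common way to close this gap, offered here as an assumption rather than as how any particular pthread implementation works, is to replace the separate test and store with a single atomic exchange. The sketch below uses C11 <stdatomic.h>. The names mutex_word, toy_lock, toy_unlock and toy_lock_demo are hypothetical, and waiting is done by spinning with sched_yield instead of a real wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int mutex_word = 1;	/* 1 = free, 0 = held, as in the text */
static long toy_total;			/* protected by mutex_word */

static void toy_lock(void)
{
	/* atomic_exchange stores 0 and returns the old value in one indivisible
	 * step, so only the thread that reads back 1 has acquired the lock. */
	while (atomic_exchange(&mutex_word, 0) == 0)
		sched_yield();	/* spin instead of sleeping on a wait queue */
}

static void toy_unlock(void)
{
	atomic_store(&mutex_word, 1);
}

static void *add_many(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		toy_lock();
		toy_total++;
		toy_unlock();
	}
	return NULL;
}

/* Two threads increment toy_total under the toy lock; returns the final value. */
long toy_lock_demo(void)
{
	pthread_t a, b;
	int err;

	toy_total = 0;
	err = pthread_create(&a, NULL, add_many, NULL);
	assert(err == 0);
	err = pthread_create(&b, NULL, add_many, NULL);
	assert(err == 0);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return toy_total;
}
```

Two threads can no longer both observe the value 1, because whichever exchange happens first leaves 0 behind for the other.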
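How are "suspend and wait" and "wake the waiting threads" implemented?
Each Mutex has a wait queue. To suspend on a Mutex, a thread first adds itself to the wait queue, then sets its state to sleeping, then calls the scheduler to switch to another thread.
To wake another thread in the wait queue, a thread only needs to take one entry off the queue, change its state from sleeping to ready, and add it to the ready queue. The next time the scheduler runs, it may switch to the woken thread.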

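(c) Deadlock

Normally, if one thread calls lock twice in a row, the second call finds the lock already taken and suspends the thread until some other thread releases it. But the lock is held by the thread itself, which is now suspended and has no chance to release it, so the thread waits forever. This is called a Deadlock.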
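The self-deadlock case can be made visible without hanging the program by using an error-checking mutex. This goes slightly beyond the text, which only uses default attributes. With the type PTHREAD_MUTEX_ERRORCHECK, POSIX specifies that a second lock by the owning thread fails with EDEADLK. The function name relock_errorcheck_mutex is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Locks an error-checking mutex twice from the same thread and
 * returns the result of the second lock call. */
int relock_errorcheck_mutex(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int err, second;

	err = pthread_mutexattr_init(&attr);
	assert(err == 0);
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(err == 0);
	err = pthread_mutex_init(&m, &attr);
	assert(err == 0);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&m);
	/* with a default mutex on Linux this call would typically block forever */
	second = pthread_mutex_lock(&m);
	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	return second;
}
```

This kind of mutex is mainly useful while debugging, since the check adds a little overhead to every lock call.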
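Another typical deadlock goes like this:
Thread A holds lock 1 and thread B holds lock 2. Thread A then calls lock to acquire lock 2 and has to suspend until thread B releases it. Meanwhile thread B calls lock to acquire lock 1 and has to suspend until thread A releases it. Both threads stay suspended forever.
As you can imagine, with more threads and more locks, deciding whether a deadlock is possible becomes complicated and hard to judge.
When writing programs, try to avoid holding more than one lock at a time.
If it is really necessary, follow this principle:
(i) If every thread that needs several locks acquires them in the same order (commonly the order of the Mutex variables' addresses), no deadlock can occur.
(ii) For example, if a program uses lock 1, lock 2 and lock 3, and the addresses of their Mutex variables satisfy lock 1 < lock 2 < lock 3, then every thread that needs two or three of them at once should acquire them in the order lock 1, lock 2, lock 3.
(iii) If it is hard to establish an order over all the locks, prefer pthread_mutex_trylock to pthread_mutex_lock wherever possible, to avoid deadlock.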
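Rule (i) can be sketched as a helper that always takes the mutex at the lower address first. Everything named here (struct account, lock_both, transfer, transfer_demo) is hypothetical. The two threads request the locks in opposite orders, which is exactly the pattern that would deadlock without the ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {			/* hypothetical example type */
	pthread_mutex_t lock;
	long balance;
};

static struct account acc_a = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct account acc_b = { PTHREAD_MUTEX_INITIALIZER, 1000 };

/* Lock two distinct accounts, lower address first, whatever order was asked for. */
static void lock_both(struct account *x, struct account *y)
{
	if ((uintptr_t)x > (uintptr_t)y) {
		struct account *t = x;
		x = y;
		y = t;
	}
	pthread_mutex_lock(&x->lock);
	pthread_mutex_lock(&y->lock);
}

static void transfer(struct account *from, struct account *to, long amount)
{
	lock_both(from, to);
	from->balance -= amount;
	to->balance += amount;
	pthread_mutex_unlock(&from->lock);
	pthread_mutex_unlock(&to->lock);
}

static void *a_to_b(void *arg)
{
	(void)arg;
	for (int i = 0; i < 10000; i++)
		transfer(&acc_a, &acc_b, 1);
	return NULL;
}

static void *b_to_a(void *arg)
{
	(void)arg;
	for (int i = 0; i < 10000; i++)
		transfer(&acc_b, &acc_a, 1);
	return NULL;
}

/* Runs both directions concurrently; returns the combined balance. */
long transfer_demo(void)
{
	pthread_t t1, t2;
	int err;

	err = pthread_create(&t1, NULL, a_to_b, NULL);
	assert(err == 0);
	err = pthread_create(&t2, NULL, b_to_a, NULL);
	assert(err == 0);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return acc_a.balance + acc_b.balance;
}
```

The addresses are compared as uintptr_t because relational comparison of pointers to unrelated objects is not defined by the C standard.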

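2. Condition Variable

(1) pthread_cond_destroy, pthread_cond_init, pthread_cond_t

Thread synchronization also covers this situation: thread A needs some condition to hold before it can continue. The condition does not hold yet, so thread A blocks and waits. Thread B, in the course of its own work, makes the condition true and wakes thread A so that it can continue.
In the pthread library, a Condition Variable is used to block waiting for a condition, or to wake the threads waiting for it. A Condition Variable is represented by a variable of type pthread_cond_t and is initialized and destroyed with: int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr); int pthread_cond_destroy(pthread_cond_t *cond);
Return value: 0 on success, an error number on failure.
As with a Mutex, pthread_cond_init initializes a Condition Variable, and an attr of NULL selects the default attributes.
pthread_cond_destroy destroys a Condition Variable.
If the Condition Variable is statically allocated, it can also be initialized with the macro PTHREAD_COND_INITIALIZER, which is equivalent to calling pthread_cond_init with attr set to NULL.

(2) Condition variable operations

A Condition Variable is operated on with the following functions: int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex); int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime); int pthread_cond_signal(pthread_cond_t *cond); int pthread_cond_broadcast(pthread_cond_t *cond);
Return value: 0 on success, an error number on failure.
As the signatures show, a Condition Variable is always used together with a Mutex.
A thread can call pthread_cond_wait to block on a Condition Variable. The function does three things: it releases the Mutex, blocks and waits, and, once woken, reacquires the Mutex before returning.
pthread_cond_timedwait takes an extra parameter that sets a timeout. If the time given by abstime arrives and no other thread has woken the caller, the function returns ETIMEDOUT.
A thread can call pthread_cond_signal to wake one other thread waiting on a Condition Variable, or pthread_cond_broadcast to wake all threads waiting on it.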
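The timeout variant is easy to get wrong because abstime is an absolute time, not a duration. The sketch below shows one way to build it from the current time. The names ready, ready_lock, ready_cond, wait_ready and set_ready are invented for the example, and CLOCK_REALTIME is used because it is the default clock for condition variables.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready;	/* the condition itself, protected by ready_lock */

/* Waits until ready is set or timeout_ms elapses.
 * Returns 0 if the condition holds, otherwise the error from the wait
 * (ETIMEDOUT on timeout). */
int wait_ready(long timeout_ms)
{
	struct timespec abstime;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &abstime);	/* now ... */
	abstime.tv_sec += timeout_ms / 1000;		/* ... plus the timeout */
	abstime.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (abstime.tv_nsec >= 1000000000L) {
		abstime.tv_sec++;
		abstime.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&ready_lock);
	while (!ready && err == 0)	/* re-check: a wakeup may be spurious */
		err = pthread_cond_timedwait(&ready_cond, &ready_lock, &abstime);
	if (ready)
		err = 0;
	pthread_mutex_unlock(&ready_lock);
	return err;
}

/* Makes the condition true and wakes one waiter. */
void set_ready(void)
{
	pthread_mutex_lock(&ready_lock);
	ready = 1;
	pthread_mutex_unlock(&ready_lock);
	pthread_cond_signal(&ready_cond);
}
```

Calling wait_ready before anyone calls set_ready times out; after set_ready it returns at once without waiting.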
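The following program demonstrates a producer-consumer example: the producer creates a struct and links it at the head of a list, and the consumer takes structs off the head.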

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>

struct msg {
	struct msg *next;
	int num;
};

struct msg *head;	/* list of produced items, protected by lock */
pthread_cond_t has_product = PTHREAD_COND_INITIALIZER;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *consumer(void *p)
{
	struct msg *mp;

	for (;;) {
		pthread_mutex_lock(&lock);
		/* re-check the condition after every wakeup */
		while (head == NULL)
			pthread_cond_wait(&has_product, &lock);
		mp = head;
		head = mp->next;
		pthread_mutex_unlock(&lock);
		printf("Consume %d\n", mp->num);
		free(mp);
		sleep(rand() % 5);
	}
}

void *producer(void *p)
{
	struct msg *mp;

	for (;;) {
		mp = malloc(sizeof(struct msg));
		if (mp == NULL)
			exit(1);
		mp->num = rand() % 1000 + 1;
		printf("Produce %d\n", mp->num);
		pthread_mutex_lock(&lock);
		mp->next = head;	/* push onto the head of the list */
		head = mp;
		pthread_mutex_unlock(&lock);
		pthread_cond_signal(&has_product);
		sleep(rand() % 5);
	}
}

int main(int argc, char *argv[])
{
	pthread_t pid, cid;

	srand(time(NULL));
	pthread_create(&pid, NULL, producer, NULL);
	pthread_create(&cid, NULL, consumer, NULL);
	pthread_join(pid, NULL);
	pthread_join(cid, NULL);
	return 0;
}


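3. Semaphore

(1) Differences between a Mutex and a semaphore

A Mutex variable is either 0 or 1 and can be seen as the available count of a resource. It is initialized to 1, meaning one resource is available. Locking takes the resource and brings the Mutex down to 0, meaning nothing is available. Unlocking releases the resource and brings the Mutex back to 1, meaning one resource is available again.
A Semaphore is similar to a Mutex in that it represents the number of available resources. The difference is that this number can be greater than 1.

(2) Semaphores can synchronize threads within one process and can also synchronize different processes

The POSIX semaphore functions are declared in <semaphore.h>: int sem_init(sem_t *sem, int pshared, unsigned int value); int sem_wait(sem_t *sem); int sem_trywait(sem_t *sem); int sem_post(sem_t *sem); int sem_destroy(sem_t *sem);

A semaphore variable has type sem_t.
sem_init() initializes a semaphore variable. The value parameter is the number of available resources, and a pshared of 0 means the semaphore is used to synchronize threads within one process.
When a semaphore variable is no longer needed, call sem_destroy() to release the resources associated with it.
sem_wait() acquires a resource and decrements the semaphore by 1. If the semaphore is already 0 when sem_wait() is called, the caller is suspended. To avoid being suspended, call sem_trywait() instead.
sem_post() releases a resource and increments the semaphore by 1, waking a suspended thread if there is one.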
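The counting behaviour and the non-blocking call can be sketched together. Unlike the pthread functions, the semaphore functions report failure by returning -1 and setting errno. The function name take_without_blocking is made up for the example, which assumes a Linux system where unnamed semaphores are supported.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Creates a semaphore holding `available` units, tries to take `want` of them
 * without ever blocking, and returns how many were actually obtained. */
int take_without_blocking(unsigned int available, int want)
{
	sem_t sem;
	int got = 0;
	int rc;

	rc = sem_init(&sem, 0, available);	/* pshared = 0: threads of one process */
	assert(rc == 0);

	while (got < want) {
		if (sem_trywait(&sem) == -1) {
			assert(errno == EAGAIN);	/* the count was already 0 */
			break;
		}
		got++;	/* each successful call decrements the count by 1 */
	}

	sem_destroy(&sem);
	return got;
}
```

Asking for more units than the semaphore holds stops at the available count instead of suspending the caller.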
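Example: the producer-consumer program in the previous section was based on a linked list whose storage is allocated dynamically. Here it is rewritten on top of a fixed-size circular queue.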

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>

#define NUM 5
int queue[NUM];
sem_t blank_number, product_number;	/* free slots and filled slots */

void *producer(void *arg)
{
	int p = 0;

	while (1) {
		sem_wait(&blank_number);	/* wait for a free slot */
		queue[p] = rand() % 1000 + 1;
		printf("Produce %d\n", queue[p]);
		sem_post(&product_number);	/* one more item available */
		p = (p + 1) % NUM;
		sleep(rand() % 5);
	}
}

void *consumer(void *arg)
{
	int c = 0;

	while (1) {
		sem_wait(&product_number);	/* wait for an item */
		printf("Consume %d\n", queue[c]);
		queue[c] = 0;
		sem_post(&blank_number);	/* one more free slot */
		c = (c + 1) % NUM;
		sleep(rand() % 5);
	}
}

int main(int argc, char *argv[])
{
	pthread_t pid, cid;

	sem_init(&blank_number, 0, NUM);
	sem_init(&product_number, 0, 0);

	pthread_create(&pid, NULL, producer, NULL);
	pthread_create(&cid, NULL, consumer, NULL);
	pthread_join(pid, NULL);
	pthread_join(cid, NULL);
	sem_destroy(&blank_number);
	sem_destroy(&product_number);

	return 0;
}

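4. Other thread synchronization mechanisms: reader-writer locks

If shared data is read-only, every thread always sees the same data and no access conflict can occur. As soon as one thread can modify the data, synchronization between threads has to be considered. This leads to the concept of the Reader-Writer Lock: Readers do not exclude one another and can read the shared data at the same time, while a Writer is exclusive, so that no other Reader or Writer can access the data while the Writer is modifying it. A Reader-Writer Lock therefore allows more concurrency than a Mutex.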
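The reader-writer rule can be checked with the POSIX pthread_rwlock_t type, which the text does not name. The sketch below is an assumption about how one might demonstrate it, and table_lock, try_read, try_write and probe_while_reading are hypothetical names. While the caller holds a read lock, a second thread probes the lock without blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Probe as a Reader: succeeds even though another Reader holds the lock. */
static void *try_read(void *arg)
{
	int *rc = arg;

	*rc = pthread_rwlock_tryrdlock(&table_lock);
	if (*rc == 0)
		pthread_rwlock_unlock(&table_lock);
	return NULL;
}

/* Probe as a Writer: refused while any Reader holds the lock. */
static void *try_write(void *arg)
{
	int *rc = arg;

	*rc = pthread_rwlock_trywrlock(&table_lock);
	if (*rc == 0)
		pthread_rwlock_unlock(&table_lock);
	return NULL;
}

/* Holds a read lock in the caller and returns what a second thread gets
 * when it probes as a Writer (as_writer != 0) or as a Reader. */
int probe_while_reading(int as_writer)
{
	pthread_t tid;
	int rc = -1;
	int err;

	pthread_rwlock_rdlock(&table_lock);
	err = pthread_create(&tid, NULL, as_writer ? try_write : try_read, &rc);
	assert(err == 0);
	pthread_join(tid, NULL);
	pthread_rwlock_unlock(&table_lock);
	return rc;
}
```

A second Reader gets in alongside the first, whereas a Writer is turned away with EBUSY until all Readers have released the lock.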
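Resolving access conflicts by suspending threads is not necessarily the best approach, because it reduces the concurrency of the system. In some cases access conflicts can be resolved while largely avoiding the need to suspend a thread, for example with the Seqlock and RCU (read-copy-update) mechanisms in the Linux kernel.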