python-multiprocessing: multi-process parallel computing


Last modified 2025-01-21 13:58:10


License: CC 4.0 BY-SA

Tags: #python #parallel-computing #multiproce

First published 2017-08-13 14:59:33

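Python's multiprocessing package is the standard library's package for multi-process parallel computing. Its API closely mirrors that of threading (multi-threading), but because the work runs in separate processes, which the operating system can schedule on different CPU cores, it is not constrained by the GIL (Global Interpreter Lock). Below we introduce the Pool and Process classes in multiprocessing.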
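To illustrate how closely the two APIs line up, here is a minimal sketch that runs the same function once in a thread and once in a child process. The names report_pid and run_demo are made up for this illustration; only Thread, Process, Queue and os.getpid come from the standard library.

```python
import multiprocessing
import os
import threading


def report_pid(label, out):
    """Record which process executed this call (hypothetical helper)."""
    out.put((label, os.getpid()))


def run_demo():
    """Run report_pid once in a thread and once in a child process."""
    out = multiprocessing.Queue()
    # Thread and Process accept the same target/args arguments.
    t = threading.Thread(target=report_pid, args=("thread", out))
    p = multiprocessing.Process(target=report_pid, args=("process", out))
    t.start()
    p.start()
    # Read both messages before joining the workers.
    results = dict(out.get() for _ in range(2))
    t.join()
    p.join()
    return results


if __name__ == "__main__":
    pids = run_demo()
    print("main   ", os.getpid())
    print("thread ", pids["thread"])   # same PID as the main process
    print("process", pids["process"])  # a different PID
```

The thread reports the parent's PID because it shares the parent's interpreter (and its GIL), while the child process has its own interpreter.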

Pool

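A Pool (process pool) makes parallel task processing more convenient: we specify how many worker processes to use (typically the number of CPU cores), and the Pool automatically dispatches tasks to those workers. Pool provides several methods for running functions in parallel.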
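As a minimal sketch of that workflow, the snippet below creates a pool with a fixed number of workers and lets it distribute the work. It uses Pool.map, one of the pool's parallel methods; square and parallel_squares are hypothetical names chosen for the example.

```python
import multiprocessing


def square(x):
    """Toy task: the work each pool worker performs."""
    return x * x


def parallel_squares(values, workers=2):
    """Square every value using a pool of `workers` processes."""
    # The with-block shuts the pool down once the results are in.
    with multiprocessing.Pool(processes=workers) as pool:
        # map splits the input across the workers and returns results in input order.
        return pool.map(square, values)


if __name__ == "__main__":
    print(parallel_squares(range(5)))  # [0, 1, 4, 9, 16]
```

In practice workers is often set to multiprocessing.cpu_count(), as in the longer example later in this article.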

apply and apply_async

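Pool.apply submits a single task and blocks until its result is ready, so a series of apply calls runs the tasks one at a time. (It is the built-in apply() function that was removed in Python 3; Pool.apply itself is still available.) apply_async is the asynchronous version of apply: it returns immediately with an AsyncResult object, and the return value is fetched later through its get() method. To run tasks in parallel with this pair of methods, use apply_async.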
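To make the difference concrete, here is a minimal sketch that times the same four tasks with each method. The helper names (slow_double, run_blocking, run_async, timed) and the 0.2-second pause are assumptions made for this illustration.

```python
import multiprocessing
import time


def slow_double(x):
    """Toy task that takes about 0.2 seconds."""
    time.sleep(0.2)
    return 2 * x


def run_blocking(pool, values):
    # pool.apply returns only when the task is done, so tasks run one after another.
    return [pool.apply(slow_double, (v,)) for v in values]


def run_async(pool, values):
    # Submit every task first, then collect; the tasks overlap across workers.
    pending = [pool.apply_async(slow_double, (v,)) for v in values]
    return [r.get() for r in pending]


def timed(fn, pool, values):
    """Return fn's result together with its wall-clock duration in seconds."""
    start = time.time()
    result = fn(pool, values)
    return result, time.time() - start


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        print("apply:       %s in %.2f s" % timed(run_blocking, pool, range(4)))
        print("apply_async: %s in %.2f s" % timed(run_async, pool, range(4)))
```

With four workers, the apply version should take roughly four times as long as the apply_async version, since only the latter lets the pauses overlap.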


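import multiprocessing
import time
from random import randint, seed


def f(num):
    seed()  # reseed from OS entropy inside the worker
    rand_num = randint(0, 10)  # pick a random pause (in seconds) for each task
    time.sleep(rand_num)
    return (num, rand_num)


# Note: under the spawn or forkserver start methods (the default on Windows
# and macOS), wrap everything below in `if __name__ == '__main__':`.
start_time = time.time()
cores = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=cores)
pool_list = []
for xx in range(10):
    # Do not call get() here: it would block until this task finishes.
    pool_list.append(pool.apply_async(f, (xx,)))

# Why not call result.get() directly inside the for loop? Because get() blocks
# until its task has finished, so the next task would not be submitted until
# the previous one was done, and the tasks would run one after another.
# Results can also be collected after pool.close() and pool.join(), by which
# point every task has finished and get() returns immediately.
result_list = [xx.get() for xx in pool_list]

# Finally, shut down the process pool:
pool.close()
pool.join()

print(result_list)
print('Parallel time: %.2f' % (time.time() - start_time))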
print 'Serial time: %.2f' % (sum