Shuan Chen

PhD Student in KAIST CBE

MultiProcessing in Python

The built-in Python package multiprocessing can make a program run up to about N times faster on N CPUs, provided the work splits into independent jobs.

Pool

The class multiprocessing.Pool manages a pool of worker processes for running jobs in parallel.
The given tasks are split up inside the Pool object and distributed among the workers.
Creating the pool with Pool(processes=n) runs the jobs in n parallel processes, and the results are gathered back after the jobs are done.
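
To make the splitting and gathering visible, here is a minimal sketch in which each job reports the ID of the process that handled it. The helper name tag_with_pid and the choice of 4 workers are assumptions made for illustration only.

```python
import os
from multiprocessing import Pool

def tag_with_pid(x):
    """Return the input together with the ID of the process that handled it."""
    return x, os.getpid()

if __name__ == '__main__':
    n = 4  # number of worker processes (an arbitrary choice for this sketch)
    with Pool(processes=n) as pool:
        tagged = pool.map(tag_with_pid, range(8))
    # The results come back in input order, whichever worker ran each job.
    print([x for x, _ in tagged])
    # At most n distinct worker process IDs appear.
    print(len({pid for _, pid in tagged}), 'worker process(es) did the work')
```

How many distinct process IDs appear can vary from run to run, because the pool decides which worker picks up which job.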

map function

The example below is slightly modified from the official documentation:

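import time
from multiprocessing import Pool

def square(x):
    # Simulate a slow job: sleep for one second, then return the input unchanged.
    time.sleep(1)
    return x

if __name__ == '__main__':  # needed on platforms that spawn worker processes
    # Run the 10 jobs one after another.
    start_time = time.time()
    for x in range(10):
        square(x)
    print('Single-process takes {} seconds'.format(time.time() - start_time))

    # Run the same 10 jobs on 8 worker processes.
    start_time = time.time()
    with Pool(processes=8) as pool:
        pool.map(square, range(10))
    print('Multi-process takes {} seconds'.format(time.time() - start_time))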

and it gives
Single-process takes 10.010777711868286 seconds
Multi-process takes 2.13849139213562 seconds
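
The two timings can be sanity-checked with a little arithmetic. The sketch below assumes that every job takes the same time and that the pool works through the jobs in rounds of at most n_workers; ideal_wall_time is a hypothetical helper written for this note, not part of multiprocessing.

```python
import math

def ideal_wall_time(n_jobs, n_workers, seconds_per_job=1.0):
    """Estimate the wall time when equal-length jobs run in rounds of n_workers."""
    rounds = math.ceil(n_jobs / n_workers)
    return rounds * seconds_per_job

print(ideal_wall_time(10, 1))  # 10.0, the single-process loop
print(ideal_wall_time(10, 8))  # 2.0, 8 jobs in the first round and 2 in the second
```

The measured 2.14 seconds is slightly above the ideal 2 seconds, presumably because starting the worker processes takes some time.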

Get the return value from function

Pool.map returns the function's return values as a list, in the same order as the inputs:

# Reuses the imports and the one-second job function from the previous example.
start_time = time.time()
with Pool(processes=8) as pool:
    results = pool.map(square, range(64))
print(results)
print('Pool.map takes %s seconds' % (time.time() - start_time))

and it gives

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
Pool.map takes 8.063436031341553 seconds

Alternatively, you can use pool.imap, which returns an iterator over the results instead of a list. Wrapping that iterator in tqdm lets you see the progress of the multi-processing:

from tqdm import tqdm

start_time = time.time()
results = []
with Pool(processes=8) as pool:
    # total lets tqdm show a percentage, since the imap iterator has no length.
    for result in tqdm(pool.imap(square, range(64)), total=64, desc='Multi-processing'):
        results.append(result)
print(results)
print('Pool.imap takes %s seconds' % (time.time() - start_time))

and it gives

Multi-processing: 100%|██████████| 64/64 [00:08<00:00,  7.99it/s]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
Pool.imap takes 8.06504487991333 seconds
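
The difference between the two calls can be seen without tqdm. In this minimal sketch, slow_identity and the pool size of 4 are assumptions chosen for illustration: map hands back a complete list, while imap hands back an iterator that is consumed result by result.

```python
import time
from multiprocessing import Pool

def slow_identity(x):
    """Simulate a short job and return the input unchanged."""
    time.sleep(0.1)
    return x

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # map blocks until every job is done, then returns a list.
        as_list = pool.map(slow_identity, range(8))
        print(type(as_list).__name__, as_list)

        # imap returns an iterator right away; each result is yielded,
        # in input order, once it is ready.
        for result in pool.imap(slow_identity, range(8)):
            print('got', result)
```

Note that the loop over imap sits inside the with block: leaving the block shuts the pool down, so the results have to be consumed before then.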

More than one argument as input

You may notice that the function in the previous examples takes only one input, the number x.
When the function needs additional inputs that stay the same for every job, you can fix them in advance with partial from functools, which returns a new function of the one remaining argument.
Here is an example using partial, with pprint to format the output:

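from functools import partial
from multiprocessing import Pool
from pprint import pprint

from tqdm import tqdm

def ChineseZodiac(x, zodiac_dict):
    # Map the offset x to a year (1987 + x) and look up that year's zodiac animal.
    x += 1987
    return (x, zodiac_dict[(x - 4) % len(zodiac_dict)])

# The twelve animals in order: rat, ox, tiger, rabbit, dragon, snake,
# horse, goat, monkey, rooster, dog, pig.
animals = ['鼠', '牛', '虎', '兔', '龍', '蛇', '馬', '羊', '猴', '雞', '狗', '豬']
animals = {number: chinese for number, chinese in enumerate(animals)}
# Fix zodiac_dict in advance so that each job only has to supply x.
partial_func = partial(ChineseZodiac, zodiac_dict=animals)
results = []
with Pool(processes=8) as pool:
    for result in tqdm(pool.imap(partial_func, range(24)), total=24, desc='Multi-processing'):
        results.append(result)
pprint(results)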

Output:
Multi-processing: 100%|██████████| 24/24 [00:00<00:00, 23082.62it/s]
[(1987, '兔'),
 (1988, '龍'),
 (1989, '蛇'),
 (1990, '馬'),
 (1991, '羊'),
 (1992, '猴'),
 (1993, '雞'),
 (1994, '狗'),
 (1995, '豬'),
 (1996, '鼠'),
 (1997, '牛'),
 (1998, '虎'),
 (1999, '兔'),
 (2000, '龍'),
 (2001, '蛇'),
 (2002, '馬'),
 (2003, '羊'),
 (2004, '猴'),
 (2005, '雞'),
 (2006, '狗'),
 (2007, '豬'),
 (2008, '鼠'),
 (2009, '牛'),
 (2010, '虎')]
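
When the extra inputs differ from job to job, partial no longer fits, because it fixes one value for all jobs. One alternative is Pool.starmap, which unpacks each tuple in the iterable into the function's arguments. The sketch below is a hedged illustration rather than part of the example above; add_offset and the argument list are made up for it.

```python
from multiprocessing import Pool

def add_offset(x, offset):
    """Toy two-argument job: add offset to x."""
    return x + offset

if __name__ == '__main__':
    # One (x, offset) tuple per job; starmap calls add_offset(x, offset) for each.
    args = [(x, 1987) for x in range(5)]
    with Pool(processes=4) as pool:
        years = pool.starmap(add_offset, args)
    print(years)  # [1987, 1988, 1989, 1990, 1991]
```

Like map, starmap returns a list in input order, so it does not combine directly with the tqdm progress bar used with imap.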