Python has a multiprocessing toolbox: a parallel processing library that relies on subprocesses, rather than threads.
## Software and hardware
The multiprocessing toolbox does its own hardware detection: it runs on a single node, and uses however many cores the system reports.
```python
## pool.py
import multiprocessing as mp

nprocs = mp.cpu_count()
print(f"I detect {nprocs} cores")
```
## Process
A process is an object that will execute a Python function:
```python
## quicksort.py
import multiprocessing as mp
import random
import os

def quicksort( numbers ):
    if len(numbers) <= 1:
        return numbers
    else:
        median = numbers[0]
        left  = [ i for i in numbers[1:] if i <  median ]
        right = [ i for i in numbers[1:] if i >= median ]
        # note: pool workers are daemonic, so they can not create
        # pools of their own; deep recursion fails for that reason
        with mp.Pool(2) as pool:
            [sortleft,sortright] = pool.map( quicksort, [left,right] )
        return sortleft + [median] + sortright

if __name__ == '__main__':
    numbers = [ random.randint(1,50) for i in range(32) ]
    process = mp.Process(target=quicksort,args=[numbers])
    process.start()
    process.join()
```
Creating a process does not start it: for that, use the start function. Completion of the process is only guaranteed after you call the join function on it:
```python
if __name__ == '__main__':
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```
By issuing the start and join calls in less regular patterns than this loop, arbitrarily complicated execution schemes can be realized.
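As a minimal sketch (the work function and the choice of four processes are ours), the joins need not mirror the starts; here the processes are joined in reverse order:

```python
import multiprocessing as mp

def work(i):
    print(f"process {i} working")

if __name__ == '__main__':
    processes = [ mp.Process(target=work,args=[i]) for i in range(4) ]
    for p in processes:
        p.start()
    # join in reverse order: correctness only requires that
    # every started process is eventually joined
    for p in reversed(processes):
        p.join()
```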
### Arguments
Arguments can be passed to the function of the process with the args keyword. This accepts a list (or tuple) of arguments, leading to a somewhat strange syntax for a single argument:
```python
proc = Process(target=print_func, args=(name,))
```
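As a runnable sketch (print_func and the name value are our own placeholders): the trailing comma makes (name,) a one-element tuple, whereas (name) would just be name in parentheses.

```python
import multiprocessing as mp

def print_func(name):
    print(f"hello from {name}")

if __name__ == '__main__':
    name = "Alice"
    # (name,) is a 1-tuple; without the comma it is not a tuple at all
    proc = mp.Process(target=print_func, args=(name,))
    proc.start()
    proc.join()
```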
### Process details
Note the test on __main__ : a newly started process reads the current file in order to execute the specified function. Without this clause, the import would first execute more process start calls, spawning processes recursively, before ever getting to the function execution.
Processes have a name that you can retrieve as current_process().name . The default is Process-5 and such, but you can specify custom names:
```python
Process(name="Your name here")
```

The target function of a process can get hold of the process object with the current_process function.
Of course you can also query os.getpid(), but the process id does not offer any further possibilities.
```python
import multiprocessing as mp
import os

def say_name(iproc):
    print(f"Process {os.getpid()} has name: {mp.current_process().name}")

if __name__ == '__main__':
    processes = [ mp.Process(target=say_name,name=f"proc{iproc}",args=[iproc])
                  for iproc in range(6) ]
```
## Pools and mapping
Often you want a number of processes to apply the same operation to a number of arguments, for instance in a parameter sweep. For this, create a Pool object, and apply the map method to it:
```python
pool = mp.Pool( nprocs )
results = pool.map( print_value,range(1,2*nprocs) )
```
Note that this is also the easiest way to get return values from a process, which is not directly possible with a Process object. Other approaches are using a shared object, or an object in a Queue or Pipe object; see below.
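For instance, in this sketch (the square function is our own) the pool's return values come back as a list, in argument order:

```python
import multiprocessing as mp

def square(x):
    return x*x

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # each worker returns a value; map collects them in order
        results = pool.map( square, range(8) )
    print(results)
```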
## Shared data
The best way to deal with shared data is to create a Value or Array object; both come equipped with a lock for safe updating.
```python
pi = mp.Value('d')
pi.value = 0
```
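An Array behaves analogously for a fixed-length sequence of typed values; in this sketch (function and variable names are ours) each process safely increments its own slot:

```python
import multiprocessing as mp

def bump(arr, i):
    # the lock protects the whole array, not individual elements
    with arr.get_lock():
        arr[i] += 1

if __name__ == '__main__':
    counts = mp.Array('i', 4)   # four C ints, initialized to zero
    procs = [ mp.Process(target=bump, args=(counts, i)) for i in range(4) ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(list(counts))
```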
For instance, one could stochastically calculate $\pi$ by
```python
## pi.py
import random

def calc_pi1(pi,n):
    for i in range(n):
        x = random.random()
        y = random.random()
        with pi.get_lock():
            if x*x+y*y<1:
                pi.value += 1.
```
Exercise
Do you see a way to improve the speed of this calculation?
End of exercise
### Pipes
A pipe , object type Pipe , corresponds to what used to be called a channel in older parallel programming systems: a FIFO object into which one process can place items, and from which another process can take them. However, a pipe is not tied to any particular pair of processes: creating the pipe merely gives you the entrance and exit of the pipe:
```python
q_entrance,q_exit = mp.Pipe()
```
These ends can then be passed to any process:
```python
producer1 = mp.Process(target=add_to_pipe,args=[1,q_entrance])
producer2 = mp.Process(target=add_to_pipe,args=[2,q_entrance])
printer = mp.Process(target=print_from_pipe,args=(q_exit,))
```
which can then put and get items, using the send and recv commands.
```python
## pipemulti.py
import time

def add_to_pipe(v,q):
    for i in range(10):
        print(f"put {v}")
        q.send(v)
        time.sleep(1)
    q.send("END")

def print_from_pipe(q):
    ends = 0
    while True:
        v = q.recv()
        print(f"Got: {v}")
        if v=="END":
            ends += 1
            if ends==2:
                break
    print("pipe is empty")
```
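Putting this together, a self-contained sketch, simplified to one producer and no sleeps; the duplex=False flag makes the pipe one-directional, with the receiving end returned first:

```python
import multiprocessing as mp

def produce(conn):
    for v in range(5):
        conn.send(v)
    conn.send("END")   # sentinel value telling the consumer to stop
    conn.close()

def consume(conn):
    while True:
        v = conn.recv()
        if v == "END":
            break
        print(f"Got: {v}")

if __name__ == '__main__':
    # one-directional pipe: first connection receives, second sends
    recv_end, send_end = mp.Pipe(duplex=False)
    producer = mp.Process(target=produce, args=(send_end,))
    printer  = mp.Process(target=consume, args=(recv_end,))
    producer.start(); printer.start()
    producer.join();  printer.join()
```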
### Queues