I am trying to run a simple command that guesses gender by name using multiprocessing. This code worked on a previous machine, so perhaps my setup has something to do with it.
I got multiprocessing to work from within a Jupyter notebook on Windows by saving my function in a separate .py file and including that file in my notebook.
Example:
f.py:
def f(name, output):
    output.put('hello {0}'.format(name))
    return
Code in Jupyter notebook:
from multiprocessing import Process, Queue

# Defining the function in the notebook itself results in
# AttributeError: Can't get attribute 'f' on <module '__main__' (built-in)>
# The solution seems to be importing the function from a separate file.
import f

# Also, the original version of f only had a print statement in it.
# That doesn't work with Process, in the sense that it prints to the console
# instead of the notebook.
# The trick is to let f write the string to print into an output queue.
# Once the process has run, the result is retrieved from the queue and printed.
if __name__ == '__main__':
    # Define an output queue
    output = Queue()
    # Set up the process that we want to run
    p = Process(target=f.f, args=('Bob', output))
    # Run the process
    p.start()
    # Get the process result from the output queue
    # (get() takes no process argument; it blocks until an item arrives)
    result = output.get()
    # Wait for the process to exit
    p.join()
    print(result)
I'm a Python newbie and I may have missed all sorts of details, but this works for me.
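For anyone who wants to try the separate-file approach without leaving the notebook, here is a minimal sketch of the same idea: write the worker into its own module, import it, and hand the imported function to Process. The module name hello_worker, the temporary directory, and the helper greet_in_subprocess are my own choices for illustration and are not part of the answer above. The sketch reads the queue before calling join(), which is the order the multiprocessing documentation suggests for processes that use queues.

```python
import sys
import tempfile
import textwrap
from multiprocessing import Process, Queue
from pathlib import Path

# Hypothetical module name and location, chosen only for this sketch.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "hello_worker.py").write_text(textwrap.dedent('''
    def f(name, output):
        """Put a greeting on the queue instead of printing it."""
        output.put("hello {0}".format(name))
'''))

# Make the freshly written module importable by name.
sys.path.insert(0, str(module_dir))
import hello_worker


def greet_in_subprocess(name):
    """Run hello_worker.f in a child process and return what it queued."""
    output = Queue()
    p = Process(target=hello_worker.f, args=(name, output))
    p.start()
    result = output.get()  # read the queue first, then join
    p.join()
    return result


if __name__ == "__main__":
    print(greet_in_subprocess("Bob"))
```

In a notebook, the file could just as well be created with an editor or a cell magic; the only point is that the function ends up in a module that can be imported by name rather than in a notebook cell.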
This problem is a headache for people using Jupyter on Windows. The code runs fine on a Linux system.
In order to run the code on Windows,
After much research, it appears that multiprocessing is not an option in a notebook on Windows. I am closing this, but please reopen if you have a solution. I will switch over to pathos.
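If the goal is only to get the notebook running on Windows, one option that sidesteps the "Can't get attribute" error is a thread-backed pool. To be clear, this is a different technique from real multiprocessing: multiprocessing.dummy.Pool offers the same map interface but runs threads in the current process, so functions defined in a notebook cell can be used directly. On standard CPython, threads will generally not speed up CPU-bound work, so treat this as a workaround rather than a replacement. In the sketch below, fake_guess and guess_all are hypothetical names, and fake_guess is only a stand-in for a real lookup such as d.get_gender.

```python
from multiprocessing.dummy import Pool as ThreadPool


def fake_guess(name):
    """Hypothetical stand-in for a real lookup such as d.get_gender(name)."""
    known = {"John": "male", "Ashley": "mostly_female"}
    return known.get(name.title(), "unknown")


def guess_all(names, workers=4):
    """Map fake_guess over names with a thread pool; order is preserved."""
    with ThreadPool(workers) as pool:
        return pool.map(fake_guess, names)


print(guess_all(["john", "amamda", "ashley"]))
# prints ['male', 'unknown', 'mostly_female']
```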
How about this:
Code:
#!/usr/bin/env python3
import time
import multiprocessing as mp

import gender_guesser.detector as gender
import pandas as pd

# Module-level detector, available to every worker process
d = gender.Detector()


def guess_gender(name):
    # Title-case the name before the lookup, e.g. 'john' -> 'John'
    return d.get_gender(name.title())


def run():
    st = time.time()
    ls = ['john', 'joe', 'amamda', 'derick', 'peter', 'ashley',
          'john', 'joe', 'amamda', 'derick', 'peter', 'ashley']
    # Leave one core free, but always use at least one worker
    num_cpus = max(1, mp.cpu_count() - 1)
    # The with-block shuts the pool down once the map has finished
    with mp.Pool(processes=num_cpus) as pool:
        result = pool.map(guess_gender, ls)
    df = pd.DataFrame(result, columns=["gender"])
    print("\ntook {} secs to classify\n".format(time.time() - st))
    print(df)  # or you could save the dataframe using .to_csv()


# The guard keeps worker processes from calling run() again on import
if __name__ == "__main__":
    run()
Output:
took 0.0150408744812 secs to classify
           gender
0            male
1            male
2         unknown
3            male
4            male
5   mostly_female
6            male
7            male
8         unknown
9            male
10           male
11  mostly_female
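The Linux/Windows difference mentioned earlier in this thread comes down to the start method. Linux has traditionally defaulted to fork, where children inherit the parent's memory, while Windows only offers spawn, where each child starts a fresh interpreter and has to find the target function by importing it. To check whether a script like the one above would survive that without a Windows machine at hand, you can request a spawn context explicitly. This is only a sketch: the helper name map_with is mine, and math.sqrt is used as the worker simply because it lives in an importable module.

```python
import math
import multiprocessing as mp


def map_with(start_method, func, items, processes=2):
    """Run func over items in a pool created with the given start method."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print("default start method:", mp.get_start_method())
    print("available:", mp.get_all_start_methods())
    # "spawn" is available on every platform and is what Windows uses.
    print(map_with("spawn", math.sqrt, [1, 4, 9]))
```

If a worker function only works under fork and fails under spawn, that is a strong hint that it is defined somewhere the child process cannot import from, which is the situation a notebook cell creates.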