I am writing a script to retrieve WMI info from many computers at the same time and then write this info to a text file:
f = open("results.txt", 'w+') ## to
You can simply create your own locking mechanism to ensure that only one thread is ever writing to a file.
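import threading

import wmi  # third-party WMI wrapper used by filesize() below

lock = threading.Lock()

def write_to_file(f, text, file_size):
    # "with" blocks until the lock is free and always releases it,
    # even if the write raises.
    with lock:
        # In this section, only one thread can be present at a time.
        f.write("%s %s\n" % (text, file_size))
        # Each thread has its own file object and buffer, so flush
        # while still holding the lock to keep lines whole.
        f.flush()

def filesize(asset):
    c = wmi.WMI(asset)
    wql = 'SELECT FileSize,Name FROM CIM_DataFile where (Drive="D:" OR Drive="E:") and Caption like "%file%"'
    # Append mode; "with" closes the file when the loop is done.
    with open("results.txt", 'a') as f:
        for item in c.query(wql):
            write_to_file(f, item.Name.split("\\")[2].strip().upper(), str(item.FileSize))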
You may want to consider placing the lock around the entire for loop (for item in c.query(wql):) to allow each thread to do a larger chunk of work before releasing the lock.
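A minimal sketch of that coarser locking, written for Python 3. The hosts, file names, and sizes are made up, and an in-memory buffer stands in for results.txt so the snippet runs without WMI; write_rows and run_demo are illustrative names, not part of the original script.

```python
import io
import threading

lock = threading.Lock()

def write_rows(f, rows):
    """Write every row for one machine while holding the lock once."""
    with lock:  # released automatically, even if a write raises
        for name, size in rows:
            f.write("%s %s\n" % (name, size))

def run_demo():
    out = io.StringIO()  # stands in for the shared results file
    # Hypothetical per-machine results; in the real script these
    # would come from c.query(wql).
    fake_results = {
        "HOST-A": [("A1.TXT", 10), ("A2.TXT", 20), ("A3.TXT", 30)],
        "HOST-B": [("B1.TXT", 40), ("B2.TXT", 50), ("B3.TXT", 60)],
    }
    threads = [
        threading.Thread(target=write_rows, args=(out, rows))
        for rows in fake_results.values()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out.getvalue().splitlines()
```

Because the lock is held for the whole batch, each machine's rows end up next to each other in the output. In the real script it is probably worth running the query first and taking the lock only for the writing, so the slow remote call stays outside the critical section.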
For another solution, use a Pool to calculate the data and return it to the parent process. The parent then writes all data to the file. Since only one process ever writes to the file, there's no need for additional locking.
Note that the following uses a pool of processes, not threads. This makes the code much simpler than putting something together using the threading module. (There is also a ThreadPool class in multiprocessing.pool with the same interface, but it went undocumented for a long time.)
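import glob, os, time
from multiprocessing import Pool

def filesize(path):
    time.sleep(0.1)  # simulate slow work
    return (path, os.path.getsize(path))

if __name__ == '__main__':  # required where workers are spawned, e.g. on Windows
    paths = glob.glob('*.py')
    pool = Pool()  # default: one process per CPU
    with open("results.txt", 'w+') as dataf:
        # Only the parent writes; results arrive in completion order.
        for (apath, asize) in pool.imap_unordered(filesize, paths):
            dataf.write("%s %s\n" % (apath, asize))
    pool.close()
    pool.join()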
Contents of results.txt:
zwrap.py 122
usercustomize.py 38
tpending.py 2345
msimple4.py 385
parse2.py 499
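If the per-machine work mostly waits on the network (an assumption about WMI queries, not something measured here), the same single-writer pattern can be tried with threads via multiprocessing.pool.ThreadPool. This is only a sketch for Python 3: fake_filesize is a hypothetical stand-in for the real query, and the pool size of 4 is arbitrary.

```python
import io
import time
from multiprocessing.pool import ThreadPool

def fake_filesize(asset):
    """Hypothetical stand-in for a slow per-machine query."""
    time.sleep(0.01)
    return (asset, len(asset))  # made-up "size": length of the name

def collect(assets, out):
    """Run the workers in a thread pool; only the caller writes to out."""
    pool = ThreadPool(4)  # arbitrary number of worker threads
    try:
        # Results are yielded in completion order, one at a time,
        # so the writes below never overlap.
        for name, size in pool.imap_unordered(fake_filesize, assets):
            out.write("%s %s\n" % (name, size))
    finally:
        pool.close()
        pool.join()
```

For example, collect(["pc1", "server02", "db"], open("results.txt", "w")) would write one line per machine, in whatever order the workers finish.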
print is not thread safe. Use the logging module instead (which is):
import logging
import threading
import time

FORMAT = '[%(levelname)s] (%(threadName)-10s) %(message)s'

logging.basicConfig(level=logging.DEBUG,
                    format=FORMAT)

file_handler = logging.FileHandler('results.log')
file_handler.setFormatter(logging.Formatter(FORMAT))
logging.getLogger().addHandler(file_handler)

def worker():
    logging.info('Starting')
    time.sleep(2)
    logging.info('Exiting')

t1 = threading.Thread(target=worker)
t2 = threading.Thread(target=worker)
t1.start()
t2.start()
Output (and contents of results.log):
[INFO] (Thread-1  ) Starting
[INFO] (Thread-2  ) Starting
[INFO] (Thread-1  ) Exiting
[INFO] (Thread-2  ) Exiting
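Applied to the original problem, the same idea could look roughly like the sketch below (Python 3): a dedicated logger whose handler writes only the message, so the file contains plain "name size" lines. The logger name "results", the helper names, and the data are all made up for illustration.

```python
import logging
import os
import tempfile
import threading

def make_results_logger(path):
    """Return a logger (and its handler) that writes bare messages to path."""
    logger = logging.getLogger("results")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep result lines out of the root logger
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger, handler

def record(logger, name, size):
    # The handler takes its own lock around each write.
    logger.info("%s %s", name, size)

def run_demo():
    path = os.path.join(tempfile.mkdtemp(), "results.txt")
    logger, handler = make_results_logger(path)
    rows = [("FILE%d.TXT" % i, i * 100) for i in range(20)]  # made-up data
    threads = [
        threading.Thread(target=record, args=(logger, n, s)) for n, s in rows
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    handler.close()
    logger.removeHandler(handler)
    with open(path) as fh:
        return fh.read().splitlines()
```

Every line arrives whole, though the order of the lines depends on thread scheduling.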
Instead of using the default name (Thread-n), you can set your own name using the name keyword argument, which the %(threadName) formatting directive will then use:
t = threading.Thread(name="My worker thread", target=worker)
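To see the effect without touching the root logger, here is a small self-contained sketch (Python 3) that captures the formatted line in memory. The logger name "naming_demo" and the function run_named_worker are made up for this illustration.

```python
import io
import logging
import threading

def run_named_worker(thread_name):
    """Log one message from a thread with the given name; return the output."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("(%(threadName)-10s) %(message)s"))
    logger = logging.getLogger("naming_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    t = threading.Thread(name=thread_name,
                         target=lambda: logger.info("Starting"))
    t.start()
    t.join()
    logger.removeHandler(handler)
    return stream.getvalue()
```

Note that -10s only pads short names to ten characters; a longer name such as "My worker thread" is printed in full rather than truncated.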
(This example was adapted from Doug Hellmann's excellent article about the threading module.)