How can I use threading in Python?

迷失自我 2020-11-21 04:54

I am trying to understand threading in Python. I've looked at the documentation and examples, but quite frankly, many examples are overly sophisticated and I'm having trouble understanding them.

19 answers
  • 2020-11-21 05:21

    Here is multithreading with a simple example which will be helpful. You can run it and easily see how multithreading works in Python. I used a semaphore to prevent other threads from running until the earlier threads have finished their work. With this line of code,

    tLock = threading.BoundedSemaphore(value=4)

    you can allow up to four threads at a time; the rest are held back and run only later, after the previous threads have finished.

    import threading
    import time
    
    #tLock = threading.Lock()
    tLock = threading.BoundedSemaphore(value=4)
    def timer(name, delay, repeat):
        print("\r\nTimer:", name, "Started")
        tLock.acquire()
        print("\r\n", name, "has acquired the lock")
        while repeat > 0:
            time.sleep(delay)
            print("\r\n", name, ":", time.ctime(time.time()))
            repeat -= 1

        print("\r\n", name, "is releasing the lock")
        tLock.release()
        print("\r\nTimer:", name, "Completed")
    
    def Main():
        t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
        t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
        t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
        t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
        t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))
    
        t1.start()
        t2.start()
        t3.start()
        t4.start()
        t5.start()
    
        print "\r\nMain Complete"
    
    if __name__ == "__main__":
        Main()
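
    Note that Main() prints "Main Complete" before the timers finish; the process stays alive anyway because the timer threads are not daemons. If you would rather have Main() block until every timer is done, join the threads before printing. A minimal sketch of the extra lines (same thread names as above):

    for t in (t1, t2, t3, t4, t5):
        t.join()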
    
  • 2020-11-21 05:22

    As others have mentioned, because of the GIL, CPython can use threads effectively only for I/O waits.

    If you want to benefit from multiple cores for CPU-bound tasks, use multiprocessing:

    from multiprocessing import Process
    
    def f(name):
        print('hello', name)
    
    if __name__ == '__main__':
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()
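
    If you want to fan the same function out over many inputs, multiprocessing.Pool manages the worker processes for you. A minimal sketch, assuming a CPU-bound function square (illustrative, not from the answer above):

    from multiprocessing import Pool

    def square(n):
        return n * n

    if __name__ == '__main__':
        # Map the inputs across four worker processes in parallel
        with Pool(processes=4) as pool:
            print(pool.map(square, range(10)))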
    
  • 2020-11-21 05:24

    I saw a lot of examples here where no real work was being performed, and most of them were CPU-bound. Here is an example of a CPU-bound task that computes all prime numbers between 10 million and 10.05 million. I have used all four methods here (plus a sequential baseline):

    import math
    import timeit
    import threading
    import multiprocessing
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    
    
    def time_stuff(fn):
        """
        Measure time of execution of a function
        """
        def wrapper(*args, **kwargs):
            t0 = timeit.default_timer()
            fn(*args, **kwargs)
            t1 = timeit.default_timer()
            print("{} seconds".format(t1 - t0))
        return wrapper
    
    def find_primes_in(nmin, nmax):
        """
        Compute a list of prime numbers between the given minimum and maximum arguments
        """
        primes = []
    
        # Loop from minimum to maximum
        for current in range(nmin, nmax + 1):
    
            # Take the square root of the current number
            sqrt_n = int(math.sqrt(current))
            found = False
    
            # Check whether any number from 2 up to the square root divides the current number
            for number in range(2, sqrt_n + 1):
    
                # If divisible, we have found a factor, hence this is not a prime number; let's move to the next one
                if current % number == 0:
                    found = True
                    break
    
            # If not divisible, add this number to the list of primes that we have found so far
            if not found:
                primes.append(current)
    
        # I am merely printing the length of the array containing all the primes, but feel free to do what you want
        print(len(primes))
    
    @time_stuff
    def sequential_prime_finder(nmin, nmax):
        """
        Use the main process and main thread to compute everything in this case
        """
        find_primes_in(nmin, nmax)
    
    @time_stuff
    def threading_prime_finder(nmin, nmax):
        """
        If the minimum is 1000 and the maximum is 2000 and we have four workers,
        1000 - 1250 to worker 1
        1250 - 1500 to worker 2
        1500 - 1750 to worker 3
        1750 - 2000 to worker 4
        so let's split the minimum and maximum values according to the number of workers (eight in the code below)
        """
        nrange = nmax - nmin
        threads = []
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
    
            # Start the thread with the minimum and maximum split up to compute
            # Parallel computation will not work here due to the GIL since this is a CPU-bound task
            t = threading.Thread(target=find_primes_in, args=(start, end))
            threads.append(t)
            t.start()
    
        # Don’t forget to wait for the threads to finish
        for t in threads:
            t.join()
    
    @time_stuff
    def processing_prime_finder(nmin, nmax):
        """
        Split the minimum-to-maximum interval as in the threading method above, but use processes this time
        """
        nrange = nmax - nmin
        processes = []
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            p = multiprocessing.Process(target=find_primes_in, args=(start, end))
            processes.append(p)
            p.start()
    
        for p in processes:
            p.join()
    
    @time_stuff
    def thread_executor_prime_finder(nmin, nmax):
        """
        Split the min-max interval as in the threading method, but use a thread pool executor this time.
        This method is slightly faster than using pure threading as the pools manage threads more efficiently.
        This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
        """
        nrange = nmax - nmin
        with ThreadPoolExecutor(max_workers=8) as e:
            for i in range(8):
                start = int(nmin + i * nrange/8)
                end = int(nmin + (i + 1) * nrange/8)
                e.submit(find_primes_in, start, end)
    
    @time_stuff
    def process_executor_prime_finder(nmin, nmax):
        """
        Split the min-max interval as in the threading method, but use a process pool executor.
        This is the fastest method recorded so far, as it manages processes efficiently and overcomes the GIL limitations.
        RECOMMENDED METHOD FOR CPU-BOUND TASKS
        """
        nrange = nmax - nmin
        with ProcessPoolExecutor(max_workers=8) as e:
            for i in range(8):
                start = int(nmin + i * nrange/8)
                end = int(nmin + (i + 1) * nrange/8)
                e.submit(find_primes_in, start, end)
    
    def main():
        nmin = int(1e7)
        nmax = int(1.05e7)
        print("Sequential Prime Finder Starting")
        sequential_prime_finder(nmin, nmax)
        print("Threading Prime Finder Starting")
        threading_prime_finder(nmin, nmax)
        print("Processing Prime Finder Starting")
        processing_prime_finder(nmin, nmax)
        print("Thread Executor Prime Finder Starting")
        thread_executor_prime_finder(nmin, nmax)
        print("Process Executor Finder Starting")
        process_executor_prime_finder(nmin, nmax)
    
    # The __main__ guard matters with multiprocessing: child processes may
    # re-import this module, and the guard stops them from re-running main()
    if __name__ == "__main__":
        main()
    

    Here are the results on my Mac OS X four-core machine:

    Sequential Prime Finder Starting
    9.708213827005238 seconds
    Threading Prime Finder Starting
    9.81836523200036 seconds
    Processing Prime Finder Starting
    3.2467174359990167 seconds
    Thread Executor Prime Finder Starting
    10.228896902000997 seconds
    Process Executor Finder Starting
    2.656402041000547 seconds
    
  • 2020-11-21 05:26

    Most documentation and tutorials use Python's threading and Queue modules, and they can seem overwhelming for beginners.

    Perhaps consider the concurrent.futures.ThreadPoolExecutor of Python 3.

    Combined with the with statement and a set comprehension, it can be a real charm.

    from concurrent.futures import ThreadPoolExecutor, as_completed
    
    def get_url(url):
        # Your actual program here; use threading.Lock() if necessary
        return ""
    
    # List of URLs to fetch
    urls = ["url1", "url2"]
    
    with ThreadPoolExecutor(max_workers=5) as executor:
    
        # Submit the tasks; each call schedules get_url on a pool thread
        futures = {executor.submit(get_url, url) for url in urls}
    
        # as_completed() yields each future as soon as its work finishes
        for f in as_completed(futures):
            # Get the results
            rs = f.result()
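
    If you just want the results back in input order, executor.map is an even shorter alternative. A sketch reusing the same get_url and urls from above:

    with ThreadPoolExecutor(max_workers=5) as executor:
        # map() yields results in the order of urls, blocking as needed
        for result in executor.map(get_url, urls):
            print(result)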
    
  • 2020-11-21 05:27

    Here's a simple example: you need to try a few alternative URLs and return the contents of the first one to respond.

    import queue
    import threading
    import urllib.request

    # Called by each thread
    def get_url(q, url):
        q.put(urllib.request.urlopen(url).read())

    theurls = ["http://google.com", "http://yahoo.com"]

    q = queue.Queue()

    for u in theurls:
        t = threading.Thread(target=get_url, args=(q, u))
        t.daemon = True
        t.start()

    s = q.get()
    print(s)
    

    This is a case where threading is used as a simple optimization: each subthread waits for a URL to resolve and respond, in order to put its contents on the queue. Each thread is a daemon (it won't keep the process up if the main thread ends; that's more common than not). The main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the result and terminates (which takes down any subthreads that might still be running, since they're daemon threads).

    Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there's a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work's results, by the way, and they're intrinsically threadsafe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
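
    As a sketch of that queue-centric pattern (all names here are illustrative, not from the answer above): a task queue feeds a fixed set of worker threads and a result queue collects their output, with no explicit locks anywhere.

    import queue
    import threading

    def worker(tasks, results):
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                break
            results.put(item * item)  # stand-in for real (ideally I/O-bound) work

    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
    for t in threads:
        t.start()
    for n in range(10):
        tasks.put(n)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    while not results.empty():
        print(results.get())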

  • 2020-11-21 05:30

    The answer from Alex Martelli helped me. However, here is a modified version that I thought was more useful (at least to me).

    Updated: works in both Python 2 and Python 3

    try:
        # For Python 3
        import queue
        from urllib.request import urlopen
    except ImportError:
        # For Python 2 
        import Queue as queue
        from urllib2 import urlopen
    
    import threading
    
    worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']
    
    # Load up a queue with your data. This will handle locking
    q = queue.Queue()
    for url in worker_data:
        q.put(url)
    
    # Define a worker function
    def worker(url_queue):
        queue_full = True
        while queue_full:
            try:
                # Get your data off the queue, and do some work
                url = url_queue.get(False)
                data = urlopen(url).read()
                print(len(data))
    
            except queue.Empty:
                queue_full = False
    
    # Create as many threads as you want
    thread_count = 5
    for i in range(thread_count):
        t = threading.Thread(target=worker, args=(q,))
        t.start()
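
    If you also want the main thread to block until every URL has been processed, queue.Queue supports that directly: call url_queue.task_done() in the worker after handling each item, then join the queue. A minimal sketch of the extra pieces, assuming the worker above:

    # inside worker(), after print(len(data)):
    #     url_queue.task_done()

    q.join()  # blocks until task_done() has been called once per queued item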
    