Parallelizing methods in Rails

盖世英雄少女心 2021-02-07 06:42

My Rails web app has dozens of methods that make calls to an API and process the query results. These methods have the following structure:

def method_one
  batc         


        
4 answers
  •  心在旅途
    2021-02-07 07:24

    Assuming that your problem is a slow external API, one solution is to use either threaded or asynchronous programming. By default, your code blocks when doing IO. This means that if a method makes an HTTP request to retrieve some JSON, it tells the operating system that it is going to sleep and does not want to be woken up until the response to that request has arrived. Since that can take several seconds, your application just sits idle while it waits.

    This behavior is not specific to HTTP requests. Reading from a file or from a device such as a webcam has the same implications. Software does this to avoid hogging the CPU when it has no use for it.

    So the question in your case is: do we really have to wait for one method to finish before we can call another? If the behavior of method_two depends on the outcome of method_one, then yes. But in your case, they seem to be individual units of work that do not depend on each other, so there is potential for concurrent execution.

    You can start a new thread by initializing an instance of the Thread class with a block that contains the code you'd like to run. Think of a thread as a program inside your program. The Ruby interpreter automatically alternates between the thread and your main program. You can start as many threads as you like, but the more threads you create, the longer your main program has to wait for its next turn. However, we are probably talking microseconds or less. Let's look at an example of threaded execution.

    def main_method
      Thread.new { method_one }
      Thread.new { method_two }
      Thread.new { method_three }
    end
    
    def method_one
      # something_slow_that_does_an_http_request
    end
    
    def method_two
      # something_slow_that_does_an_http_request
    end
    
    def method_three
      # something_slow_that_does_an_http_request
    end
    

    Calling main_method causes all three methods to be executed in what appears to be parallel. In reality they are still processed sequentially, but instead of going to sleep when method_one blocks, Ruby returns to the main thread and switches back to the method_one thread once the OS has the input ready.

    Assuming each method takes 2 ms to execute, not counting the wait for the response, all three methods are running after just 6 ms - practically instantly.

    If we assume that a response takes 500 ms to complete, that means you can cut down your total execution time from 2 + 500 + 2 + 500 + 2 + 500 to just 2 + 2 + 2 + 500 - in other words from 1506 ms to just 506 ms.

    It will feel like the methods are running simultaneously, but in fact they are just sleeping simultaneously.
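
    The arithmetic above can be checked with a small timing sketch. Everything here is an assumption for illustration: fake_api_call is a hypothetical stand-in that sleeps for 0.2 s instead of making a real HTTP request, and run_sequentially and run_in_threads are made-up helper names. The exact timings will vary by machine.

```ruby
require "benchmark"

# Hypothetical stand-in for a slow API call. sleep blocks the calling thread
# much like waiting for a network response would.
def fake_api_call(name, wait_seconds = 0.2)
  sleep(wait_seconds)
  "#{name} done"
end

# Runs the calls one after another; the total wait is the sum of all waits.
def run_sequentially(names)
  names.map { |name| fake_api_call(name) }
end

# Runs each call in its own thread; the waits overlap.
def run_in_threads(names)
  threads = names.map { |name| Thread.new { fake_api_call(name) } }
  threads.map(&:value) # Thread#value waits for the thread and returns its result
end

names = %w[one two three]
sequential_time = Benchmark.realtime { run_sequentially(names) }
threaded_time   = Benchmark.realtime { run_in_threads(names) }
puts format("sequential: %.2f s, threaded: %.2f s", sequential_time, threaded_time)
```

    With three simulated calls, the sequential run should take roughly three times the single wait, while the threaded run should take roughly one.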

    In your case, however, there is a challenge: you have an operation that depends on the completion of a set of previous operations. In other words, if you have tasks A, B, C, D, E and F, then A, B, C, D and E can be performed simultaneously, but F cannot be performed until A, B, C, D and E are all complete.

    There are different ways to solve this. Let's look at a simple solution: a sleepy loop in the main thread that periodically examines a collection of return values until some condition is fulfilled.

    def task_1
      # Something slow that produces results
      results
    end
    
    def task_2
      # Something slow that produces results
      results
    end
    
    def task_3
      # Something slow that produces results
      results
    end
    
    my_responses = {}
    Thread.new { my_responses[:result_1] = task_1 }
    Thread.new { my_responses[:result_2] = task_2 }
    Thread.new { my_responses[:result_3] = task_3 }
    
    # Keep the main thread from continuing until the three spawned threads
    # are done and have stored their results in the hash.
    while my_responses.count < 3
      # Sleep for 100 ms between checks. Without this, the loop would check
      # the response count thousands of times per second, which is most
      # likely unnecessary.
      sleep(0.1)
    end
    
    # Any code at this line will not execute until all three results are collected.
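
    As an alternative to the polling loop (a different technique, not the one shown above), Ruby's Thread#value blocks until a thread has finished and returns the value of its block, so the main thread can wait without sleeping in a loop. The sketch below is a minimal illustration under assumptions: the task bodies use sleep as a stand-in for the slow work, and collect_results is a hypothetical helper name.

```ruby
# Hypothetical stand-ins for the slow tasks; sleep simulates the API wait.
def task_1
  sleep(0.1)
  "one"
end

def task_2
  sleep(0.1)
  "two"
end

def task_3
  sleep(0.1)
  "three"
end

# Starts all three tasks, then waits for each of them to finish.
def collect_results
  threads = {
    result_1: Thread.new { task_1 },
    result_2: Thread.new { task_2 },
    result_3: Thread.new { task_3 }
  }
  # Thread#value blocks until that thread is done and returns the block's result.
  threads.transform_values(&:value)
end

my_responses = collect_results
# Code here runs only after all three tasks have finished.
puts my_responses.inspect
```

    One reason to prefer this form: if a task raises an exception, Thread#value re-raises it in the calling thread, whereas the polling loop would keep waiting for a result that never arrives.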
    

    Keep in mind that multithreaded programming is a tricky subject with numerous pitfalls. With MRI it's not so bad: MRI will happily switch between blocked threads, but its global interpreter lock (GVL) prevents two threads from executing Ruby code simultaneously, and that removes quite a few concurrency concerns.
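
    One pitfall worth a sketch is shared state: in the polling example, three threads write to the same hash. If the code might ever run on an implementation that executes threads in parallel, such as JRuby, a common precaution is to guard the shared hash with a Mutex. This is only an illustration under assumptions: slow_task and collect_with_mutex are hypothetical names, and sleep stands in for the API call.

```ruby
# Hypothetical slow task; sleep stands in for the API call.
def slow_task(id)
  sleep(0.05)
  id * 10
end

# Runs one thread per id and stores each result in a shared hash.
def collect_with_mutex(ids)
  results = {}
  lock = Mutex.new
  threads = ids.map do |id|
    Thread.new do
      value = slow_task(id)                    # the slow part runs outside the lock
      lock.synchronize { results[id] = value } # only one thread writes at a time
    end
  end
  threads.each(&:join) # wait for every thread before returning
  results
end

puts collect_with_mutex([1, 2, 3]).inspect
```

    Keeping the slow work outside synchronize matters: holding the lock during the wait would serialize the threads and give up the benefit of running them concurrently.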

    If you want to get into multithreaded programming, I recommend this book: http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601

    It's centered around Java, but the pitfalls and concepts explained are universal.
