Parallelizing methods in Rails

盖世英雄少女心 2021-02-07 06:42

My Rails web app has dozens of methods that make calls to an API and process the query results. These methods have the following structure:

def method_one
  batc         


        
4 Answers
  • 2021-02-07 07:24

    Assuming that your problem is a slow external API, one solution is threaded or asynchronous programming. By default, your code blocks when doing IO. This means that if you have a method that makes an HTTP request to retrieve some JSON, the method tells the operating system that it is going to sleep and does not want to be woken up until the operating system has a response to that request. Since that can take several seconds, your application just sits idle while it waits.

    This behavior is not specific to HTTP requests. Reading from a file or a device such as a webcam has the same implications. Software does this to avoid hogging the CPU when it has no use for it.

    So the question in your case is: do we really have to wait for one method to finish before we can call another? If the behavior of method_two depends on the outcome of method_one, then yes. But in your case, it seems that they are individual units of work with no dependence on each other. So there is potential for concurrent execution.

    You can start new threads by initializing an instance of the Thread class with a block that contains the code you'd like to run. Think of a thread as a program inside your program. Your Ruby interpreter will automatically alternate between the thread and your main program. You can start as many threads as you'd like, but the more threads you create, the longer your main program has to wait for its next turn. However, we are probably talking microseconds or less. Let's look at an example of threaded execution.

    def main_method
      Thread.new { method_one }
      Thread.new { method_two }
      Thread.new { method_three }
    end
    
    def method_one
      # something_slow_that_does_an_http_request
    end
    
    def method_two
      # something_slow_that_does_an_http_request
    end
    
    def method_three
      # something_slow_that_does_an_http_request
    end
    

    Calling main_method will cause all three methods to be executed in what appears to be parallel. In reality they are still processed sequentially, but instead of going to sleep when method_one blocks, Ruby returns to the main thread and switches back to the method_one thread when the OS has the input ready.

    Assuming each method takes 2 ms to execute, not counting the wait for the response, all three methods are running after just 6 ms - practically instantly.

    If we assume that a response takes 500 ms to complete, that means you can cut down your total execution time from 2 + 500 + 2 + 500 + 2 + 500 to just 2 + 2 + 2 + 500 - in other words from 1506 ms to just 506 ms.

    It will feel like the methods are running simultaneously, but in fact they are just sleeping simultaneously.
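
    The arithmetic above can be checked with a small experiment. This is only a sketch: fake_request is a hypothetical stand-in that uses sleep to simulate waiting on a slow API, and the 0.2 s wait is an arbitrary choice. The threaded version also waits for its threads to finish via Thread#value, so the timing covers the complete work.

```ruby
require 'benchmark'

# Hypothetical stand-in for a slow HTTP request: sleep simulates the network wait.
def fake_request(name, wait: 0.2)
  sleep(wait)
  "#{name} done"
end

# Sequential: each call blocks until the previous one has returned.
def run_sequential(names)
  names.map { |name| fake_request(name) }
end

# Threaded: all calls wait at the same time.
# Thread#value blocks until the thread has finished and returns its result.
def run_threaded(names)
  threads = names.map { |name| Thread.new { fake_request(name) } }
  threads.map(&:value)
end

names = %w[one two three]
sequential_time = Benchmark.realtime { run_sequential(names) }
threaded_time = Benchmark.realtime { run_threaded(names) }

puts format('sequential: %.2f s, threaded: %.2f s', sequential_time, threaded_time)
```

    With three simulated requests, the sequential run should take roughly three times the wait, while the threaded run should take roughly one wait plus a small overhead.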

    In your case, however, you have a challenge: one operation depends on the completion of a set of previous operations. In other words, if you have tasks A, B, C, D, E and F, then A, B, C, D and E can be performed simultaneously, but F cannot be performed until A, B, C, D and E are all complete.

    There are different ways to solve this. Let's look at a simple solution: a sleepy loop in the main thread that periodically examines a collection of return values until some condition is fulfilled.

    def task_1
      # Something slow
      return results
    end

    def task_2
      # Something slow
      return results
    end

    def task_3
      # Something slow
      return results
    end

    my_responses = {}
    Thread.new { my_responses[:result_1] = task_1 }
    Thread.new { my_responses[:result_2] = task_2 }
    Thread.new { my_responses[:result_3] = task_3 }

    # Keep the main thread from continuing until the three spawned threads
    # are done and have stored their results in the hash.
    while my_responses.count < 3
      # Sleep for 100 ms between checks. Without this, the loop would check
      # the response count thousands of times per second, which is most likely unnecessary.
      sleep(0.1)
    end
    
    # Any code at this line will not execute until all three results are collected.
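
    As an alternative to the polling loop, you can keep a reference to each thread and ask it for its result. The following is a sketch, not the only way to do it: the task bodies are hypothetical stand-ins that sleep to simulate slow work, and collect_results is a helper name made up for this example. Thread#value waits for the thread to finish and returns the value of its block; it also re-raises an exception that terminated the thread, whereas the polling loop would wait forever if a task failed.

```ruby
# Hypothetical stand-ins for the slow tasks; sleep simulates waiting on an API.
def task_1
  sleep(0.1)
  'one'
end

def task_2
  sleep(0.1)
  'two'
end

def task_3
  sleep(0.1)
  'three'
end

# Start every task in its own thread, then wait for all of them.
def collect_results
  threads = {
    result_1: Thread.new { task_1 },
    result_2: Thread.new { task_2 },
    result_3: Thread.new { task_3 }
  }
  # Thread#value joins each thread and returns what its block returned.
  threads.transform_values(&:value)
end

my_responses = collect_results
# Code from here on runs only after all three results are available.
puts my_responses.inspect
```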
    

    Keep in mind that multithreaded programming is a tricky subject with numerous pitfalls. With MRI it's not so bad: MRI will happily switch between blocked threads, but its global interpreter lock prevents two threads from executing Ruby code simultaneously, which avoids quite a few concurrency concerns.
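
    One pitfall that remains even under MRI: the interpreter can switch threads between the read and the write of a compound update such as counter += 1, so shared state that several threads modify is usually guarded with a Mutex. A minimal sketch, with count_with_mutex and its parameters invented for this example:

```ruby
# Increment a shared counter from several threads, guarding each update with a Mutex.
def count_with_mutex(thread_count: 4, increments: 1_000)
  counter = 0
  lock = Mutex.new

  threads = Array.new(thread_count) do
    Thread.new do
      increments.times do
        # counter += 1 is a read followed by a write;
        # synchronize ensures only one thread performs the pair at a time.
        lock.synchronize { counter += 1 }
      end
    end
  end

  threads.each(&:join) # wait for every thread before reading the total
  counter
end

puts count_with_mutex
```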

    If you want to get into multithreaded programming, I recommend this book: http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601

    It's centered around Java, but the pitfalls and concepts explained are universal.

  • 2021-02-07 07:28

    Ruby has the excellent promise gem. Your example would look like:

    require 'future'
    
    def method_one
    ...
    def method_nth
    
    def summary
      result1 = future { method_one }
      ......
      resultn = future { method_nth }
      collect_results result1, ..., resultn
    end
    

    Simple, isn't it? But let's get into more detail. This is a future object:

    result1 = future { method_one }
    

    This means result1 is being evaluated in the background. You can pass it around to other methods, but it doesn't hold a result yet; it is still processing. Think of it like passing around a Thread. The major difference is that the moment you try to read it, rather than just pass it around, it blocks and waits for the result at that point. So in the example above, all the result1 .. resultn variables keep being evaluated in the background, and when the time comes to collect the results and you actually read these values, the reads wait for the queries to finish.

    Install the promise gem and try the following in a Ruby console:

    require 'future'
    x = future { sleep 20; puts 'x calculated'; 10 }; nil
    # The trailing nil keeps the console from trying to print x right away
    y = future { sleep 25; puts 'y calculated'; 20 }; nil
    
    # At this point, you'll still be using the console!
    # The sleeps are happening in the background
    
    # Now do:
    x + y
    # At this point, the program actually waits for the x & y future blocks to complete
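
    To make the idea concrete without installing anything, here is a rough sketch of a future built on a plain Thread. SimpleFuture and slow_add are hypothetical names for this example, and this is not how the promise gem is implemented: the gem's futures can be used directly (as in x + y above), while this sketch needs an explicit call to value.

```ruby
# Hypothetical minimal future; NOT the promise gem's implementation.
class SimpleFuture
  # Start evaluating the block in the background right away.
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  # Block until the background work has finished, then return its result.
  def value
    @thread.value
  end
end

def slow_add
  x = SimpleFuture.new { sleep 0.2; 10 }
  y = SimpleFuture.new { sleep 0.3; 20 }
  # Both sleeps are already running in the background at this point.
  # Reading the values is what makes the caller wait.
  x.value + y.value
end

puts slow_add
```

    Because both blocks start immediately, the total wait should be close to the longer of the two sleeps rather than their sum.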
    

    Edit: Typo in result, should have been result1, change echo to puts

  • 2021-02-07 07:35

    You can take a look at a new option in town: the futuroscope gem. As the announcement blog post shows, it tries to solve the same problem you are facing: making simultaneous API queries. It seems to have pretty good support and good test coverage.

  • 2021-02-07 07:36

    You should check out Sidekiq.

    RailsCasts episode about Sidekiq.
