Parallelizing tasks in Node.js

Backend · Unresolved · 5 answers · 1435 views
执念已碎 2020-12-13 02:48

I have some tasks I want to do in JS that are resource intensive. For this question, let's assume they are some heavy calculations, rather than system access. Now I want to run them in parallel.

5 Answers
  • 2020-12-13 03:16

    Just recently came across parallel.js, which seems to actually use multiple cores and also has map/reduce-type features. http://adambom.github.io/parallel.js/

  • 2020-12-13 03:23

    How do I make this actually parallel?

    First, you won't really be running in parallel while in a single node application. A node application runs on a single thread and only one event at a time is processed by node's event loop. Even when running on a multi-core box you won't get parallelism of processing within a node application.

    That said, you can get processing parallelism on a multicore machine by forking the code into separate node processes or by spawning child processes. This, in effect, lets you create multiple instances of node itself and communicate with those processes in different ways (e.g. stdout, or the fork IPC mechanism). Additionally, you could choose to separate the functions (by responsibility) into their own node app/server and call them via RPC.

    What does async code typically do to avoid blocking the caller (when working with Node.js)? Is it starting a child process?

    It is not starting a new process. Under the hood, when async.parallel is used in node.js, it uses process.nextTick(). nextTick() lets you avoid blocking the caller by deferring the work to a later tick of the event loop, so you can interleave CPU-intensive tasks with other work.
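    A tiny illustration of that deferral (the labels are made up, just to show the ordering): a callback passed to process.nextTick() runs only after the currently executing code finishes, so it never interrupts synchronous work.

```javascript
const order = [];

order.push('sync work');
process.nextTick(() => order.push('deferred work'));
order.push('more sync work');

// nextTick callbacks run after the current stack unwinds but before
// timers and immediates, so by the time this fires, order is complete.
setImmediate(() => console.log(order.join(' -> ')));
// prints: sync work -> more sync work -> deferred work
```

    Note that this is still one thread: the deferred callback does not run concurrently with anything, it just runs later.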

    Long story short

    Node doesn't make it easy "out of the box" to achieve multiprocessor concurrency. Instead, Node gives you a non-blocking design and an event loop that leverages a single thread without sharing memory. Separate node processes cannot share data/memory, so locks aren't needed; Node is lock-free. One node process uses one thread, and this makes node both safe and powerful.

    When you need to split work up among multiple processes, use some form of message passing to communicate with the other processes/servers, e.g. IPC or RPC.


    For more see:

    Awesome answer from SO on What is Node.js... with tons of goodness.

    Understanding process.nextTick()

  • 2020-12-13 03:26

    Keep in mind I/O is parallelized by Node.js; only your JavaScript callbacks are single threaded.

    Assuming you are writing a server, an alternative to adding the complexity of spawning processes or forking is to simply build stateless node servers and run an instance per core, or better yet run many instances each in their own virtualized micro server. Coordinate incoming requests using a reverse proxy or load balancer.

    You could also offload computation to another server, maybe MongoDB (using MapReduce) or Hadoop.

    To be truly hardcore, you could write a Node plugin in C++ and have fine-grained control over parallelizing the computation code. The speedup from C++ might negate the need for parallelization anyway.

    You can always write the computationally intensive code in another language best suited to numeric computation and expose it through, e.g., a REST API.

    Finally, you could perhaps run the code on the GPU using node-cuda or something similar, depending on the type of computation (not all computations can be optimized for the GPU).

    Yes, you can fork and spawn other processes, but it seems to me that one of the major advantages of node is not having to worry much about parallelization and threading, thereby bypassing a great amount of complexity altogether.

  • 2020-12-13 03:28

    Asynchronous and parallel are not the same thing. Asynchronous means that you don't block while waiting for an operation to finish. Parallel means that you can be doing multiple things at the same time. Node.js is only asynchronous: it runs your JavaScript on a single thread and can only work on one thing at once. If you have a long-running computation, you should start another process and have your node.js process asynchronously wait for the results.

    To do this you could use child_process.spawn and then read the data from its stdout.

    http://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options

    var spawn = require('child_process').spawn;
    var child = spawn('sh', ['./computationProgram', 'parameter']);
    
    child.stderr.on('data', function (data) {
        // handle error output (data is a Buffer)
    });
    
    child.stdout.on('data', function (data) {
        // handle result data (may arrive in chunks)
    });
    
    child.on('close', function (code) {
        // computation finished; code is the exit status
    });
    
  • 2020-12-13 03:29

    Depending on your use case you can use something like

    task.js — a simplified interface for getting CPU-intensive code to run on all cores (node.js and the web)

    An example would be

    // assuming task.js has been loaded, e.g. const task = require('task.js')
    function blocking (exampleArgument) {
        // CPU-bound work that would block the event loop
    }
    
    // turn the blocking pure function into a worker task
    const blockingAsync = task.wrap(blocking);
    
    // run the task on an autoscaling worker pool
    blockingAsync('exampleArgumentValue').then(result => {
        // do something with result
    });
    