What is the meaning of I/O intensive in Node.js

问题

I was learning Node.js and also found out that Node.js is best to be used with I/O intensive tasks which confused me a bit. So, after some research I found this statement: "An application that reads and/or writes a large amount of data". So, does it mean that Node.js is best to be used with data, that is, read big data, take necessary data from that and send back to client?

回答1:

A nodejs application can be architectured just fine to include non-I/O things and is not just suited for big data applications (in fact big data has nothing to do with it at all).

A default, simple implementation of Node.js performs best when your application is not CPU intensive and instead spends most of its time doing I/O (input/output) tasks such as reading/writing to a database, read/writing from files, reading/sending network data and so on. It's not about big data, it's about what does the server spend most of its time doing.

Surprisingly enough (to some) since a web server's primary job is responding to http requests which are usually requests for data, most web servers spend most of their time fetching things, reading and writing things and sending things which are all I/O tasks. In the node.js design, all these I/O tasks happen asynchronously in a non-blocking fashion and they use events to signal when those operations complete. This is where the phrase "event-driven design" comes from when describing node.js. It so happens that this makes node.js very efficient at handling things that involve primarily I/O. This is what a simple implementation of node.js does best. And, it generally does it better than a purely threaded server design that devotes an OS thread to every currently in-flight I/O operation (the original design for many server frameworks).

If you do have CPU intensive things (major calculations, image processing, heavy crypto operations, etc...) and you do them very often or they take very long, then you will be best served if you put those tasks in a Worker Thread or in another process and communicate back and forth between the main process in node.js and this worker to get that CPU-intensive work done. It used to be that node.js didn't have Worker Threads which made this task a little more complicated where you often had to use one or more additional processes (either via clustering or additional dedicated processes) in order to handle this CPU-intensive work, but now you can use Worker Threads which can be a bit more convenient.

For example, I have a server task that requires a very heavy amount of crypto (performing a billion crypto operations). If I put that in the main node.js thread, that essentially blocks the event loop so my server can't process other requests while that heavy duty crypto operation is running which would ruin the responsiveness of my server.

But, I was able to move the crypto work to a worker thread (actually to several worker threads) and then can crunch away on the crypto while my main thread stays nice and lively to handle other, unrelated incoming requests in a timely fashion.

回答2:

First of all, Big Data has nothing to do with Node.js.

I/O intensive means that the given task often waits for I/O. The best examples for these are file operations, networking.

If the processor has to regularly wait for data to arrive, the task is said to be I/O intensive.

Node.js's asynchronous nature however makes it really good at I/O intensive tasks, as it can keep doing other work while it waits for the data to arrive asynchronously.

For example, if you have 10 clients connected to the server and one of the clients requests for a data or task that is heavy to process, my server should not get stuck or wait until this task is finished as it will cause greater response time to other 9 clients or bad user experience. Rather, server should allow the other 9 clients to request data or task from the server, and when the respective tasks get finished, response should be sent back to clients.

PS: You can study about Event loop in Node.js

回答3:

What Node.js is great at is serving as the middle layer between clients and data sources, i.e. the inputs and outputs.

The reason Node.js is great at this is in the non-blocking event-driven approach it takes.

For example, when you make a request to a Node.js app that asks for some data from a database, Node.js will request that data and immediately return to other requests without being blocked by the database request.

Once the database sends the data back, Node.js triggers the callback (or resolves the promise) with that data and continues onwards.

There's no race condition between these input and output events because their synchronization is done in a single threaded mechanism called the Event Loop. Only one event gets processed at a time.

We can think of the Event Loop as a single-seat rollercoaster ride in an amusement park that has many lines of people waiting to go on the ride, one by one. When you get to go depends on when you got in a line, how important you are or if a friend saved you a spot but nevertheless only one person at a time will be able to partake.

This non-blocking event-driven approach allows Node.js to very efficiently react to input and output events and process many read/write operations because it's not really doing much processing, the CPU work is quite low. It's just serving as the middle layer between you and the data.

On the other hand, if these events lead to some intense CPU operations, Node.js used to perform quite poorly because the Event Loop can process only one event at a time.

To use the rollercoaster analogy from above, a CPU-intensive task would be as if one person is taking a really long ride while all others have to wait for them to be done.

Newer versions of Node.js did get some tools to allow it do to more than 1 thing at time (parallelism) by using workers. The trick here is that every pool of workers has its own Event Loop which allows applications to move the intense work into a different thread and run it in parallel with the rest of the application. Do note that this will only actually help if you run on a machine with more than 1 core. If your machine has 1 core, no matter what tool you use, you're gonna have a bad time because nothing can actually be done in parallel on a single core machine.

回答4:

In case of Intensive I/O tasks Majority of the time is spent waiting for network, filesystem and perhaps database I/O to complete. Increasing hard disk speed or network connection improves the overall performance.

In its most basic form Node.js is best suited for this type of computing. All I/O in Node.js is non-blocking and it allows other requests to be served while waiting for a particular read or write to complete.

来源：https://stackoverflow.com/questions/61421370/what-is-the-meaning-of-i-o-intensive-in-node-js

标签

node.js