Is the Node.js I/O event loop single- or multithreaded?
If I have several I/O processes, node puts them in an external event loop. Are they processed in a sequence
actually node js core implementation using two components :
v8 javascript runtime engine
libuv for handlign non i/o blocking operation and handling threads and concurrent operations for you;
with the javascript you can actually write code with one thread but this means not that your code execute on the one thread although you can execute on multiple thread s using clusters in node js
let fs = require('fs');
fs.stat('path',(err,stat)=>{
//do something with the stat;
console.log('second');
});
console.log('first');
Well, to understand nodejs I/O events in the event, you must understand nodejs event loop properly.
from the name event loop, we understand it's a loop that runs cycle after cycle round-robin basis until there are no events remains in the loop or the app closed.
The event loop is one of the topmost features in nodejs, it is what makes async programming in nodejs.
When the program starts we are in a node process in the single thread where the event loop runs. Now the most importing things we need to know that the event loop is where all the application code that is inside callback functions is executed.
So, basically all code that is not top-level code will run in the event loop. Some part (mostly heavy duties) might get offloaded to the thread pool (When is the thread pool used?), the event loop will take care of those heavy duties and return the result to the event of the event loop.
It is the heart of the node architecture, and nodejs built around callback functions. so callbacks will triggered as soon as some work is finished sometime in the future because node uses an event-triggered architecture.
When an application receives an HTTP request on a node server or a timer expiring or a file finishing to read all these will emit events as soon as they are done with their work, and our event loop will then pick up these events and call the callback functions that are associated with each event, it's usually said that the event loop does the orchestration, which simply means that it receives events, calls their callback functions, and offloads the more expensive tasks to the thread pool. Now, how does all this actually work behind the scenes? In what order are these callbacks executed?
Well, when we start our node application, the event loop starts running right away. An event loop has multiple phases, and each phase has a callback queue, where the four most important phases are 1. Expired timer callbacks, 2.I/O polling and callbacks 3. setImmediate callbacks, and 4. Close callbacks. There are other phases that is used internally by Node.
So, the first phase takes care of callbacks of expired timers, for example, from the setTimeout() function. So, if there are callback functions from timers that just expired, these are the first ones to be processed by the event loop.
** The most important thing is, If a timer expires later during the time when one of the other phases is being processed, well then the callback of that timer will only be called as soon as the event loop comes back to this first phase. And it works like this in all four phases.**
So callbacks in each queue are processed one by one until there are no ones left in the queue and only then, the event loop will enter the next phase. for example, suppose there is 1000 setTimeOut callbacks timer expired and the event loop is in the first phase then all these 1000 setTimeOuts callbacks will execute one by one then it will go to the next phase(I/O pooling and callbacks).
Next up, we have I/O pooling and execution of I/O callbacks. Here I/O stands for input/output and polling basically means looking for new I/O events that are ready to be processed and putting theme into the callback queue.
In the context of a Node application, I/O means mainly stuff like networking and file access, so in this phase where probably 99% of general application code gets executed.
The next phase is for setImmediate callbacks, and SetImmediate is a special kind of timer that we can use if we want to process callbacks immediately after the I/O polling and execution phase.
And finally, the fourth phase is the close callbacks, in this phase, all close events are processed, for example when a server or a WebSocket shut down.
These are the four phases in the event loop, but besides these four callbacks queues there are actually also two other queues, 1. nextTick() other 2. microtasks queue(which is mainly for resolved promises)
If there are any callbacks in one of these two queues to be processed, they will be executed right after the current phase of the event loop finishes instead of waiting for the entire loop/cycle to finish.
In other words, after each of these four phases, if there are any callbacks in these two special queues, they will be executed right away. Now imagine that a promise resolves and returns some data from an API call while the callback of an expired timer is running, In this case, the promise callback will be executed right after the one from the timer finish.
The same logic also applies to the nextTick() queue. The nextTick() is a function that we can use when we really, really need to execute a certain callback right after the current event loop phase. It's a bit similar to setImmediate, with the difference that setImmediate only runs after the I/O callback phase.
Will all the above things can happen in one tick/cycle of the event loop, In the meantime their new events could have arisen in a particular phase or old event could be expired, the event loop will handle those events with another new cycle.
So now it's time to decide whether the loop should continue to the next tick or if the program should exit. Node simply checks whether there are any timers or I/O tasks that are still running in the background if there aren't any then it will exit the application. But if there are any pending timers or I/O tasks, then the node will continue running the event loop and go starting to the next cycle.
For example, in node application when we are listening for incoming HTTP requests, we basically running an infinite I/O task, and that is run in the event loop, for that Node.js keep running and keep listening for new HTTP request coming in instead of just exiting the application.
Also when we are writing or reading a file in the background that's also an I/O task and it makes sense that the app doesn't exist while it's working with that file, right?
Now The event loop in practices:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
});
setImmediate(()=>console.log('Immediate 1 finished'))
console.log('Hello from the top level code');
Output: Well the first lin is Hello from the top level code, yes it is expected because this is a code that gets executed immediately. Then after we have three output, Timer 1 finished this line is expected because of phase one as we discuess before, but after that I/O finished should be printed, because we discuess that setImmediate runs after the I/O callback phase, but this code is actually not in an I/O cycle, so it is not running inside of the event loop, because it's not runnin inside of any callback function.
Now lets do another test:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 0);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
});
console.log('Hello from the top level code')
Output:
The output is as expected right? Now let's add some delay:
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 3000);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
});
console.log('Hello from the top level code')
output:
In the first cycle inside I/O everything executed, but because of the dealy Timer-2 executed inside its code in the second cycle.
Now Lets add nextTick(), and see how nodejs behaves:
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 3000);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
process.nextTick(()=>console.log('Process Next Tick'));
});
console.log('Hello from the top level code')
Output:
Well, the first callback is executed is inside the process.NextTick(), as it is expected right? Because nextTicks callbacks stays in the microtask queue an they executed after each phase.
Event Loop
The Node.js event loop runs under a single thread, this means the application code you write is evaluated on a single thread. Nodejs itself uses many threads underneath through libuv, but you never have to deal with with those when writing nodejs code.
Every call that involves I/O call requires you to register a callback. This call also returns immediately, this allows you to do multiple IO operations in parallel without using threads in your application code. As soon as an I/O operation is completed it's callback will be pushed on the event loop. It will be executed as soon as all the other callbacks that where pushed on the event loop before it are executed.
There are a few methods to do basic manipulation of how callbacks are added to the event loop. Usually you shouldn't need these, but every now and then they can be useful.
At no point will there ever be two true parallel paths of execution, so all operations are inherently thread safe. There usually will be several asynchronous concurrent paths of execution that are being managed by the event loop.
Read More about the event loop
Limitations
Because of the event loop, node doesn't have to start a new thread for every incoming tcp connection. This allows node to service hundreds of thousands of requests concurrently , as long as you aren't calculating the first 1000 prime numbers for each request.
This also means it's important to not do CPU intensive operations, as these will keep a lock on the event loop and prevent other asynchronous paths of execution from continuing.
It's also important to not use the sync
variant of all the I/O methods, as these will keep a lock on the event loop as well.
If you want to do CPU heavy things you should ether delegate it to a different process that can execute the CPU bound operation more efficiently or you could write it as a node native add on.
Read more about use cases
Control Flow
In order to manage writing many callbacks you will probably want to use a control flow library. I believe this is currently the most popular callback based library:
I've used callbacks and they pretty much drove me crazy, I've had much better experience using Promises, bluebird is a very popular and fast promise library:
I've found this to be a pretty sensitive topic in the node community (callbacks vs promises), so by all means, use what you feel will work best for you personally. A good control flow library should also give you async stack traces, this is really important for debugging.
The Node.js process will finish when the last callback in the event loop finishes it's path of execution and doesn't register any other callbacks.
This is not a complete explanation, I advice you to check out the following thread, it's pretty up to date:
How do I get started with Node.js
If you run this simple node code
console.log('starting')
setTimeout(()=>{
console.log('0sec')
}, 0)
setTimeout(()=>{
console.log('2sec')
}, 2000)
console.log('end')
What do you expect output to be? If its,
starting
0sec
end
2sec
it's is wrong guess, we will get
starting
end
0sec
2sec
because node will never print code in event loop before exiting main()
So basically, First main()
will go in stack, then console.log('starting ')
so you will see it printed first, after that come setTimeout(()=>{console.log('0sec')}, 0)
will go in a stack and then in nodeAPI (node uses multi-threads (lib written in c++) to execute setTimeout to finish, even tho above code is single thread code) after time is up it moves to the event loop, now node can't print it unless stack is not empty. So, next line i.e setTimeout of 2sec will be first pushed to stack,then nodeAPI which will wait for 2 sec to finish, and then to even loop, in mean while next code line will be executed that is console.log('end') and so we see end msg before 0sec, because if nodes non blocking nature. After end code is over and hence main is poped out and its turn of event loop code to be executed that is first 0sec and after that 2sec msg will be printed.
From Willem's answer:
The Node.js event loop runs under a single thread. Every I/O call requires you to register a callback. Every I/O call also returns immediately, this allows you to do multiple IO operations in parallel without using threads.
I would like to start explaining with this above quote, which is one of the common misunderstandings of node js framework that I am seeing everywhere.
Node.js does not magically handle all those asynchronous calls with just one thread and still keep that thread unblocked. It internally uses google's V8 engine and a library called libuv(written in c++) that enables it to delegate some potential asynchronous work to other worker threads (kind of like a pool of threads waiting there for any work to be delegated from the master node thread). Then later when those threads finish their execution they call their callbacks and that is how the event loop is aware of the fact that the execution of a worker thread is completed.
The main point and advantage of nodejs is that you will never need to care about those internal threads and they will stay away from your code!. All the nasty sync stuff that should normally happen in multi threaded environments will be abstracted out by nodejs framework and you can happily work on your single thread (main node thread) in a more programmer friendly environment (while benefiting from all the performance enhancements of multiple threads).
Below is a good post if anyone is interested: When is the thread pool used?