Data Races in JavaScript?


Question


Let's assume I run this piece of code.

var score = 0;
for (var i = 0; i < arbitrary_length; i++) {
     async_task(i, function() { score++; }); // increment callback function
}

In theory I understand that this presents a data race: two threads trying to increment at the same time may result in a single increment. However, Node.js (and JavaScript) are known to be single-threaded. Am I guaranteed that the final value of score will be equal to arbitrary_length?


Answer 1:


Node uses an event loop. You can think of this as a queue. So we can assume that your for loop puts the function() { score++; } callback on this queue arbitrary_length times. After that, the JS engine runs these one by one and increments score each time. So yes. The only exceptions are if a callback is never called or if the score variable is accessed from somewhere else.

Actually, you can use this pattern to run tasks in parallel, collect the results, and call a single callback when every task is done.

var results = [];
for (var i = 0; i < arbitrary_length; i++) {
     async_task(i, function(result) {
          results.push(result);
          if (results.length == arbitrary_length)
               tasksDone(results);
     });
}
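
To make that concrete, here is a minimal sketch of the two pieces the question leaves undefined, assuming a toy async_task that simply defers its work with setTimeout (these implementations are illustrative, not from the original question):

// Toy async_task: defers its work with setTimeout and passes a result to the callback.
function async_task(i, callback) {
     setTimeout(function() {
          callback(i * 2); // the "result" is just i doubled, for demonstration
     }, Math.random() * 100); // random delay to simulate real asynchronous work
}

// Toy tasksDone: receives the collected results once every task has finished.
function tasksDone(results) {
     console.log("All " + results.length + " tasks finished:", results);
}

Because every callback runs on the single JS thread, results.push() and the length check can never interleave with another callback, so tasksDone() is called exactly once (note that results arrive in completion order, not in index order).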



Answer 2:


Am I guaranteed that the final value of score will be equal to arbitrary_length?

Yes, as long as all async_task() calls call the callback once and only once, you are guaranteed that the final value of score will be equal to arbitrary_length.

It is the single-threaded nature of Javascript that guarantees that there are never two pieces of Javascript running at the exact same time. Instead, because of the event driven nature of Javascript in both browsers and node.js, one piece of JS runs to completion, then the next event is pulled from the event queue and that triggers a callback which will also run to completion.

There is no such thing as interrupt-driven Javascript (where some callback might interrupt some other piece of Javascript that is currently running). Everything is serialized through the event queue. This is an enormous simplification and prevents a lot of sticky situations that would otherwise be a lot of work to program safely when you have either multiple threads running concurrently or interrupt-driven code.

There still are some concurrency issues to be concerned about, but they have more to do with shared state that multiple asynchronous callbacks can all access. While only one callback will ever be accessing that state at any given time, a piece of code that spans several asynchronous operations can leave the state in an "in between" condition at a point where some other asynchronous callback gets a chance to run and access the same data.
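
As a minimal sketch of that kind of problem (the account and withdraw names are made up for illustration), consider a check-then-act sequence split across an asynchronous boundary:

var account = { balance: 100 };

// Hypothetical withdraw(): checks the balance, then completes the withdrawal
// after an asynchronous step (simulated here with setTimeout).
function withdraw(amount) {
     if (account.balance >= amount) {
          setTimeout(function() {
               account.balance -= amount; // the balance may have changed since the check above
               console.log("Withdrew " + amount + ", balance is now " + account.balance);
          }, 50);
     }
}

withdraw(80);
withdraw(80); // both checks see a balance of 100, so the balance ends up at -60

Neither callback ever runs at the same instant as the other, yet the shared state still ends up corrupted, which is exactly the kind of issue described in the example below.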

You can read more about the event driven nature of Javascript here: How does JavaScript handle AJAX responses in the background? and that answer also contains a number of other references.

And another similar answer that discusses the kind of shared data race conditions that are possible: Can this code cause a race condition in socket io?

Some other references:

how do I prevent event handlers to handle multiple events at once in javascript?

Do I need to be concerned with race conditions with asynchronous Javascript?

JavaScript - When exactly does the call stack become "empty"?

Node.js server with multiple concurrent requests, how does it work?


To give you an idea of the concurrency issues that can happen in Javascript (even without threads and without interrupts), here's an example from my own code.

I have a Raspberry Pi node.js server that controls the attic fans in my house. Every 10 seconds it checks two temperature probes, one inside the attic and one outside the house and decides how it should control the fans (via relays). It also records temperature data that can be presented in charts. Once an hour, it saves the latest temperature data that was collected in memory to some files for persistence in case of power outage or server crash. That saving operation involves a series of async file writes. Each one of those async writes yields control back to the system and then continues when the async callback is called signaling completion. Because this is a low memory system and the data can potentially occupy a significant portion of the available RAM, the data is not copied in memory before writing (that's simply not practical). So, I'm writing the live in-memory data to disk.

At any time during any of these async file I/O operations, while waiting for a callback to signify completion of the many file writes involved, one of my timers in the server could fire, I'd collect a new set of temperature data and that would attempt to modify the in-memory data set that I'm in the middle of writing. That's a concurrency issue waiting to happen. If it changes the data while I've written part of it and am waiting for that write to finish before writing the rest, then the data that gets written can easily end up corrupted because I will have written out one part of the data, the data will have gotten modified from underneath me and then I will attempt to write out more data without realizing it's been changed. That's a concurrency issue.

I actually have a console.log() statement that explicitly logs when this concurrency issue occurs on my server (and is handled safely by my code). It happens once every few days on my server. I know it's there and it's real.

There are many ways to work around those types of concurrency issues. The simplest would have been to just make a copy of all the data in memory and then write out the copy. Because there are no threads or interrupts, making a copy in memory would be safe from concurrency (there would be no yielding to async operations in the middle of the copy to create a concurrency issue). But, that wasn't practical in this case. So, I implemented a queue. Whenever I start writing, I set a flag on the object that manages the data. Then, any time the system wants to add or modify data in the stored data while that flag is set, those changes just go into a queue. The actual data is not touched while that flag is set. When the data has been safely written to disk, the flag is reset and the queued items are processed. Any concurrency issue was safely avoided.
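
Here is a minimal sketch of that flag-plus-queue idea (the names and structure are assumptions for illustration, not the author's actual code):

// Hypothetical data store that defers modifications while an async write is in flight.
var store = {
     data: [],
     writing: false, // flag: set while the data is being written to disk
     pending: [],    // modifications queued up during the write

     add: function(item) {
          if (this.writing)
               this.pending.push(item); // don't touch data mid-write; queue the change
          else
               this.data.push(item);
     },

     save: function(writeToDisk, done) {
          var self = this;
          self.writing = true;
          writeToDisk(self.data, function() {        // async write of the live data
               self.writing = false;
               self.pending.forEach(function(item) { // apply the queued changes afterwards
                    self.data.push(item);
               });
               self.pending = [];
               if (done) done();
          });
     }
};

Because only one piece of Javascript runs at a time, checking and setting the writing flag needs no lock; a plain variable is enough.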


So, this is an example of concurrency issues that you do have to be concerned about. One great simplifying assumption with Javascript is that a piece of Javascript will run to completion without any threat of getting interrupted as long as it doesn't purposely return control back to the system. That makes handling concurrency issues like the one described above much, much easier because your code will never be interrupted except when you consciously yield control back to the system. This is why we don't need mutexes, semaphores and other things like that in our own Javascript. We can use simple flags (just regular Javascript variables) like I described above if needed.


In any entirely synchronous piece of Javascript, you will never be interrupted by other Javascript. A synchronous piece of Javascript will run to completion before the next event in the event queue is processed. This is what is meant by Javascript being an "event-driven" language. As an example of this, if you had this code:

 console.log("A");
 // schedule timer for 500 ms from now
 setTimeout(function() {
     console.log("B");
 }, 500);

 console.log("C");

 // spin for 1000ms
 var start = Date.now();
 while (Date.now() - start < 1000) {}

 console.log("D");

You would get the following in the console:

A
C
D
B

The timer event cannot be processed until the current piece of Javascript runs to completion, even though it was likely added to the event queue sooner than that. The way the JS interpreter works is that it runs the current JS until it returns control back to the system and then (and only then), it fetches the next event from the event queue and calls the callback associated with that event.

Here's the sequence of events under the covers.

  1. This JS starts running.
  2. console.log("A") is output.
  3. A timer event is scheduled for 500ms from now. The timer subsystem uses native code.
  4. console.log("C") is output.
  5. The code enters the spin loop.
  6. At some point in time part-way through the spin loop the previously set timer is ready to fire. It is up to the interpreter implementation to decide exactly how this works, but the end result is that a timer event is inserted into the Javascript event queue.
  7. The spin loop finishes.
  8. console.log("D") is output.
  9. This piece of Javascript finishes and returns control back to the system.
  10. The Javascript interpreter sees that the current piece of Javascript is done so it checks the event queue to see if there are any pending events waiting to run. It finds the timer event and a callback associated with that event and calls that callback (starting a new block of JS execution). That code starts running and console.log("B") is output.
  11. That setTimeout() callback finishes execution and the interpreter again checks the event queue to see if there are any other events that are ready to run.



Answer 3:


No two invocations of the callback can happen at the same time (because Node is single-threaded), so that will not be a problem. The only problem would be if, in some cases, async_task(..) drops the callback. But if, e.g., async_task(..) were just calling setTimeout(..) with the given function, then yes, each call will execute, they will never collide with each other, and score will have the expected value, arbitrary_length, at the end.
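
For instance, a sketch under that assumption (this async_task is defined here purely for illustration):

var arbitrary_length = 1000;
var score = 0;

// Assumed implementation: async_task just schedules the given callback with setTimeout.
function async_task(i, callback) {
     setTimeout(callback, 0);
}

for (var i = 0; i < arbitrary_length; i++) {
     async_task(i, function() { score++; });
}

// Rough check once all the timer callbacks have had a chance to run.
setTimeout(function() {
     console.log(score === arbitrary_length); // true
}, 100);

(In real code you would count completions, as in the first answer, rather than rely on a fixed delay for the final check.)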

Of course, arbitrary_length can't be so great as to exhaust memory or overflow whatever collection is holding these callbacks. There is no threading issue, however.



Source: https://stackoverflow.com/questions/39338832/data-races-in-javascript
