Why is File.exists() behaving flakily in a multithreaded environment?

我是研究僧i submitted on 2019-12-04 10:19:14

Your application might be properly multithreaded, whenever you are accessing the FileSystem, it has limitations. In your case, I would bet that too many threads are accessing it at the same time, with the consequence that the file system runs out of file handles. File instances have no way to tell you that: exists() does not throw an exception, so it simply returns false, even if the directory exists.
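To illustrate the point that a boolean result cannot say why the check failed, here is a minimal sketch that uses java.nio.file.Files.readAttributes, which throws instead of returning false. The class name ExistenceProbe and the Status enum are hypothetical, made up for this example. Whether the failure described above would actually surface as an exception on a given OS and JDK is an assumption that has not been tested here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: separates "file is missing" from "the check itself failed".
final class ExistenceProbe {
    enum Status { PRESENT, MISSING, UNKNOWN }

    static Status probe(Path path) {
        try {
            // Throws rather than returning a boolean, so the cause is not lost.
            Files.readAttributes(path, BasicFileAttributes.class);
            return Status.PRESENT;
        } catch (NoSuchFileException e) {
            // The file system reported that the path does not exist.
            return Status.MISSING;
        } catch (IOException e) {
            // Some other I/O failure: existence could not be determined.
            return Status.UNKNOWN;
        }
    }
}
```

A caller could retry or log on UNKNOWN instead of treating it as a missing file.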

user207421

The real question here is why are you calling it?

  • You have to construct a FileInputStream or FileReader to read a file, and these will throw a FileNotFoundException if the file can't be opened, with absolute reliability.
  • You have to catch exceptions anyway.
  • The operating system has to check whether the file exists anyway.
  • There is no need to check it twice.
  • Existence can change between checking it and opening the file.

So, don't check it twice. Let opening the file do all the work.
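A minimal sketch of that advice, assuming a plain text file: there is no exists() call, and a failure to open the file is handled where FileReader throws. The class name OpenWithoutChecking and the method countLines are hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.OptionalLong;

// Hypothetical example: open the file directly instead of calling exists() first.
final class OpenWithoutChecking {
    /** Returns the number of lines, or empty if the file could not be opened. */
    static OptionalLong countLines(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return OptionalLong.of(lines);
        } catch (FileNotFoundException e) {
            // Thrown by the FileReader constructor when the file cannot be opened.
            return OptionalLong.empty();
        }
        // Any other IOException (a read error, for instance) propagates to the caller.
    }
}
```

Because the check and the open are the same operation, there is no window in which the file's existence can change between them.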

Is it a mistake to be writing files in a multithreaded process?

I wouldn't say it's a mistake, but it's pretty pointless. The disk isn't multi-threaded.

Would a smaller Thread Pool help (currently 30)?

I would definitely reduce this anyway, to four or so, not to fix this problem but to reduce thrashing and almost certainly improve throughput.
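For illustration, a minimal sketch of a pool capped at four threads. The class name SmallPool, the constant POOL_SIZE, and the method runAll are hypothetical; the value 4 is simply the "four or so" suggested above, not a measured optimum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: run file-processing tasks on a small fixed pool.
final class SmallPool {
    static final int POOL_SIZE = 4; // assumption: "four or so", tune by measurement

    static <T> List<T> runAll(List<Callable<T>> tasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed and returns
            // the futures in the same order as the submitted tasks.
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```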

I have marked @Olivier's answer as "the" answer, but I am providing my own here in order to summarize the findings of my experiment. I am calling it "the" answer because it gets closer to the truth than any other, even though his guess about file handles is not obviously correct; I can't disprove it either. What does ring true is his simple statement "Your application might be properly multithreaded, whenever you are accessing the FileSystem, it has limitations." This is consistent with my findings. If anyone can shed any further light, I may change this.

  1. Is it a bug in my code?

Highly doubtful. Running the same process repeatedly over the same list of files randomly reports a few files as non-existent when they do, in fact, exist. When the process is run again, these same files are found to exist. There is zero chance that the existence of these files would have changed in the interim.

  2. Does using java.nio.Files.exists() rather than java.io.File.exists() help?

No. The underlying interface to the file system does not appear to be different. The nio improvements in this area seem to be confined to the handling of links, which is not the issue here. But I can't say for sure, as this is native code.

  3. Does putting the input and output files in different directories, so that my existence checks are not reading the same directory where the output files are getting written to, help?

No. It does not appear to be two simultaneous hits on the directory that cause the problem, so much as two simultaneous hits on the file system.

  4. Does reducing the number of threads in the pool help?

Only reducing it to 1 makes it reliable; in other words, only doing away with the multithreaded approach altogether helps. The existence check does not appear to be 100% reliable when run multithreaded, at least not with this OS and JDK.

If sox were ever to be redesigned so as to give a distinct error code for File Not Found on the input files, this might make the answer of @EJP above feasible.
