How to list a directory of 2 million files in Java without an "out of memory" exception

挽巷 2020-12-06 00:04

I have to deal with a directory of about 2 million XML files to be processed.

I've already solved the processing by distributing the work between machines and threads us

15 answers
  • 2020-12-06 00:15

    If the file names follow certain rules, you can use File.list(FilenameFilter) instead of File.listFiles() to get manageable portions of the file listing.
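    A minimal sketch of that idea, assuming file names carry a usable prefix (the `"2020-"` prefix and the helper name are illustrative assumptions). Note that File.list(FilenameFilter) still enumerates every name internally and only trims what it returns, so this bounds the size of the returned array rather than the listing work itself:

    ```java
    import java.io.File;

    public class PrefixListing {
        // Return only the entries whose names start with the given prefix,
        // so each call yields a manageable slice of the directory.
        static String[] listWithPrefix(File dir, String prefix) {
            String[] names = dir.list((d, name) -> name.startsWith(prefix));
            return names != null ? names : new String[0];
        }

        public static void main(String[] args) {
            File dir = new File(args.length > 0 ? args[0] : ".");
            for (String name : listWithPrefix(dir, "2020-")) {
                System.out.println(name);
            }
        }
    }
    ```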

  • 2020-12-06 00:22

    I faced the same problem when I developed a malware scanning application. My solution was to execute a shell command to list all files; it is faster than recursive methods that browse folder by folder.

    See more about the shell command here: http://adbshell.com/commands/adb-shell-ls

            Process process = Runtime.getRuntime().exec("ls -R /");
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                handlePath(line); // hypothetical handler for each listed path
            }
            bufferedReader.close();
    
  • 2020-12-06 00:22

    You could use listFiles with a special FilenameFilter. The first time the FilenameFilter is passed to listFiles, it accepts the first 1000 files and records them as visited.

    The next time the FilenameFilter is passed to listFiles, it ignores those 1000 visited files and returns the next 1000, and so on until the directory is exhausted.
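    A sketch of that stateful filter (the class name and batch size are my own choices; note the visited set still ends up holding every name, so this bounds each batch of File objects rather than total memory):

    ```java
    import java.io.File;
    import java.io.FilenameFilter;
    import java.util.HashSet;
    import java.util.Set;

    public class BatchingFilter implements FilenameFilter {
        private final Set<String> visited = new HashSet<>();
        private final int batchSize;
        private int acceptedThisPass;

        public BatchingFilter(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public boolean accept(File dir, String name) {
            // Reject names already returned by an earlier pass,
            // and stop accepting once this pass's quota is filled.
            if (visited.contains(name) || acceptedThisPass >= batchSize) {
                return false;
            }
            visited.add(name);
            acceptedThisPass++;
            return true;
        }

        // Each call returns up to batchSize names not returned before;
        // an empty array means the directory is exhausted.
        public String[] nextBatch(File dir) {
            acceptedThisPass = 0;
            return dir.list(this);
        }
    }
    ```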

  • 2020-12-06 00:22

    As a first approach you might try tweaking some JVM memory settings, e.g. increasing the heap size as suggested, or even using the -XX:+AggressiveHeap option. Given the large number of files this may not help, in which case I would work around the problem: create several files with filenames in each, say 500k filenames per file, and read from them.
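    One way to build those filename files without loading the whole listing into memory first is java.nio.file.Files.newDirectoryStream (Java 7+), which iterates entries lazily. A sketch under that assumption (class, method, and output naming are all illustrative):

    ```java
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ChunkedNameDump {
        // Stream directory entries lazily and write them into chunk files
        // of perFile names each (names-0.txt, names-1.txt, ...).
        static int dump(Path dir, Path outDir, int perFile) throws IOException {
            int chunk = 0, inChunk = 0;
            PrintWriter out = null;
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path entry : stream) {
                    if (out == null || inChunk == perFile) {
                        if (out != null) out.close();
                        out = new PrintWriter(Files.newBufferedWriter(
                                outDir.resolve("names-" + chunk++ + ".txt")));
                        inChunk = 0;
                    }
                    out.println(entry.getFileName());
                    inChunk++;
                }
            } finally {
                if (out != null) out.close();
            }
            return chunk; // number of chunk files written
        }
    }
    ```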

  • 2020-12-06 00:25

    Use File.list() instead of File.listFiles() - the String objects it returns consume less memory than the File objects, and (more importantly, depending on the location of the directory) they don't contain the full path name.

    Then, construct File objects as needed when processing the result.

    However, this will not work for arbitrarily large directories either. It's an overall better idea to organize your files in a hierarchy of directories so that no single directory has more than a few thousand entries.
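    A sketch of that pattern (the directory path and the process stand-in are hypothetical):

    ```java
    import java.io.File;

    public class NameListing {
        // List names only; String entries are much lighter than File objects.
        static String[] namesOf(File dir) {
            String[] names = dir.list();
            return names != null ? names : new String[0];
        }

        public static void main(String[] args) {
            File dir = new File("/data/xml");  // hypothetical directory
            for (String name : namesOf(dir)) {
                File f = new File(dir, name);  // constructed only when needed
                process(f);
            }
        }

        // Hypothetical stand-in for the real per-file work.
        static void process(File f) {
            System.out.println(f.getName());
        }
    }
    ```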

  • 2020-12-06 00:31

    At first you could try to increase the memory of your JVM by passing e.g. -Xmx1024m.
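    For example (the jar name is a placeholder):

    ```shell
    # Raise the maximum heap to 1 GiB; adjust to the machine's RAM.
    java -Xmx1024m -jar your-processor.jar
    ```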
