I have a Java program that takes in a text file containing a list of text files and processes each line separately. To speed up the processing, I make use of threads using an ExecutorService with a FixedThreadPool with 24 threads. The machine has 24 cores and 48GB of RAM.
The text file that I'm processing has 2.5 million lines. I find that for the first 2.3 million lines or so things run very well with high CPU utilization. However, beyond some point (at around the 2.3 million line mark), the performance degenerates, with only a single CPU being utilized and my program pretty much grinding to a halt.
I've investigated a number of causes, made sure all my file handles are closed, and increased the amount of memory supplied to the JVM. However, regardless of what I change, performance always degrades towards the end. I've even tried on text files containing fewer lines and once again performance decreases towards the end of processing the file.
In addition to the standard Java concurrency libraries, the code also makes use of Lucene libraries for text processing and analysis.
When I don't thread this code, the performance is constant and doesn't degenerate towards the end. I know this is a shot in the dark and it's hard to describe what is going on, but I thought I would just see if anyone has any ideas as to what might be causing this degeneration in performance towards the end.
Edit
After the comments I've received, I've pasted a stack trace here. As you can see, it doesn't appear as if any of the threads are blocking. Also, when profiling, the GC was not at 100% when things slowed down. In fact, both CPU and GC utilization were at 0% most of the time, with the CPU spiking occasionally to process a few files and then stopping again.
Code for executing threads
BufferedReader read = new BufferedReader(new FileReader(inputFile));
ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
String line;
while ((line = read.readLine()) != null) { // index each line
    Runnable worker = new CharikarHashThreader(line, bits, minTokens);
    executor.execute(worker);
}
read.close();
This sounds a lot like a garbage collection / memory issue.
When a stop-the-world garbage collection runs, it pauses all application threads so that the GC can do its "is this collectable garbage" analysis without things changing underneath it. While such a collection is running you'll typically see one thread at 100% while the other threads are stuck at 0%.
I would consider adding a few Runtime.getRuntime().freeMemory() calls (or using a profiler) to see if the "grind to a halt" occurs during GC.
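A minimal sketch of what that logging could look like. The class name MemoryProbe, the helper names, and the 100,000-line logging interval are all made up for illustration; only the Runtime calls are standard JDK API.

```java
final class MemoryProbe {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use: heap allocated to the JVM minus the free part of it. */
    static long usedMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** One-line summary intended for periodic logging from the submit loop. */
    static String snapshot(long linesProcessed) {
        Runtime rt = Runtime.getRuntime();
        return "lines=" + linesProcessed
                + " used=" + usedMegabytes() + "MB"
                + " free=" + (rt.freeMemory() / MB) + "MB"
                + " max=" + (rt.maxMemory() / MB) + "MB";
    }

    public static void main(String[] args) {
        // Stand-in for the real read loop: log every 100,000 lines (arbitrary interval).
        for (long i = 1; i <= 300_000; i++) {
            if (i % 100_000 == 0) {
                System.out.println(snapshot(i));
            }
        }
    }
}
```

If "used" climbs steadily towards "max" before the slowdown, that points at memory pressure. If it stays flat, GC is a less likely culprit.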
I'd also try running your program on just the first 10k lines of your file to see if that works.
I'd also look to see if your program is building too many intermediate Strings when it should be using StringBuilders.
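To illustrate the difference, here is a small sketch that joins tokens both ways. JoinTokens and its method names are hypothetical and not taken from the question's code. Whether this matters in your case depends on how CharikarHashThreader builds its strings, which isn't shown.

```java
final class JoinTokens {
    /** Concatenation in a loop: every += allocates a new String and copies everything so far. */
    static String joinWithConcat(String[] tokens) {
        String out = "";
        for (String t : tokens) {
            if (!out.isEmpty()) {
                out += " ";
            }
            out += t;
        }
        return out;
    }

    /** StringBuilder appends into one growing buffer and creates a single String at the end. */
    static String joinWithBuilder(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t);
        }
        return sb.toString();
    }
}
```

Both methods return the same result. The first one just produces far more short-lived garbage on long inputs, which adds up across millions of lines and 24 threads.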
It sounds to me like you need to profile your memory usage.
I initially thought it was a GC problem as well, but I'm not so sure given the following information from the question:
I've even tried on text files containing fewer lines and once again performance decreases towards the end of processing the file.
My guess is that the threads haven't quit but are jammed somehow. I would recommend taking a thread dump (kill -QUIT pid under *nix, or by using jstack) and seeing where the threads are. This will help you identify if they are jammed somewhere.
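If attaching an external tool is awkward, a rough equivalent can be produced from inside the JVM. The sketch below uses the standard Thread.getAllStackTraces() call. The ThreadDumper class and its output format are my own invention and carry less detail than a real jstack dump (no lock ownership information, for instance).

```java
import java.util.Map;

final class ThreadDumper {
    /** Returns the name, state, and stack frames of every live thread as text. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Calling dump() from a timer or a watchdog thread when throughput drops would show whether the pool threads are BLOCKED, WAITING, or RUNNABLE, and in which method.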
I suspect that your program starts off with 24 threads running but over time you lose one and then another. Although it seems like there is a dramatic performance drop off at the end I wonder if the program has been getting slower and slower from the start.
- Watch for sockets without proper connection or IO timeouts.
- Maybe some sort of lock contention that is blocking threads?
- Maybe something that Lucene is doing is either causing contention or is blocking your threads. As mentioned by @GPI, I would try commenting out the Lucene calls and see if the problem goes away. Again, a stack-trace will also show this to you.
Once you determine where the threads are blocking, you will need to either resolve the lock issues, add timeouts to network calls, or otherwise fix the problem.
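One way to spot a jammed task while you are still hunting for the cause is to wait on its Future with a timeout. This is only a sketch: TimedTask, finishesWithin, and the timeout values are hypothetical, and waiting on each task like this serializes submission, so it is meant for diagnosis on a sample of lines rather than for the full run.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedTask {
    /**
     * Submits the task and waits up to timeoutMillis for it to finish.
     * Returns true if it finished in time, false if it was still running
     * (in which case the task is cancelled and its worker interrupted).
     */
    static boolean finishesWithin(ExecutorService pool, Runnable task, long timeoutMillis) {
        Future<?> future = pool.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; only helps if the task responds to interrupts
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The task itself threw; surface that instead of hiding it.
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println(finishesWithin(pool, () -> { }, 1000));
        pool.shutdown();
    }
}
```

Logging the input line whenever this returns false would tell you which inputs the workers get stuck on.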
Hope this helps.
Source: https://stackoverflow.com/questions/16064544/java-threads-slow-down-towards-the-end-of-processing