Question
Here's my simple code to loop every second (doesn't need to be exact) and kick off a job if necessary:
while (true) {
    // check db for new jobs and
    // kick off thread if necessary
    try {
        Thread.sleep(1000);
    } catch (Throwable t) {
        LOG.error("", t);
    }
}
This code has worked fine for several months. Just yesterday we started having problems where one of our servers seems to be hung in the Thread.sleep(1000) method. IOW - it's been over a day and the Thread.sleep hasn't returned. I started up jconsole and get this info about the thread.
Name: Thread-3
State: TIMED_WAITING
Total blocked: 2 Total waited: 2,820
Stack trace:
java.lang.Thread.sleep(Native Method)
xc.mst.scheduling.Scheduler.run(Scheduler.java:400)
java.lang.Thread.run(Thread.java:662)
Scheduler.java:400 is the Thread.sleep line above. The jconsole output doesn't increment "Total waited" every second as I'd expect. In fact it doesn't change at all. I even shut down jconsole and started it back up in the hopes that maybe that would force a refresh, but only got the same numbers again. I don't know what other explanation there could be besides that the jvm has incorrectly hung on the sleep command. In my years, though, I've had so few problems with the jvm that I assume it must be an oversight on my part.
Note: no other thread is active. IOW - the CPU is nearly idle. I read somewhere that Thread.sleep could legitimately be starved if another thread was active, but that isn't the case here.
solaris version:
$ uname -a
SunOS xcmst 5.10 Generic_141415-08 i86pc i386 i86pc
java version:
$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Server VM (build 20.1-b02, mixed mode)
Answer 1:
In addition to what bdonlan mentioned you may want to look into ScheduledThreadPoolExecutor. I work on a very similar type of project and this object has made my life easier, thanks to this little snippet.
scheduleAtFixedRate
If any execution of this task takes longer than its period, then subsequent executions may start late, but will not concurrently execute.
I hope this helps!
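A minimal sketch of how the polling loop might look with a scheduled executor. The class name JobPoller, the method runPolls, and the latch used to stop the demo are assumptions made for illustration and are not part of the original code; the comment inside the task stands in for the db check.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class JobPoller {

    /**
     * Runs a placeholder poll every periodMillis and returns true once
     * expectedPolls polls have completed, or false if timeoutMillis passes first.
     */
    public static boolean runPolls(int expectedPolls, long periodMillis, long timeoutMillis) {
        CountDownLatch remaining = new CountDownLatch(expectedPolls);
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                // placeholder: check db for new jobs and kick off thread if necessary
                remaining.countDown();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs,
                // so it is logged here instead of being rethrown.
                System.err.println("poll failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed 3 polls: " + runPolls(3, 1000, 5000));
    }
}
```

The try/catch inside the task matters: if an execution throws, the executor silently cancels all subsequent executions, which can look a lot like a hung loop.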
Answer 2:
Are you depending on the system tick count to increase monotonically?
From what I've heard from someone experienced, it (occasionally) happens that the system tick goes backwards by one or two ticks. I haven't experienced it myself yet, but if you're depending on this, might this explain what's happening?
Edit:
When I said System.currentTimeMillis(), I believe I was mistaken. I thought that System.currentTimeMillis() is similar to Windows' GetTickCount() function (i.e. it measures a time that is independent of the system time), but in fact, that does not seem to be the case. So of course it can change, but that was not my point: apparently, tick counts measured by the system timer can also go backwards by a tick or two, even ignoring system time changes. Not sure if that helps, but thanks to Raedwald for pointing out the system time change possibility, since that's not what I meant.
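If your own code measures intervals, a sketch like the following may help separate the two clocks. System.currentTimeMillis() follows the wall clock and can jump when the system time is changed, while System.nanoTime() is intended for measuring elapsed time. The class name ElapsedTime and the placeholder workload are assumptions for illustration only; this does not by itself explain a hang inside Thread.sleep.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedTime {

    /** Returns the milliseconds spent running the task, measured with System.nanoTime(). */
    public static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        // Only the difference between two nanoTime() readings is meaningful.
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long ms = elapsedMillis(() -> {
            // placeholder workload standing in for the db check
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("elapsed ms: " + ms);
    }
}
```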
Answer 3:
I know that you looked in jconsole, but it might be useful to send signal 3 to the process (that is, kill -3) and post more of the resulting thread dump here. Or, if you really want to get into the details, then you might consider taking one or more pstack/jstack dumps of the hung process in quick succession in order to show where the threads really are. Information is available online about how to correlate this information with a java thread dump.
Also, by "one of our servers," are you saying that the problem is reproducible on one server, but it never occurs on other servers? This indicates a problem with that one server. Check that everything is the same across your servers and that there are no issues on that hardware in particular.
Finally, this might not be a java problem per se. Thread.sleep(long) is a native method (maps directly onto the underlying operating system's thread management), so check that your OS is up to date.
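If sending a signal is inconvenient, a rough in-process equivalent can be sketched with Thread.getAllStackTraces(). The class name ThreadDump and the output format are assumptions for illustration; this shows less detail than a real kill -3 or jstack dump (no lock or native-frame information) and only works if the JVM can still run the code.

```java
import java.util.Map;

public class ThreadDump {

    /** Builds a plain-text listing of every live thread, its state, and its stack. */
    public static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking two or three of these a few seconds apart and comparing them shows whether the scheduler thread ever leaves the sleep call.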
Answer 4:
Have you considered using Timer and TimerTask?
Here is a simple snippet which might help.
import java.util.Calendar;
import java.util.Timer;
import java.util.TimerTask;

public class Example {
    public static void main(String[] args) {
        Timer timer = new Timer();
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                Calendar instance = Calendar.getInstance();
                System.out.println("time: " + instance.getTime() + " : " + instance.getTimeInMillis());
                // check db for new jobs and
                // kick off thread if necessary
            }
        };
        int startingDelay = 0; // timer task will be started after startingDelay
        int period = 1000; // you are using it as sleeping time in your code
        timer.scheduleAtFixedRate(task, startingDelay, period);
    }
}
EDIT
According to the discussions I have studied, Thread.sleep() is a sign of poorly designed code.
The reasons are:
- ...The thread does not lose ownership of any monitors (from documentation).
- It blocks the thread from execution.
- And obviously it does not give any guarantee that execution will resume right after the sleeping time.
- To me, Thread.sleep() is too primitive a tool when there is a whole package dedicated to concurrency.
What is better than Thread.sleep()? That raises another question. I would suggest having a look at the Concurrency chapter of the book Effective Java.
Answer 5:
Thread.sleep() is not a good practice in Java programming. Just Google "Is Thread.sleep() bad?" and you will see my point.
Firstly, it makes the current thread inaccessible to other parts of the program, especially if it is multi-threaded. Maybe that is why you are experiencing the hang.
Secondly, it would be catastrophic if the current thread is the EDT (Event Dispatch Thread) and the application has a Swing GUI.
A better alternative would be Object.wait():
final Object LOCK = new Object();
final long SLEEP = 1000;

public void run() {
    while (true) {
        // check db for new jobs and
        // kick off thread if necessary
        try {
            // wait() must be called while holding the object's monitor
            synchronized (LOCK) {
                LOCK.wait(SLEEP);
            }
        } catch (InterruptedException e) {
            // usually interrupted by other threads e.g. during program shutdown
            break;
        }
    }
}
Answer 6:
Maybe you can try a tool other than jconsole to first confirm that the thread really is blocked in the sleep call.
For example, run jstack manually several times, write the output to a file each time, and compare the results.
Or use a more capable tool, such as YourKit (commercial), if your organization has a license, to profile the application in depth, or attach a remote debugger (which may not be possible in production).
Or you can check whether the "// check db for new jobs" code is actually running, by checking logs, profiling, or any other method that suits your application. If the db check is very quick and is followed by a one-second sleep, it is very likely that you will always see sleep in the stack trace, simply because the thread spends almost all of its time there.
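One hedged way to check whether the loop body is still running is a small heartbeat that the loop updates on every pass and that a log line, JMX bean, or debugger can read. The class LoopHeartbeat and its method names are hypothetical and not part of the original scheduler.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LoopHeartbeat {
    private final AtomicLong iterations = new AtomicLong();
    private volatile long lastBeatNanos = System.nanoTime();

    /** Call once per loop iteration, right after the db check. */
    public void beat() {
        iterations.incrementAndGet();
        lastBeatNanos = System.nanoTime();
    }

    /** Number of loop iterations recorded so far. */
    public long iterations() {
        return iterations.get();
    }

    /** Milliseconds since the loop last reported in; a large value suggests a real hang. */
    public long millisSinceLastBeat() {
        return (System.nanoTime() - lastBeatNanos) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        LoopHeartbeat heartbeat = new LoopHeartbeat();
        for (int i = 0; i < 3; i++) {
            // placeholder: check db for new jobs and kick off thread if necessary
            heartbeat.beat();
            Thread.sleep(1000);
        }
        System.out.println("iterations: " + heartbeat.iterations()
                + ", ms since last beat: " + heartbeat.millisSinceLastBeat());
    }
}
```

If the iteration count keeps climbing while the stack trace always shows sleep, the loop is healthy and the trace is just catching the thread where it spends most of its time.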
Source: https://stackoverflow.com/questions/6749159/thread-sleep-is-hung