Old Gen heap is full and the Eden and Survivor are low and almost empty

匿名 (未验证) 提交于 2019-12-03 02:49:01

问题:

A production environment became very slow recently. The cpu of the process took 200%. It kept working however. After I restarted the service it functioned normal again. I have several symptoms : The Par survivor space heap was empty for a long time and garbage collection took about 20% of the cpu time.

JVM options:

X:+CMSParallelRemarkEnabled, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -                XX:+UseParNewGC, -XX:HeapDumpPath=heapdump.hprof, -XX:MaxNewSize=700m, -XX:MaxPermSize=786m, -XX:NewSize=700m, -XX:ParallelGCThreads=8, -XX:SurvivorRatio=25, -Xms2048m, -Xmx2048m       Arch   amd64      Dispatcher Apache Tomcat      Dispatcher Version 7.0.27      Framework  java      Heap initial (MB)  2048.0      Heap max (MB)  2022.125      Java version   1.6.0_35     Log path    /opt/newrelic/logs/newrelic_agent.log     OS  Linux     Processors  8     System Memory   8177.964, 8178.0 

More info in the attached pic When the problem occurred on the non-heap the used code cache and used cms perm gen dropped to half.

I took the info from the newrelic.

The question is why does the server start to work so slow.

Sometimes the server stops completely, but we found that there is a problem with PDFBox, when uploading some pdf and contains some fonts it crashes the JVM.

More info: I observed that every day the Old gen is filling up. Now I restart the server daily. After restart it's all nice and dandy but the old gen is filling up till next day and the server slows down till needs a restart.

回答1:

By Default CMS starts to collect concurrently if OldGen is 70%. If it can't free memory below this boundary, it will run permanently concurrent which will slow down operation significantly. If OldSpace is getting near full OldGen usage, it will panic and fall back to stop-the-world GC pause which can be very long (like 20 seconds). You probably need more headroom in OldGen (ensure your app does not leak memory ofc !). Additionally you can lower the threshold to start a concurrent collection (default 70%) using

-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50

this will trigger concurrent collection starting with 50% occupancy and increase chance CMS finishes GC in time. This will only help in case your allocation rate is too high, from your charts it looks like not-enough-headrooom/memleak + too high XX:CMSInitiatingOccupancyFraction. Give at least 500MB to 1 GB more OldGen space



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!