We have recently been migrating a number of applications from running under RedHat linux JDK1.6.0_03 to Solaris 10u8 JDK1.6.0_16 (much higher spec machines) and we have not
In order to solve these problems I think you need to be able to replicate them in a controlled environment where you can precisely duplicate runs with different tuning parameters and/or code changes. If you can't do that hiring an expert probably isn't going to do you any good, and the cheapest way out of the problem is probably buying more RAM.
Also make sure it's not a hardware fault (try running MemTest86 or similar on the server.)
Which kind of SIGSEV errors exactly do you encounter?
If you run a 32bit VM, it could be what I described here: http://janvanbesien.blogspot.com/2009/08/mysterious-jvm-crashes-explained.html
Interesting problem. Sounds like one of the garbage collectors works poorly on your particular situation.
Have you tried changing the garbage collector being used? There are a LOT of GC options, and figuring out which ones are optimal seems to be a bit of a black art, but I wonder if a basic change would work for you.
I know there is a "Server" GC that tends to work a lot better than the default ones. Are you using that?
Threaded GC (which I believe is the default) is probably the worst for your particular situation, I've noticed that it tends to be much less aggressive when the machine is busy.
One thing I've noticed, it often takes two GCs to convince Java to actually take out the trash. I think the first one tends to unlink a bunch of objects and the second actually deletes them. What you might want to do is occasionally force two garbage collections. This WILL cause a significant GC pause, but I've never seen a case where it took more than two to clean out the entire heap.
I have had the same issue on Solaris machines, and I solved it by decreasing the maximum size of the JVM. The 32 bit Solaris implementation apparently needs some overhead room beyond what you allocate for the JVM when doing garbage collections. So, for example, with -Xmx3580M
I'd get the errors you describe, but with -Xmx3072M
it would be fine.
What kind of OutOfMemoryError are you getting? Is the heap space exhausted or is the problem related to any of the other memory pools (the Error usually have a message giving more details on its cause).
If the heap is exhausted and the problem can be reproduced (it sounds as if it can), I would first of all configure the VM to produce a heap dump on OutOfMemoryErrors. You can then analyze the heap and make sure that it's not filled with objects, which are still reachable through some unexpected references.
It's of course not impossible that you are running into a VM bug, but if your application is relying on implementation specific behaviour in 1.6.0_03, it may for some reason or another end up as a memory hog when running on 1.6.0_16. Such problems may also be found if you are using some kind of server container for your application. Some developers are obviously unable to read documentation, but tend to observe the API behaviour and make their own conclusions about how something is supposed to work. This is of course not always correct and I've ran into similar problems both with Tomcat and with JBoss (both products at least used to work only with specific VMs).