[Update - Sep 30, 2010]
Since I studied a lot on this & related topics, I\'ll write whatever tips I gathered out of my experiences and suggestions provided in answer
Nayan, here are the answers to your questions, and a couple of additional advices.
Advices:
Something that already has been proposed: pre-allocate and pool your buffers.
A different approach which can be effective if you can use any collection instead of contigous array of bytes (this is not the case if the buffers are used in IO): implement a custom collection which internally will be composed of many smaller-sized arrays. This is something similar to std::deque from C++ STL library. Since each individual array will be smaller than 85K, the whole collection won't get in LOH. The advantage you can get with this approach is the following: LOH is only collected when a full GC happens. If the byte[] in your application are not long-lived, and (if they were smaller in size) would get in Gen0 or Gen1 before being collected, this would make memory management for GC much easier, since Gen2 collection is much more heavyweight.
An advice on the testing & monitoring approach: in my experience, the GC behavior, memory footprint and other memory-related stuff need to be monitored for quite a long time to get some valid and stable data. So each time you change something in the code, have a long enough test with monitoring the memory performance counters to see the impact of the change.
I would also recommend to take a look at % Time in GC counter, as it can be a good indicator of the effectiveness of memory management. The larger this value is, the more time your application spends on GC routines instead of processing the requests from users or doing other 'useful' operations. I cannot give advices for what absolute values of this counter indicate an issue, but I can share my experience for your reference: for the application I am working on, we usually treat % Time in GC higher than 20% as an issue.
Also, it would be useful if you shared some values of memory-related perf counters of your application: Private bytes and Working set of the process, BIAH, Total committed bytes, LOH size, Gen0, Gen1, Gen2 size, # of Gen0, Gen1, Gen2 collections, % Time in GC. This would help better understand your issue.