It is almost impossible to make a non-GC memory manager work in a multi-CPU environment without acquiring and releasing a lock every time memory is allocated or freed. Each lock acquisition or release requires a CPU to coordinate its actions with the other CPUs, and such coordination tends to be rather expensive. A garbage-collection-based system can allow many memory allocations to occur without any locks or other inter-CPU coordination, which is a major advantage. The disadvantage is that many steps in garbage collection require the CPUs to coordinate their actions, and getting good performance generally requires consolidating those steps to a significant degree (there's not much benefit in eliminating CPU coordination on each memory allocation if the CPUs have to coordinate before each step of garbage collection). Such consolidation will often cause all tasks in the system to pause for varying lengths of time during collection; in general, the longer the pauses one is willing to accept, the less total time will be needed for collection.
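To make the "many allocations without locks" point concrete, here is a minimal sketch of one common way it can be done: each thread takes a large chunk from the shared heap under a lock, then hands out memory from that chunk by bumping a private cursor. The names `SharedHeap`, `ThreadLocalAllocator`, and `CHUNK_SIZE` are made up for illustration, addresses are just integers, and nothing is ever freed here; the sketch assumes a collector would reclaim space in bulk later.

```python
import threading

CHUNK_SIZE = 1024  # bytes handed to a thread per refill (arbitrary choice)


class SharedHeap:
    """Shared pool of address space; touching it requires the lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.next_free = 0
        self.lock_acquisitions = 0  # counted only to show how rare locking is

    def grab_chunk(self):
        with self.lock:
            self.lock_acquisitions += 1
            start = self.next_free
            self.next_free += CHUNK_SIZE
        return start


class ThreadLocalAllocator:
    """Per-thread bump allocator; the common path takes no lock."""

    def __init__(self, heap):
        self.heap = heap
        self.cursor = 0
        self.limit = 0

    def allocate(self, size):
        if size > CHUNK_SIZE:
            raise ValueError("object larger than a chunk")
        if self.cursor + size > self.limit:
            # Slow path: the private chunk is exhausted, so coordinate once.
            self.cursor = self.heap.grab_chunk()
            self.limit = self.cursor + CHUNK_SIZE
        address = self.cursor
        self.cursor += size  # fast path: plain arithmetic on private state
        return address


def worker(heap, count, out):
    allocator = ThreadLocalAllocator(heap)
    for _ in range(count):
        out.append(allocator.allocate(16))


heap = SharedHeap()
results = [[] for _ in range(4)]
threads = [
    threading.Thread(target=worker, args=(heap, 640, results[i]))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_addresses = [a for per_thread in results for a in per_thread]
print(len(all_addresses), "allocations,", heap.lock_acquisitions, "lock acquisitions")
```

With these numbers, 2,560 allocations take the lock only 40 times (once per 64 allocations per thread). The cost that was avoided on allocation comes back at collection time, when the threads do have to coordinate.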
If processors were to return to a descriptor-based handle/pointer system (similar to what the 80286 used, though nowadays one wouldn't use 16-bit segments), it would be possible for garbage collection to run concurrently with other operations. If a handle was in use when the GC wanted to move its object, the task using the handle would have to be frozen while the data was copied from its old address to its new one, but that shouldn't take long. I'm not sure that will ever happen, though. (Incidentally, if I had my druthers, an object reference would be 32 bits, and a pointer would be an object reference plus a 32-bit offset; I think it will be a while before there's a need for over 2 billion objects, or for any object over 4 GB. Despite Moore's Law, if an application would have over 2 billion objects, its performance would likely be improved by using fewer, larger objects. If an application would need an object over 4 GB, its performance would likely be improved by using more, smaller objects.)
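As a rough software model of the handle idea (not how any real processor implements it), the sketch below treats an object reference as an index into a descriptor table and a pointer as a (handle, offset) pair. Moving an object means copying its bytes and updating one table entry; every outstanding pointer keeps working. `HandleHeap`, `Pointer`, and `relocate` are hypothetical names, and the freezing of a task that is using the handle during the copy is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    handle: int  # object reference: index into the descriptor table
    offset: int  # byte offset within that object


class HandleHeap:
    def __init__(self, size):
        self.memory = bytearray(size)
        self.table = []  # handle -> [base address, length]
        self.top = 0

    def allocate(self, length):
        if self.top + length > len(self.memory):
            raise MemoryError("heap exhausted")
        self.table.append([self.top, length])
        self.top += length
        return len(self.table) - 1

    def _address(self, ptr):
        # Every access goes through the descriptor, as segmented hardware would.
        base, length = self.table[ptr.handle]
        if not 0 <= ptr.offset < length:
            raise IndexError("offset outside object")
        return base + ptr.offset

    def load(self, ptr):
        return self.memory[self._address(ptr)]

    def store(self, ptr, value):
        self.memory[self._address(ptr)] = value

    def relocate(self, handle, new_base):
        # What a moving collector would do: copy the bytes, then repoint the
        # descriptor. In a real system, a task using this handle would have
        # to be paused for the duration of the copy.
        base, length = self.table[handle]
        if new_base < 0 or new_base + length > len(self.memory):
            raise MemoryError("destination outside heap")
        self.memory[new_base:new_base + length] = self.memory[base:base + length]
        self.table[handle][0] = new_base


heap = HandleHeap(64)
h = heap.allocate(8)
p = Pointer(h, 3)
heap.store(p, 42)
heap.relocate(h, 32)   # the object moves; p is not touched
value_after_move = heap.load(p)
print(value_after_move)
```

The point of the design is that only the descriptor table knows real addresses, so the collector never has to find and patch pointers scattered through the program's data.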