Regarding the Cell architecture: the SPEs have no coherent cache. Instead, each SPE has its own 256 KB local store, and that is the only memory it can address directly. Anything else, such as the 512 MB of main memory or another SPE's local store, has to be reached with DMA: you explicitly initiate a transfer that copies the data into your local store. Because you manage these transfers by hand, synchronization is a huge pain.
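To make the "copy it into your local store first" pattern concrete, here is a minimal sketch that runs on any host. `dma_get` is a hypothetical stand-in, modelled with `memcpy`. On real hardware the SDK's MFC intrinsics (such as `mfc_get` from `spu_mfcio.h`) would take its place, and the transfer would be asynchronous. The chunk size and the function names are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an explicit DMA "get" (main memory -> local store).
// A real transfer is asynchronous and must be waited on; this one is a plain
// memcpy so the sketch runs anywhere.
inline void dma_get(void* local_dst, const void* main_src, std::size_t bytes) {
    // Assumed limits for the sketch: multiples of 16 bytes, at most 16 KB each.
    assert(bytes % 16 == 0 && bytes <= 16 * 1024);
    std::memcpy(local_dst, main_src, bytes);
}

// Sums `count` values that live in "main memory" by staging them, one chunk at
// a time, through a small buffer that plays the role of the local store.
inline std::uint64_t sum_via_local_store(const std::uint32_t* main_data,
                                         std::size_t count) {
    assert(count % 4 == 0);  // keeps every transfer a multiple of 16 bytes
    constexpr std::size_t kChunk = 1024;          // 4 KB per transfer
    alignas(16) std::uint32_t local_buf[kChunk];  // the "local store" buffer
    std::uint64_t total = 0;
    for (std::size_t done = 0; done < count; done += kChunk) {
        const std::size_t n = std::min(kChunk, count - done);
        dma_get(local_buf, main_data + done, n * sizeof(std::uint32_t));
        // Only the local copy is touched from here on.
        for (std::size_t i = 0; i < n; ++i) total += local_buf[i];
    }
    return total;
}
```

The compute loop never dereferences `main_data` itself. In real SPE code you would typically double-buffer, so the next chunk is in flight while the current one is processed, and that overlap is exactly where the synchronization headaches come from.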
Alternatively, you actually can access other memory. Main memory and each SPE's local store are mapped into sections of the 64-bit virtual address space. If you access data through the right pointers, the DMA happens behind the scenes, and it all looks like one giant shared memory space. The problem? Huge performance hit. Every time you dereference one of these pointers, the SPE stalls while the DMA completes. This is slow, and it's not something you want to do in performance-critical code (e.g. a game).
This brings us to Skizz's point about vtables and pointer fixups. If you're blindly copying objects with vtable pointers between SPEs, you lose either way. If you don't fix up the pointers, every virtual call goes through the slow remote access described above. If you do fix them up, you pay to download the virtual function code to each SPE instead.
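One common way to sidestep the vtable problem altogether, sketched here as an illustration rather than as what Skizz or any SDK prescribes, is to drop virtual functions from the objects you copy around. You store a small type tag instead and dispatch through a function table that already lives on the receiving side. All names below (`ShapeType`, `Shape`, `copy_to_local`, `area`) are hypothetical, and the "transfer" is modelled with `memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical type tags, meaningful in every address space.
enum class ShapeType : std::uint32_t { Rectangle = 0, Triangle = 1 };

// Plain data only: no virtual functions, so no hidden vtable pointer that
// would refer to some other processor's memory after a byte-wise copy.
struct Shape {
    ShapeType type;
    float width;
    float height;
};
static_assert(std::is_trivially_copyable<Shape>::value,
              "Shape must be safe to copy as raw bytes");

inline float rectangle_area(const Shape& s) { return s.width * s.height; }
inline float triangle_area(const Shape& s) { return 0.5f * s.width * s.height; }

// Function table that lives with the code on the receiving side; it is
// indexed by the tag, so nothing in the object itself needs fixing up.
using AreaFn = float (*)(const Shape&);
static const AreaFn kAreaTable[] = { rectangle_area, triangle_area };

inline float area(const Shape& s) {
    const std::size_t index = static_cast<std::size_t>(s.type);
    assert(index < sizeof(kAreaTable) / sizeof(kAreaTable[0]));
    return kAreaTable[index](s);
}

// Stand-in for transferring the object's bytes into local memory.
inline Shape copy_to_local(const Shape& remote) {
    Shape local;
    std::memcpy(&local, &remote, sizeof(Shape));
    return local;
}
```

The trade-off is that every function the table refers to must already be resident where the object lands, so this only helps when the set of types is small and known up front.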