Everybody says that immutable objects are thread safe, but why is this?
Take the following scenario, running on a multi-core CPU:
The object's immutability isn't the real question in your scenario. Rather, the issue in your description revolves around the reference, list, or other structure that points to the object. That structure would, of course, need some technique to ensure the old object is no longer available to a thread that may still try to access it.
The real point of immutable objects' thread safety is that you don't need to write a bunch of code to achieve it. Rather, the framework, OS, CPU (and whatever else) do the work for you.
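In Java terms, for example, this is what "no extra code" looks like: every field is final, state is fixed at construction, and any thread can read the object without locks. The `Point` class here is a hypothetical illustration, not something from the question:

```java
// A minimal sketch of an immutable class: all fields are final and set once
// in the constructor, so the object's state can never change afterwards.
// Any number of threads may read it concurrently with no synchronization.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }

    // "Mutation" produces a new instance instead of changing this one.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```

Because `translate` returns a fresh object rather than modifying the original, there is no shared mutable state for threads to race on.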
I'm not sure that a memory barrier would change this scenario, as that would surely only affect subsequent reads... and then the question becomes: reads from where? If it is from a field (which must at a minimum be a static field, or an instance field of some instance still on the stack or otherwise reachable) or a local variable, then by definition the object isn't available for collection.
Regarding the scenario where that reference now exists only in a register... that is far trickier. Intuitively I want to say "no, that isn't a problem", but it would take a detailed look at the memory model to prove it. Still, handling references is such a common scenario that, simply put: this has to work.
You're missing that it would be a bad garbage collector indeed that let such a thing happen. The reference on core 1 should have prevented the object from being GC'd in the first place.
I think what you're asking is whether, after an object is created, the constructor returns, and a reference to it is stored somewhere, there is any possibility that a thread on another processor will still see the old data. You offer as a scenario the possibility that a cache line holding instance data for the object was previously used for some other purpose.
Under an exceptionally weak memory model, such a thing might be possible, but I would expect any useful memory model, even a relatively weak one, to ensure that dereferencing an immutable object is safe, even if that safety required padding objects so that no cache line is shared between object instances. (The GC will almost certainly invalidate all caches when it's done, but without such padding it would be possible for an immutable object created by core #2 to share a cache line with an object that core #1 had previously read.) Without at least that level of safety, writing robust code would require so many locks and memory barriers that it would be hard to write multi-processor code that wasn't slower than single-processor code.
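In Java terms, this guarantee shows up as the memory model's final-field rule: a thread that observes a reference to a properly constructed object whose fields are final is guaranteed to observe those fields fully initialized, even with no locks or volatile. A minimal sketch (the `ImmutableHolder` and `Publisher` names are illustrative):

```java
// The JMM's final-field semantics: once the constructor completes and the
// reference is published, any thread that sees a non-null reference is
// guaranteed to see value == 42, never stale or partially-written data.
class ImmutableHolder {
    final int value;
    ImmutableHolder(int value) { this.value = value; }
}

class Publisher {
    static ImmutableHolder holder;  // deliberately not volatile

    static void publish() { holder = new ImmutableHolder(42); }

    static int read() {
        ImmutableHolder h = holder;
        // Because value is final, a thread that observes a non-null h
        // also observes its fields fully initialized.
        return (h == null) ? -1 : h.value;
    }
}
```

Note that without the `final` modifier, this safe-publication guarantee would not hold, and a reader could in principle see a half-constructed object.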
The popular x86 and x64 memory models provide the guarantee you seek, and go much further. Processors coordinate 'ownership' of cache lines; if multiple processors want to read the same cache line, they can do so without impediment. When a processor wants to write a cache line, it negotiates with other processors for ownership. Once ownership is acquired, the processor will perform the write. Other processors will not be able to read or write the cache line until the processor that owns the cache line gives it up. Note that if multiple processors want to write the same cache line simultaneously, they will likely spend most of their time negotiating cache-line ownership rather than performing actual work, but semantic correctness will be preserved.
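A small sketch of that last point in Java (the class name and iteration counts are illustrative): several threads write the same counter, so its cache line ping-pongs between cores and throughput suffers, but the coherence protocol ensures no increment is lost:

```java
import java.util.concurrent.atomic.AtomicLong;

// Heavily contended writes to one location: the cores spend much of their
// time negotiating cache-line ownership, yet the final count is exact,
// because each atomic increment must own the line before it commits.
class ContendedCounter {
    static long count(int threads, int perThread) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();  // forces cache-line ownership
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return counter.get();
    }
}
```

The result is deterministic even under maximal contention; what contention costs you is speed, not correctness.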