How does shared memory vs message passing handle large data structures?

南方客 2020-12-22 19:13

In looking at Go and Erlang's approach to concurrency, I noticed that they both rely on message passing.

This approach obviously alleviates the need for complex loc

  • 2020-12-22 19:52

    Usually message passing languages (this is especially easy in Erlang, since it has immutable variables) optimise away the actual data copying between processes. This only applies to local processes, of course: you'll want to think through your network distribution pattern carefully. So this isn't much of an issue.

  • 2020-12-22 19:57

    Note that your questions are technically nonsensical, because message passing can use shared state, so I shall assume that you mean message passing with deep copying to avoid shared state (as Erlang currently does).

    Will using shared state be faster and use less memory than message passing, as locks will mostly be unnecessary because the data is read-only, and only needs to exist in a single location?

    Using shared state will be a lot faster.

    How would this problem be approached in a message passing context? Would there be a single process with access to the data structure and clients would simply need to sequentially request data from it? Or, if possible, would the data be chunked to create several processes that hold chunks?

    Either approach can be used.

    Given the architecture of modern CPUs & memory, is there much difference between the two solutions -- i.e., can shared memory be read in parallel by multiple cores -- meaning there is no hardware bottleneck that would otherwise make both implementations roughly perform the same?

    Copying is cache-unfriendly and therefore destroys scalability on multicores, because it worsens contention for the shared resource that is main memory.

    Ultimately, Erlang-style message passing is designed for concurrent programming whereas your questions about throughput performance are really aimed at parallel programming. These are two quite different subjects and the overlap between them is tiny in practice. Specifically, latency is typically just as important as throughput in the context of concurrent programming and Erlang-style message passing is a great way to achieve desirable latency profiles (i.e. consistently low latencies). The problem with shared memory then is not so much synchronization among readers and writers but low-latency memory management.

  • 2020-12-22 20:02
    • Yes, shared state could be faster in this case. But only if you can forgo the locks, and this is only doable if the data is absolutely read-only. If it's 'mostly read-only' then you need a lock (unless you manage to write lock-free structures, but be warned that they're even trickier than locks), and then you'd be hard-pressed to make it perform as fast as a good message-passing architecture.

    • Yes, you could write a 'server process' to share it. With really lightweight processes, it's no heavier than writing a small API to access the data. Think of it as an object (in the OOP sense) that 'owns' the data. Splitting the data into chunks to enhance parallelism (called 'sharding' in DB circles) helps in big cases (or if the data is on slow storage).

    • Even if NUMA is getting mainstream, you still have more and more cores per NUMA cell. And a big difference is that a message can be passed between just two cores, while the cache line holding a lock has to be invalidated on ALL cores that hold it, limiting it to the inter-cell bus latency (even slower than RAM access). If anything, shared-state/locks is getting more and more infeasible.
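
    The 'server process' and sharding ideas from the second point above can be sketched in Go roughly as follows. This is only an illustration under assumptions: the names `lookup`, `result`, `serve`, `startShards`, `shardOf` and `get` are made up for this example, and the FNV hash is just one convenient way to pick a shard. Each goroutine owns its chunk of the data exclusively, so no locks are needed.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// result is the reply to a lookup (hypothetical type for this sketch).
type result struct {
	value int
	ok    bool
}

// lookup is a request message: a key plus a channel for the reply.
type lookup struct {
	key   string
	reply chan<- result
}

// serve is the "server process": it alone reads its map.
func serve(data map[string]int, requests <-chan lookup) {
	for req := range requests {
		v, ok := data[req.key]
		req.reply <- result{v, ok}
	}
}

// shardOf maps a key to one of n shards using an FNV-1a hash.
func shardOf(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// startShards splits data into n chunks and starts one server per chunk.
func startShards(data map[string]int, n int) []chan lookup {
	parts := make([]map[string]int, n)
	for i := range parts {
		parts[i] = make(map[string]int)
	}
	for k, v := range data {
		parts[shardOf(k, n)][k] = v
	}
	shards := make([]chan lookup, n)
	for i := range shards {
		shards[i] = make(chan lookup)
		go serve(parts[i], shards[i])
	}
	return shards
}

// get asks the shard that owns key and waits for its answer.
func get(shards []chan lookup, key string) (int, bool) {
	reply := make(chan result, 1)
	shards[shardOf(key, len(shards))] <- lookup{key, reply}
	r := <-reply
	return r.value, r.ok
}

func main() {
	shards := startShards(map[string]int{"a": 1, "b": 2, "c": 3}, 2)
	v, ok := get(shards, "b")
	fmt.Println(v, ok) // prints "2 true"
}
```

    With a single shard this is the 'one process owns everything' design; more shards let independent requests proceed in parallel.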

    In short: get used to message passing and server processes, it's all the rage.

    Edit: revisiting this answer, I want to add a note about a phrase found in Go's documentation:

    Do not communicate by sharing memory; instead, share memory by communicating.

    The idea is: when you have a block of memory shared between threads, the typical way to avoid concurrent access is to use a lock to arbitrate. The Go style is to pass a message containing the reference, and a thread only accesses the memory after receiving that message. It relies on some measure of programmer discipline, but results in very clean-looking code that can be easily proofread, so it's relatively easy to debug.

    The advantage is that you don't have to copy big blocks of data on every message, and don't have to flush caches the way some lock implementations effectively do. It's still somewhat early to say whether the style leads to higher-performance designs or not (especially since the current Go runtime is somewhat naive about thread scheduling).
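
    As a rough sketch of that style: only a pointer travels over the channel, and whoever currently holds the pointer is the only one allowed to touch the memory. The names `block`, `fill` and `process` are invented for this example, and the 'do not touch after sending' rule is the programmer discipline mentioned above; nothing in the language enforces it.

```go
package main

import "fmt"

// block stands in for a large buffer (hypothetical type for this sketch).
type block struct {
	data []byte
}

// fill takes ownership of each block it receives, writes b into every
// byte, and then hands ownership on by sending the same pointer.
func fill(in <-chan *block, out chan<- *block, b byte) {
	for blk := range in {
		for i := range blk.data {
			blk.data[i] = b
		}
		out <- blk // by convention, this goroutine must not touch blk again
	}
	close(out)
}

// process allocates one block, passes it to a worker, and gets it back.
func process(size int) *block {
	in := make(chan *block)
	out := make(chan *block)
	go fill(in, out, 0xFF)
	in <- &block{data: make([]byte, size)} // only the pointer is sent, no copy
	close(in)
	return <-out
}

func main() {
	blk := process(1 << 20)
	fmt.Println(len(blk.data), blk.data[0]) // prints "1048576 255"
}
```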

  • 2020-12-22 20:03

    Most modern processors use variants of the MESI protocol. Because of the protocol's Shared state, passing read-only data between different threads is very cheap. Modified shared data is very expensive though, because all other caches that store this cache line must invalidate it.

    So if you have read-only data, it is very cheap to share it between threads instead of copying it with messages. If you have read-mostly data, it can be expensive to share between threads, partly because of the need to synchronize access, and partly because writes destroy the cache-friendly behavior of the shared data.
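
    For the read-only case, a minimal Go sketch might look like this. The function name `countEach` is made up for illustration. Every goroutine scans the same slice with no lock and no copy, which is safe only on the assumption that nothing writes to `table` once the goroutines have started.

```go
package main

import (
	"fmt"
	"sync"
)

// countEach starts one goroutine per target; each one scans the whole
// shared table and counts how often its target occurs.
func countEach(table []int, targets []int) []int {
	counts := make([]int, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i, t int) {
			defer wg.Done()
			n := 0 // accumulate locally so the shared slice is written once
			for _, v := range table {
				if v == t {
					n++
				}
			}
			counts[i] = n // each goroutine writes only its own slot
		}(i, t)
	}
	wg.Wait()
	return counts
}

func main() {
	table := []int{1, 2, 2, 3, 3, 3}
	fmt.Println(countEach(table, []int{1, 2, 3, 4})) // prints "[1 2 3 0]"
}
```

    The per-goroutine counter is kept in a local variable on purpose: repeatedly writing to neighbouring slots of `counts` would be exactly the kind of modified shared cache line described above.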

    Immutable data structures can be beneficial here. Instead of changing the actual data structure, you simply make a new one that shares most of the old data, but with the things changed that you need changed. Sharing a single version of it is cheap, since all the data is immutable, but you can still update to a new version efficiently.
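
    A small Go sketch of that idea, using an immutable linked list as the simplest structure that shares old data: the names `node`, `push`, `sum`, `current` and `publish` are invented for this example, and `atomic.Pointer` requires Go 1.19 or later. Readers load the current version and can keep using it for as long as they like; a writer builds a new head that points at the old list and swaps it in.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is one cell of an immutable list; it is never modified after creation.
type node struct {
	value int
	next  *node
}

// push returns a new list whose tail is the old list (shared, not copied).
func push(head *node, v int) *node {
	return &node{value: v, next: head}
}

// sum walks one version of the list.
func sum(head *node) int {
	total := 0
	for n := head; n != nil; n = n.next {
		total += n.value
	}
	return total
}

// current holds the most recently published version.
var current atomic.Pointer[node]

// publish installs a new version with v prepended, retrying if another
// writer published in between.
func publish(v int) {
	for {
		old := current.Load()
		if current.CompareAndSwap(old, push(old, v)) {
			return
		}
	}
}

func main() {
	publish(1)
	publish(2)
	snapshot := current.Load() // a reader holds on to this version
	publish(3)
	fmt.Println(sum(snapshot), sum(current.Load())) // prints "3 6"
}
```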

