Erlang Processes vs Java Threads

别那么骄傲 2020-11-29 21:18

I am reading the "Elixir in Action" book by Saša Jurić, and in the first chapter it says:

Erlang processes are completely isolated from each other. They

6 Answers
  • 2020-11-29 21:26

    Repeat after me: "These are different paradigms"

    Say that aloud 20 times or so -- it is our mantra for the moment.

    If we really must compare apples and oranges, let's at least consider where the common aspects of "being fruit" intersect.

    Java "objects" are a Java programmer's basic unit of computation. That is, an object (basically a struct with arms and legs that has encapsulation somewhat more strictly enforced than in C++) is the primary tool with which you model the world. You think "This object knows/has Data {X,Y,Z} and performs Functions {A(),B(),C()} over it, carries the Data everywhere it goes, and can communicate with other objects by calling functions/methods defined as part of their public interface. It is a noun, and that noun does stuff." That is to say, you orient your thought process around these units of computation. The default case is that things that happen amongst the objects occur in sequence, and a crash interrupts that sequence. They are called "objects" and hence (if we disregard Alan Kay's original meaning) we get "object orientation".

    Erlang "processes" are an Erlang programmer's basic unit of computation. A process (basically a self-contained sequential program running in its own time and space) is the primary tool with which an Erlanger models the world(1). Similar to how Java objects define a level of encapsulation, Erlang processes also define the level of encapsulation, but in the case of Erlang the units of computation are completely cut off from one another. You cannot call a method or function on another process, nor can you access any data that lives within it, nor does one process even run within the same timing context as any other process, and there is no guarantee about the ordering of message reception relative to other processes which may be sending messages. They may as well be on different planets entirely (and, come to think of it, this is actually plausible). They can crash independently of one another, and the other processes are only impacted if they have deliberately elected to be impacted (and even this involves messaging: essentially registering to receive a suicide note from the dead process, a note which is itself not guaranteed to arrive in any particular order relative to the system as a whole, and to which you may or may not choose to react).
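
    Java has no built-in counterpart to that opt-in "suicide note", but a rough JVM analogy shows its shape. The sketch below is illustrative only and is not how Erlang implements links or monitors: the `Monitored.spawn` helper and the `Down` class are hypothetical names invented for this example. An `UncaughtExceptionHandler` (a real `java.lang.Thread` API) drops a notice into a `BlockingQueue` that only the watcher chose to hold, so the crash reaches the watcher as a message rather than as a crash of its own.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a loose JVM analogy of an Erlang monitor, not a real library API.
class Monitored {
    // The "death notice": delivered as a message, not as a crash of the watcher.
    static final class Down {
        final String threadName;
        final Throwable reason;

        Down(String threadName, Throwable reason) {
            this.threadName = threadName;
            this.reason = reason;
        }
    }

    // Starts `body` on a new thread. If it dies from an uncaught exception, the handler
    // runs on the dying thread and drops a Down notice into the watcher's mailbox.
    static Thread spawn(String name, Runnable body, BlockingQueue<Down> watcherMailbox) {
        Thread t = new Thread(body, name);
        t.setUncaughtExceptionHandler((thread, ex) ->
                watcherMailbox.offer(new Down(thread.getName(), ex)));
        t.start();
        return t;
    }

    // Spawns a worker that crashes immediately, then waits up to 5 s for its notice.
    static Down demo() {
        BlockingQueue<Down> mailbox = new LinkedBlockingQueue<>();
        spawn("worker-1", () -> { throw new IllegalStateException("boom"); }, mailbox);
        try {
            return mailbox.poll(5, TimeUnit.SECONDS); // null if nothing arrived in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        Down notice = demo();
        System.out.println(notice.threadName + " died: " + notice.reason.getMessage());
        // prints "worker-1 died: boom"
    }
}
```

    This sketch only reports deaths by uncaught exception, whereas an Erlang monitor also reports normal exits. And unlike Erlang, nothing here prevents the two threads from also sharing mutable objects; the isolation is a convention, not a guarantee.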

    Java deals with complexity directly in compound algorithms: how objects work together to solve a problem. It is designed to do this within a single execution context, and the default case in Java is sequential execution. Multiple threads in Java indicate multiple running contexts, and this is a very complex topic because of the impact that activity in different timing contexts has on one another (and on the system as a whole: hence defensive programming, exception schemes, etc.). Saying "multi-threaded" in Java means something different than it does in Erlang; in fact, this is never even said in Erlang because it is always the base case. Note here that Java threads imply segregation with respect to time, not memory or visible references -- visibility in Java is controlled manually by choosing what is private and what is public; universally accessible elements of a system must be either designed to be "threadsafe" and reentrant, sequentialized via queueing mechanisms, or protected by locking mechanisms. In short: coordinating when and how threads touch shared state is a manually managed issue in threaded/concurrent Java programs.
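
    The point that shared elements must be made thread-safe, queued, or locked can be seen in a few lines. This is a minimal sketch with invented class and method names: two threads receive the same instance and each increments three counters. The plain field has no protection; the `synchronized` block and the `AtomicLong` correspond to the locking and thread-safe options (queueing is not shown here).

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: one object reachable from two threads, since Java threads share the heap.
class SharedCounters {
    int unsafeCount = 0;                             // plain field: concurrent ++ can lose updates
    long lockedCount = 0;                            // guarded by `lock`
    private final Object lock = new Object();
    final AtomicLong atomicCount = new AtomicLong(); // thread-safe type, no explicit lock

    void work(int iterations) {
        for (int i = 0; i < iterations; i++) {
            unsafeCount++;                   // unsynchronized read-modify-write: a data race
            synchronized (lock) {            // option 1: a locking mechanism
                lockedCount++;
            }
            atomicCount.incrementAndGet();   // option 2: a thread-safe (lock-free) type
        }
    }

    // Hands the SAME instance to two threads and waits for both to finish.
    static SharedCounters runOnTwoThreads(int iterations) {
        SharedCounters shared = new SharedCounters();
        Thread a = new Thread(() -> shared.work(iterations));
        Thread b = new Thread(() -> shared.work(iterations));
        a.start();
        b.start();
        try {
            a.join();                        // join() also makes the threads' writes visible here
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        SharedCounters c = runOnTwoThreads(100_000);
        System.out.println("unsafe: " + c.unsafeCount + " (can be below 200000)");
        System.out.println("locked: " + c.lockedCount + ", atomic: " + c.atomicCount.get());
    }
}
```

    The `unsafe` total may vary between runs and can come out below 200000, while the other two are always 200000. Nothing in the language stops either thread from touching `unsafeCount`; protection exists only where the programmer added it.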

    Erlang separates each process's running context in terms of execution timing (scheduling), memory access and reference visibility, and in doing so simplifies each component of an algorithm by isolating it completely. This is not just the default case, this is the only case available under this model of computation. This comes at the cost of never knowing exactly the sequence of any given operation once part of your processing sequence crosses a message barrier -- because messages are all essentially network protocols and there are no method calls that can be guaranteed to execute within a given context. This would be analogous to creating a JVM instance per object, and only permitting them to communicate across sockets -- that would be ridiculously cumbersome in Java, but is the way Erlang is designed to work (incidentally, this is also the basis of the concept of writing "Java microservices" if one ditches the web-oriented baggage the buzzword tends to entail -- Erlang programs are, by default, swarms of microservices). It's all about tradeoffs.
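
    As a much lighter (and much leakier) version of the "JVM instance per object" thought experiment, the sketch below confines a counter to one thread and exposes it only through a mailbox. It is an illustration under stated assumptions, not Erlang semantics: the `CounterProcess` class, its `Command` protocol, and the use of `CompletableFuture` as a reply channel are all invented for this example, and everything runs inside one JVM rather than across sockets.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of an Erlang-style "process" on the JVM: its state lives only
// inside its own thread and is reachable solely by sending messages to its mailbox.
class CounterProcess {
    enum Command { ADD, GET }

    // An immutable message; the reply future stands in for "send a message back".
    static final class Message {
        final Command command;
        final int amount;
        final CompletableFuture<Integer> replyTo;

        Message(Command command, int amount, CompletableFuture<Integer> replyTo) {
            this.command = command;
            this.amount = amount;
            this.replyTo = replyTo;
        }
    }

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();

    // Starts the process thread (daemon, so this sketch never keeps the JVM alive).
    static CounterProcess spawn() {
        CounterProcess p = new CounterProcess();
        Thread t = new Thread(p::loop, "counter-process");
        t.setDaemon(true);
        t.start();
        return p;
    }

    // Receive loop: handles one message at a time, in arrival order.
    private void loop() {
        int total = 0; // local to this thread; no other thread can reach it
        try {
            while (true) {
                Message m = mailbox.take();
                switch (m.command) {
                    case ADD:
                        total += m.amount;
                        m.replyTo.complete(total);
                        break;
                    case GET:
                        m.replyTo.complete(total);
                        break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as "exit"
        }
    }

    // The only way in: enqueue a message and get a future for the reply.
    CompletableFuture<Integer> send(Command command, int amount) {
        CompletableFuture<Integer> reply = new CompletableFuture<>();
        mailbox.add(new Message(command, amount, reply));
        return reply;
    }

    public static void main(String[] args) {
        CounterProcess counter = spawn();
        counter.send(Command.ADD, 5);
        counter.send(Command.ADD, 7);
        System.out.println(counter.send(Command.GET, 0).join()); // prints 12
    }
}
```

    For a single sender, the `BlockingQueue` keeps messages in FIFO order, which is why `GET` sees both additions. Unlike Erlang, the message object is passed by reference rather than copied, and the JVM will not stop a careless caller from putting a mutable object into a message; the isolation holds only as long as everyone follows the convention.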

    These are different paradigms. The closest commonality we can find is to say that from the programmer's perspective, Erlang processes are analogous to Java objects. If we must find something to compare Java threads to... well, we're simply not going to find it in Erlang, because Erlang has no comparable concept. To beat a dead horse: these are different paradigms. If you write a few non-trivial programs in Erlang this will become readily apparent.

    Note that I'm saying "these are different paradigms" but have not even touched the topic of OOP vs FP. The difference between "thinking in Java" and "thinking in Erlang" is more fundamental than OOP vs FP.

    While it is true that Erlang's "concurrency oriented" or "process oriented" foundation is closer to what Alan Kay had in mind when he coined the term "object oriented"(2), that is not really the point here. What Kay was getting at was that one can reduce the cognitive complexity of a system by cutting your computrons into discrete chunks, and isolation is necessary for that. Java accomplishes this in a way that leaves it still fundamentally procedural in nature, but structures code around a special syntax over higher-order dispatching closures called "class definitions". Erlang does this by splitting the running context up per object. This means Erlang thingies can't call methods on one another, but Java thingies can. This means Erlang thingies can crash in isolation but Java thingies can't. A vast number of implications flow from this basic difference -- hence "different paradigms". Tradeoffs.


    Footnotes:

    1. Incidentally, Erlang implements a version of "the actor model", but we don't use this terminology as Erlang predates the popularization of this model. Joe Armstrong was unaware of it when he designed Erlang and wrote his thesis.
    2. Alan Kay has said quite a bit about what he meant when he coined the term "object oriented", the most interesting being his take on messaging (one-way notification from one independent process with its own timing and memory to another) VS calls (function or method calls within a sequential execution context with shared memory) -- and how the lines blur a bit between programming interface as presented by the programming language and the implementation underneath.
  • 2020-11-29 21:29

    Java threads can in fact share memory. For example, you can pass the same instance to two separate threads and both can manipulate its state, leading to potential problems such as race conditions (and, once locks are added to prevent those, deadlocks).

    Elixir/Erlang, on the other hand, addresses this through immutability: when you pass something to a process, the process receives a copy of the original value.
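
    The difference between handing a thread a shared instance and handing it a copy can be reproduced in plain Java. The sketch below is illustrative, with invented names: `List.copyOf` (Java 10+) stands in for Erlang's copy-on-send, which Java does not do for you, and a `CountDownLatch` fixes the ordering so the result does not depend on timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a thread handed a shared reference vs. one handed a copy.
class SharedVsCopied {
    // Starts a thread that waits for `go`, then records the size of the list it was given.
    static Thread reader(List<String> list, CountDownLatch go, AtomicInteger seenSize) {
        Thread t = new Thread(() -> {
            try {
                go.await();
                seenSize.set(list.size());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Returns {size seen through the shared reference, size seen through the copy}.
    static int[] demo() {
        List<String> original = new ArrayList<>(List.of("a"));
        CountDownLatch go = new CountDownLatch(1);
        AtomicInteger sharedSeen = new AtomicInteger(-1);
        AtomicInteger copySeen = new AtomicInteger(-1);

        Thread t1 = reader(original, go, sharedSeen);            // same object on the shared heap
        Thread t2 = reader(List.copyOf(original), go, copySeen); // immutable snapshot taken now

        original.add("b");  // mutate after both readers have been started
        go.countDown();     // writes before countDown() are visible to threads after await()
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {sharedSeen.get(), copySeen.get()};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("shared: " + r[0] + ", copy: " + r[1]); // prints "shared: 2, copy: 1"
    }
}
```

    In Java the copy is opt-in and shallow: `List.copyOf` copies the list, not its elements, which is harmless here only because `String` is immutable.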

  • 2020-11-29 21:35

    Most definitely not. All threads in Java share the same address space, so it is possible for one thread to trash things owned by another thread. In the Erlang VM this just isn't possible, as each process is isolated from all others. That's the whole point of them. Any time you want one process to do something with data from another, your code has to send a message to the other process. The only things shared between processes are large binary objects, and these are immutable.

  • 2020-11-29 21:39

    To complement the previous answers: Java threads come in two types, daemon and non-daemon.

    To change a thread's type you can call .setDaemon(boolean on) before the thread is started. The difference is that a daemon thread does not keep the JVM from quitting. As the Javadoc for Thread says:

    The Java Virtual Machine exits when the only threads running are all daemon threads.

    That means user threads (those that are not specifically set to be daemon) keep the JVM from terminating. Daemon threads, on the other hand, may still be running when all non-daemon threads have finished, in which case the JVM quits anyway. So, to answer your question: a thread can finish without the JVM quitting, as long as at least one non-daemon thread is still alive.
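
    A minimal sketch of the daemon flag, using only the standard `Thread` API (the class and method names are illustrative). Two details are worth noting: `setDaemon` must be called before `start()`, otherwise it throws `IllegalThreadStateException`, and a new thread inherits the daemon status of the thread that creates it unless you change it.

```java
// Illustrative sketch of the daemon flag, using only java.lang.Thread.
class DaemonDemo {
    // Creates (but does not start) a thread that sleeps "forever" until interrupted.
    static Thread sleeper(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly when interrupted
            }
        }, daemon ? "daemon-sleeper" : "user-sleeper");
        t.setDaemon(daemon); // allowed only while the thread has not been started
        return t;
    }

    // Returns true if setDaemon() on an already-started thread is rejected.
    static boolean setDaemonAfterStartFails() {
        Thread t = sleeper(true);
        t.start();
        try {
            t.setDaemon(false);
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        } finally {
            t.interrupt(); // let the sleeper finish
        }
    }

    public static void main(String[] args) {
        Thread d = sleeper(true);
        d.start();
        System.out.println("daemon? " + d.isDaemon()); // prints "daemon? true"
        // main returns here; only a daemon thread is left, so the JVM exits.
        // With sleeper(false) instead, the JVM would keep running indefinitely.
    }
}
```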

    As for the comparison with Erlang/Elixir, don't forget: they are different paradigms, as already mentioned.

    It is not impossible for the JVM to mimic Erlang's behaviour, though that is not what it was designed for and, therefore, it comes with lots of trade-offs. The following projects try to accomplish that:

    • Erjang
    • Akka
  • 2020-11-29 21:42

    when Java thread dies, it too does not impact other threads

    Let me ask a counterquestion: why do you think Thread.stop() has been deprecated for more than a decade? The reason why is precisely the negation of your statement above.

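    To give two specific examples: suppose you stop() a thread while it's executing something as innocuous-sounding as System.out.println() or Math.random(). Result: those two features may now be broken for the entire JVM, because stop() makes the thread release every monitor it holds at whatever point it happened to be, potentially leaving the state guarded by those monitors inconsistent. The same pertains to any other synchronized code your application may execute.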
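
    Per the `Thread.stop()` Javadoc, stopping a thread makes it release every monitor it holds, so objects those monitors were protecting can become visible to other threads in an inconsistent state; JDK 20 and later go further and make `stop()` throw `UnsupportedOperationException`. The usual replacement is cooperative cancellation via `interrupt()`. A minimal sketch with illustrative names, in which the worker itself decides where it is safe to stop:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: cooperative cancellation with interrupt() instead of stop().
class CooperativeStop {
    // The worker checks its interrupt status between units of work, so it only
    // ever stops at a point it chose, with its own invariants intact.
    static Thread startWorker(AtomicLong unitsDone, AtomicBoolean cleanedUp) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    unitsDone.incrementAndGet();    // one complete "unit of work"
                    TimeUnit.MILLISECONDS.sleep(1); // blocking calls also respond to interrupt()
                }
            } catch (InterruptedException e) {
                // sleep() clears the interrupt status when it throws; restore it
                Thread.currentThread().interrupt();
            } finally {
                cleanedUp.set(true);                // the worker releases its own resources
            }
        }, "worker");
        t.start();
        return t;
    }

    // Lets the worker run briefly, asks it to stop, and reports whether it exited cleanly.
    static boolean demo() {
        AtomicLong units = new AtomicLong();
        AtomicBoolean cleaned = new AtomicBoolean(false);
        Thread worker = startWorker(units, cleaned);
        try {
            TimeUnit.MILLISECONDS.sleep(50);
            worker.interrupt();   // a request, not a forced kill
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive() && cleaned.get();
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + demo()); // prints "stopped cleanly: true"
    }
}
```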

    if we are looking at request-processing threads

    The application may theoretically be coded such that absolutely no shared resource protected by locks is ever used; however that will only help to point out the exact extent to which Java threads are codependent. And the "independence" achieved will only pertain to the request-processing threads, not to all threads in such an application.

  • 2020-11-29 21:43

    Isn't that true for Java threads as well? I mean when Java thread crashes, it too does not crash other threads

    Yes and no. Let me explain:

    • Referring to shared memory: Different threads in a Java process share the whole heap, therefore threads can interact in a huge number of planned and unplanned ways. However, objects reachable only from a thread's own stack (e.g. a context you pass down to a called method) and ThreadLocal values belong to that thread alone (unless it starts sharing references to them).

    • Crashing: If a thread crashes in Java (a Throwable propagates out of Thread.run(), or the thread gets stuck in a loop or blocked), that mishap might not affect other threads (e.g. a pool of connections in a server will continue to operate). However, because different threads interact, other threads can easily get stranded if one of them ends abnormally (e.g. one thread trying to read from an empty pipe from another thread which did not close its end). So unless the developers are paranoid and extremely careful, it is very likely that side effects will occur.
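
    Both halves of the "Crashing" point fit in a short sketch (names are illustrative; standard library only): the producer thread dies from an exception and the rest of the program keeps running, but a consumer waiting for the producer's output is never told. `poll` with a timeout is used so the demo terminates; a plain `take()` would block forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a crashed producer leaves its consumer stranded.
class StrandedConsumer {
    // Stand-in for real work that fails halfway through.
    static String compute() {
        throw new IllegalStateException("producer crashed");
    }

    static String demo() {
        BlockingQueue<String> handoff = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> handoff.add(compute()), "producer");
        producer.start(); // the default handler just prints the stack trace to System.err

        try {
            producer.join();
            // Consumer side: nobody closes the channel or reports that the producer died.
            String result = handoff.poll(500, TimeUnit.MILLISECONDS);
            return result == null ? "stranded" : result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // The main thread survives the producer's crash, but gets no result.
        System.out.println(demo()); // prints "stranded"
    }
}
```

    In a real program the consumer would need a timeout, a sentinel ("poison pill") message, or an `UncaughtExceptionHandler` that notifies it, all of which the developer has to remember to add.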

    I doubt that any other paradigm intends threads to operate as totally independent islands. They must share information and coordinate somehow, and then there will be a chance to mess things up. It's just that other paradigms take a more defensive approach that "gives you less rope to hang yourself" (the same idiom often used about pointers).
