Immutability and reordering

星月不相逢 2020-12-04 10:07

The code below (Java Concurrency in Practice listing 16.3) is not thread safe for obvious reasons:

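public class UnsafeLazyInitialization {
    private static Resource resource;

    public static Resource getInstance() {
        if (resource == null)
            resource = new Resource();  // unsafe publication
        return resource;
    }
}
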
10 answers
  • 2020-12-04 10:41

    The confusion I think you have here is what the author meant by safe publication. He was referring to the safe publication of a non-null Resource, but you seem to get that.

    Your question is interesting - is it possible to return a null cached value of resource?

    Yes.

    The compiler is allowed to reorder the operations like this:

    public static Resource getInstance(){
       Resource reordered = resource;
       if(resource != null){
           return reordered;
       }
       return (resource = new Resource());
    } 
    

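    This preserves the method's single-threaded behaviour but can return null: if another thread assigns resource between the two reads, the check succeeds while reordered still holds null.

    Whether or not this is the best implementation is up for debate, but there are no rules that prevent this type of reordering.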
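
    To make the interleaving concrete, here is a minimal single-threaded sketch of the reordered method. The class name ReorderedLazyInit, the betweenReads hook and publishFromOtherThread are all made up for illustration: the hook only stands in for "another thread runs here", and the field is per-instance so each run starts from null.

```java
// Deterministic simulation of the reordered getInstance(); not real concurrency.
class ReorderedLazyInit {
    static class Resource {}

    private Resource resource;   // instance field so every demo starts from null

    // The reordered form from the answer, with a hypothetical hook between the two reads.
    Resource getInstanceReordered(Runnable betweenReads) {
        Resource reordered = resource;   // first read: may still be null
        betweenReads.run();              // stand-in for another thread running at this point
        if (resource != null) {          // second read: sees the other thread's write
            return reordered;            // returns the stale value of the first read
        }
        return (resource = new Resource());
    }

    // Plays the role of the other thread publishing its own Resource.
    void publishFromOtherThread() {
        resource = new Resource();
    }

    public static void main(String[] args) {
        ReorderedLazyInit demo = new ReorderedLazyInit();
        Resource result = demo.getInstanceReordered(demo::publishFromOtherThread);
        System.out.println("result is null: " + (result == null));   // prints "result is null: true"
    }
}
```

    With an empty hook (no interference) the same method returns a non-null Resource, which is why a single thread cannot tell the reordered form from the original.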

  • 2020-12-04 10:41

    After applying the JLS rules to this example, I have come to the conclusion that getInstance can definitely return null. In particular, JLS 17.4:

    The memory model determines what values can be read at every point in the program. The actions of each thread in isolation must behave as governed by the semantics of that thread, with the exception that the values seen by each read are determined by the memory model.

    It is then clear that in the absence of synchronization, null is a legal outcome of the method since each of the two reads can observe anything.


    Proof

    Decomposition of reads and writes

    The program can be decomposed as follows (to clearly see the reads and writes):

                                  Some Thread
    ---------------------------------------------------------------------
     10: resource = null; //default value                                  //write
    =====================================================================
               Thread 1               |          Thread 2                
    ----------------------------------+----------------------------------
     11: a = resource;                | 21: x = resource;                  //read
     12: if (a == null)               | 22: if (x == null)               
     13:   resource = new Resource(); | 23:   resource = new Resource();   //write
     14: b = resource;                | 24: y = resource;                  //read
     15: return b;                    | 25: return y;                    
    

    What the JLS says

    JLS 17.4.5 gives the rules for a read to be allowed to observe a write:

    We say that a read r of a variable v is allowed to observe a write w to v if, in the happens-before partial order of the execution trace:

    • r is not ordered before w (i.e., it is not the case that hb(r, w)), and
    • there is no intervening write w' to v (i.e. no write w' to v such that hb(w, w') and hb(w', r)).

    Application of the rule

    In our example, let's assume that thread 1 sees null and properly initialises resource. In thread 2, an invalid execution would be for 21 to observe 23 (due to program order) - but any of the other writes (10 and 13) can be observed by either read:

    • 10 happens-before all actions so no read is ordered before 10
    • 21 and 24 have no hb relationship with 13
    • 13 does not happens-before 23 (no hb relationship between the two)

    So both 21 and 24 (our 2 reads) are allowed to observe either 10 (null) or 13 (not null).
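
    The rule can also be checked mechanically. Below is a rough sketch, assuming the execution in which Thread 2 reads non-null on line 21 and therefore never executes write 23. The class HappensBeforeCheck and its methods are invented for this illustration, and it only models the two conditions quoted above (it says nothing about the causality requirements).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: which writes may each read observe under the JLS 17.4.5 rule?
class HappensBeforeCheck {
    private final Map<Integer, Set<Integer>> edges = new HashMap<>();   // direct hb edges
    private final Set<Integer> writes = new TreeSet<>();

    void addWrite(int id) { writes.add(id); }

    void addEdge(int from, int to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // hb(a, b): b is reachable from a through the direct edges (transitive closure).
    boolean happensBefore(int a, int b) {
        Deque<Integer> todo = new ArrayDeque<>(edges.getOrDefault(a, Collections.emptySet()));
        Set<Integer> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            int cur = todo.pop();
            if (cur == b) return true;
            if (seen.add(cur)) todo.addAll(edges.getOrDefault(cur, Collections.emptySet()));
        }
        return false;
    }

    Set<Integer> allowedWrites(int read) {
        Set<Integer> allowed = new TreeSet<>();
        for (int w : writes) {
            if (happensBefore(read, w)) continue;   // r is ordered before w
            boolean hidden = false;
            for (int w2 : writes) {                 // look for an intervening write w'
                if (w2 != w && happensBefore(w, w2) && happensBefore(w2, read)) hidden = true;
            }
            if (!hidden) allowed.add(w);
        }
        return allowed;
    }

    // Actions 10, 11, 13, 14, 21, 24 from the table; write 23 is not executed.
    static HappensBeforeCheck scenario() {
        HappensBeforeCheck m = new HappensBeforeCheck();
        m.addWrite(10);
        m.addWrite(13);
        m.addEdge(10, 11);   // the default write happens-before each thread's first action
        m.addEdge(10, 21);
        m.addEdge(11, 13);   // program order in Thread 1
        m.addEdge(13, 14);
        m.addEdge(21, 24);   // program order in Thread 2
        return m;
    }

    public static void main(String[] args) {
        HappensBeforeCheck m = scenario();
        for (int read : new int[] {11, 14, 21, 24}) {
            System.out.println("read " + read + " may observe writes " + m.allowedWrites(read));
        }
    }
}
```

    Under these assumptions the sketch reports [10] for read 11, [13] for read 14, and [10, 13] for both reads 21 and 24, matching the bullets above.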

    Execution path that returns null

    In particular, assuming that Thread 1 sees a null on line 11 and initialises resource on line 13, Thread 2 could legally execute as follows:

    • 24: y = null (reads write 10)
    • 21: x = non null (reads write 13)
    • 22: false
    • 25: return y

    Note: to clarify, this does not mean that T2 sees non-null and subsequently sees null (which would breach the causality requirements). It means that, from an execution perspective, the two reads have been reordered and the second one was committed before the first. Judged against the initial program order, however, it does look as if the later write had been seen before the earlier one.

    UPDATE 10 Feb

    Back to the code, a valid reordering would be:

    Resource tmp = resource; // null here
    if (resource == null) { // false: resource is not null here
        resource = tmp = new Resource();
    }
    return tmp; // returns null
    

    And because that code preserves intra-thread semantics (if executed by a single thread, it will always have the same behaviour as the original code), it shows that the causality requirements are satisfied (there is a valid execution that produces the outcome).


    After posting on the concurrency interest list, I got a few messages regarding the legality of that reordering, which confirm that null is a legal outcome:

    • The transformation is definitely legal since a single-threaded execution won't tell the difference. [Note that] the transformation doesn't seem sensible - there's no good reason a compiler would do it. However, given a larger amount of surrounding code or perhaps a compiler optimization "bug", it could happen.
    • The statement about intra-thread ordering and program order is what made me question the validity of things, but ultimately the JMM relates to the bytecode that gets executed. The transformation could be done by the javac compiler in which case null will be perfectly valid. And there are no rules for how javac has to convert from Java source to Java bytecode so...
  • 2020-12-04 10:44

    I'm sorry if I'm wrong (I'm not a native English speaker), but it seems to me that the statement you mention:

    UnsafeLazyInitialization is actually safe if Resource is immutable.

    is taken out of context. That statement is really about initialization safety:

    The guarantee of initialization safety allows properly constructed immutable objects to be safely shared across threads without synchronization

    ...

    Initialization safety guarantees that for properly constructed objects, all threads will see the correct values of final fields that were set by the constructor
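
    For illustration, a class that would qualify might look like the sketch below. ImmutableResource and its two fields are invented here, since the book's Resource class is not shown: every field is final and this does not escape from the constructor, which is what "properly constructed" refers to.

```java
// Hypothetical immutable Resource: all state is final and assigned in the constructor.
final class ImmutableResource {
    private final String name;
    private final int capacity;

    ImmutableResource(String name, int capacity) {
        this.name = name;
        this.capacity = capacity;
        // 'this' is not handed to any other object or thread during construction
    }

    String name() { return name; }

    int capacity() { return capacity; }

    public static void main(String[] args) {
        ImmutableResource r = new ImmutableResource("pool", 8);
        System.out.println(r.name() + " " + r.capacity());   // prints "pool 8"
    }
}
```

    Note that this guarantee covers the final fields of an object a thread manages to see; it says nothing about which reference a racy read of a non-final field returns.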

  • 2020-12-04 10:49

    There are essentially two questions that you are asking:

    1. Can the getInstance() method return null due to reordering?

    (which I think is what you are really after, so I'll try to answer it first)

    Even though I think designing Java to allow for this is totally insane, it seems like you are in fact correct that getInstance() can return null.

    Your example code:

    if (resource == null)
        resource = new Resource();  // unsafe publication
    return resource;
    

    is logically 100% identical to the example in the blog post you linked to:

    if (hash == 0) {
        // calculate local variable h to be non-zero
        hash = h;
    }
    return hash;
    

    Jeremy Manson then describes that his code can return 0 due to reordering. At first, I didn't believe it as I thought the following "happens-before"-logic must hold:

       "if (resource == null)" happens before "resource = new Resource();"
                                       and
         "resource = new Resource();" happens before "return resource;"
                                    therefore
    "if (resource == null)" happens before "return resource;", preventing null
    

    But Jeremy gives the following example in a comment to his blog post, how this code could be validly rewritten by the compiler:

    read = resource;
    if (resource==null)
        read = resource = new Resource();
    return read;
    

    This, in a single-threaded environment, behaves exactly identically to the original code, but, in a multi-threaded environment might lead to the following execution order:

    Thread 1                        Thread 2
    ------------------------------- -------------------------------------------------
    read = resource;    // null
                                    read = resource;                      // null
                                    if (resource==null)                   // true
                                        read = resource = new Resource(); // non-null
                                    return read;                          // non-null
    if (resource==null) // FALSE!!!
    return read;        // NULL!!!
    

    Now, from an optimization-standpoint, doing this doesn't make any sense to me, since the whole point of these things would be to reduce multiple reads to the same location, in which case it makes no sense that the compiler wouldn't generate if (read==null) instead, preventing the problem. So, as Jeremy points out in his blog, it is probably highly unlikely that this would ever happen. But it seems that, purely from a language-rules point of view, it is in fact allowed.

    This example is actually covered in the JLS:

    http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4

    The effect observed between the values of r2, r4, and r5 in "Table 17.4. Surprising results caused by forward substitution" is equivalent to what can happen with read = resource, if (resource==null), and return read in the example above.

    Aside: Why do I reference the blog post as the ultimate source for the answer? Because the guy who wrote it, is also the guy who wrote chapter 17 of the JLS on concurrency! So, he better be right! :)
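
    One way to rule out the null result without synchronization is to read the shared field only once and work with a local copy from then on, which I believe is the same trick used for the cached hash in the blog post. A minimal sketch, with SingleReadLazyInit as a made-up name and an empty stand-in Resource class:

```java
// Sketch of the single-read idiom; still racy, but it cannot return null.
class SingleReadLazyInit {
    static class Resource {}   // stand-in for the real Resource

    private static Resource resource;

    static Resource getInstance() {
        Resource local = resource;   // the only read of the shared field
        if (local == null) {
            local = new Resource();
            resource = local;        // racy write: another thread may overwrite it
        }
        return local;                // non-null on every path
    }

    public static void main(String[] args) {
        System.out.println(getInstance() != null);   // prints "true"
    }
}
```

    This only removes the null outcome. Two threads can still end up with two different Resource objects, and a mutable Resource would still be unsafely published.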

    2. Would making Resource immutable make the getInstance() method thread-safe?

    Given the potential null result, which can happen independently of whether Resource is mutable or not, the immediate simple answer to this question is: No (not strictly)

    If we ignore this highly unlikely but possible scenario, though, the answer is: Depends.

    The obvious threading-problem with the code is that it might lead to the following execution order (without any need for any reordering):

    Thread 1                                 Thread 2
    ---------------------------------------- ----------------------------------------
    if (resource==null) // true;  
                                             if (resource==null)          // true
                                                 resource=new Resource(); // object 1
                                             return resource;             // object 1
        resource=new Resource(); // object 2
    return resource;             // object 2
    

    So, the non-thread-safety is coming from the fact that you might get two different objects back from the function (even though without reordering neither of them will ever be null).

    Now, what the book was probably trying to say is the following:

    Some of Java's immutable types, like String and Integer, try to avoid creating multiple objects for the same content. So, if you have the literal "hello" in one spot and "hello" in another spot, Java will give you the exact same object reference, because string literals are interned. Similarly, if you have Integer.valueOf(5) in one spot and Integer.valueOf(5) in another, you get the same cached object (an explicit new Integer(5) or new String("hello"), by contrast, always creates a new object). If obtaining a Resource worked like that as well, you would get the same reference back and object 1 and object 2 in the above example would be the exact same object. This would indeed lead to an effectively thread-safe function (ignoring the reordering problem).
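
    A quick way to see when Java really hands back a shared instance is the sketch below. It assumes string literals and the cached factory Integer.valueOf (small values are cached); an explicit constructor call always creates a fresh object. SharedInstances is just a name for this demo.

```java
// Reference comparisons (==) on purpose: the point is object identity, not equality.
class SharedInstances {
    static boolean sameLiteral() {
        String a = "hello";
        String b = "hello";
        return a == b;                                      // literals are interned
    }

    static boolean sameBoxed() {
        return Integer.valueOf(5) == Integer.valueOf(5);    // 5 is within the cached range
    }

    static boolean sameConstructed() {
        return new String("hello") == new String("hello");  // two distinct objects
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());       // prints "true"
        System.out.println(sameBoxed());         // prints "true"
        System.out.println(sameConstructed());   // prints "false"
    }
}
```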

    But, if you implement Resource yourself, I don't believe there is even a way to have the constructor return a reference to a previously created object rather than creating a new one. So, it should not be possible for you to make object 1 and object 2 be the exact same object. But, given that you are calling the constructor with the same arguments (none in both cases), it could be likely that, even though your created objects aren't the same exact object, they will, for all intents and purposes, behave as if they were, also effectively making the code thread-safe.

    This doesn't necessarily have to be the case, though. Imagine an immutable version of Date, for example. The default constructor Date() uses the current system time as the date's value. So, even though the object is immutable and the constructor is called with the same argument, calling it twice will probably not result in an equivalent object. Therefore the getInstance() method is not thread-safe.
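
    A small sketch of that situation, using an invented ImmutableTimestamp class rather than the real Date. The clock is passed in (an assumption made purely so the effect is reproducible), and a fake clock that advances on every call plays the role of the system time.

```java
import java.util.function.LongSupplier;

// Hypothetical immutable, Date-like class whose value is fixed at construction time.
final class ImmutableTimestamp {
    private final long millis;

    ImmutableTimestamp(LongSupplier clock) {
        this.millis = clock.getAsLong();
    }

    long millis() { return millis; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ImmutableTimestamp && ((ImmutableTimestamp) o).millis == millis;
    }

    @Override
    public int hashCode() { return Long.hashCode(millis); }

    public static void main(String[] args) {
        long[] fakeTime = {1_000L};
        LongSupplier tickingClock = () -> fakeTime[0]++;   // advances on every call
        ImmutableTimestamp first = new ImmutableTimestamp(tickingClock);
        ImmutableTimestamp second = new ImmutableTimestamp(tickingClock);
        System.out.println("equal: " + first.equals(second));   // prints "equal: false"
    }
}
```

    Both objects are immutable and were built with the same arguments, yet callers that receive different instances observe different values.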

    So, as a general statement, I believe the line you quoted from the book is just plain wrong (at least as taken out of context here).

    ADDITION Re: reordering

    I find the resource = new Resource() example a bit too simplistic to help me understand WHY allowing such reordering in Java would ever make sense. So let me see if I can come up with something where this would actually help optimization:

    System.out.println("Found contact:");
    System.out.println(firstname + " " + lastname);
    if (firstname==null) firstname = "";
    if (lastname ==null) lastname  = "";
    return firstname + " " + lastname;
    

    Here, in the most likely case that both ifs yield false, it is non-optimal to do the expensive String concatenation firstname + " " + lastname twice, once for the debug message, once for the return. So, it would indeed make sense here to reorder the code to do the following instead:

    System.out.println("Found contact:");
    String contact = firstname + " " + lastname;
    System.out.println(contact);
    if ((firstname==null) || (lastname==null)) {
        if (firstname==null) firstname = "";
        if (lastname ==null) lastname  = "";
        contact = firstname + " " + lastname;
    }
    return contact;
    

    As examples get more complex and as you start thinking about the compiler keeping track of what is already loaded/computed in the processor registers that it uses and intelligently skipping re-calculation of already existing results, this effect might actually become more and more likely to happen. So, even though I never thought I would ever say this when I went to bed last night, thinking about it more, I do actually now believe that this may have been a needed/good decision to truly allow for code optimization to do its most impressive magic. But it does still strike me as quite dangerous as I don't think many people are aware of this and even if they are, it's quite complex to wrap your head around how to write your code correctly without synchronizing everything (which will then do away many times over with any performance benefits gained from more flexible optimization).

    I guess if you didn't allow for this reordering, any caching and reuse of intermediate results of a series of process steps would become illegal, thus doing away with one of the most powerful compiler optimizations possible.

  • 2020-12-04 10:50

    It is indeed safe if UnsafeLazyInitialization.resource is immutable, i.e. the field is declared as final:

    private static final Resource resource = new Resource();
    

    It might also be considered thread-safe if the Resource class itself is immutable and it does not matter which instance you are using. In that case two calls could return different instances of Resource without issue (apart from increased memory consumption, depending on the number of threads calling getInstance() at the same time).

    That seems far-fetched, though, and I believe there is a typo; the real sentence should be

    UnsafeLazyInitialization is actually safe if *r*esource is immutable.

  • 2020-12-04 10:53

    UnsafeLazyInitialization.getInstance() can never return null.

    I'll use @assylias's table.

                                  Some Thread
    ---------------------------------------------------------------------
     10: resource = null; //default value                                  //write
    =====================================================================
               Thread 1               |          Thread 2                
    ----------------------------------+----------------------------------
     11: a = resource;                | 21: x = resource;                  //read
     12: if (a == null)               | 22: if (x == null)               
     13:   resource = new Resource(); | 23:   resource = new Resource();   //write
     14: b = resource;                | 24: y = resource;                  //read
     15: return b;                    | 25: return y;    
    

    I'll use the line numbers for Thread 1. Thread 1 sees the write on 10 before the read on 11, and the read on line 11 before the read on 14. These are intra-thread happens-before relationships and don't say anything about Thread 2. The read on line 14 returns a value defined by the JMM. Depending on the timing, it may be the Resource created on line 13, or it may be any value written by Thread 2. But that write has to happen-after the read on line 11. There is only one such write, the unsafe publish on line 23. The write to null on line 10 is not in scope because it happened before line 11 due to intra-thread ordering.

    It doesn't matter if Resource is immutable or not. Most of the discussion so far has focused on inter-thread action where immutability would be relevant, but the reordering that would allow this method to return null is forbidden by intra-thread rules. The relevant section of the spec is JLS 17.4.7.

    For each thread t, the actions performed by t in A are the same as would be generated by that thread in program-order in isolation, with each write w writing the value V(w), given that each read r sees the value V(W(r)). Values seen by each read are determined by the memory model. The program order given must reflect the program order in which the actions would be performed according to the intra-thread semantics of P.

    This basically means that while reads and writes may be reordered, reads and writes to the same variable have to appear like they happen in order to the Thread that executes the reads and writes.

    There's only a single write of null (on line 10). Either Thread can see its own copy of resource or the other Thread's, but it cannot see the earlier write to null after it reads either Resource.

    As a side note, the initialization to null takes place in a separate thread. The section on Safe Publication in JCIP states:

    Static initializers are executed by the JVM at class initialization time; because of internal synchronization in the JVM, this mechanism is guaranteed to safely publish any objects initialized in this way [JLS 12.4.2].

    It may be worth trying to write a test that gets UnsafeLazyInitialization.getInstance() to return null, and that gets some of the proposed equivalent rewrites to return null. You'll see that they're not truly equivalent.
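
    A rough starting point for such a test is sketched below. It is not the book's class: Holder is a made-up per-instance copy of the same logic so that every round starts from null, and LazyInitStressTest and countNulls are invented names. Keep in mind that seeing zero nulls on one JVM does not prove the outcome is forbidden; a dedicated harness such as OpenJDK's jcstress is better suited to hunting for this kind of result.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LazyInitStressTest {
    static class Resource {}

    // Same logic as UnsafeLazyInitialization, but per instance instead of static.
    static class Holder {
        private Resource resource;

        Resource getInstance() {
            if (resource == null)
                resource = new Resource();
            return resource;
        }
    }

    // Races two threads per round and counts how many calls returned null.
    static int countNulls(int rounds) {
        AtomicInteger nulls = new AtomicInteger();
        for (int i = 0; i < rounds; i++) {
            Holder holder = new Holder();
            CountDownLatch start = new CountDownLatch(1);
            Runnable task = () -> {
                try {
                    start.await();   // line both threads up before racing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (holder.getInstance() == null) {
                    nulls.incrementAndGet();
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start();
            t2.start();
            start.countDown();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return nulls.get();
    }

    public static void main(String[] args) {
        System.out.println("null results: " + countNulls(10_000));
    }
}
```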

    EDIT

    Here's an example that separates reads and writes for clarity. Let's say there's a public static variable object.

    public static Object object = new Integer(0);
    

    Thread 1 writes to that object:

    object = new Integer(1);
    object = new Integer(2);
    object = new Integer(3);
    

    Thread 2 reads that object:

    System.out.println(object);
    System.out.println(object);
    System.out.println(object);
    

    Without any form of synchronization providing inter-thread happens-before relationships, Thread 2 can print out lots of different things.

    1, 2, 3
    0, 0, 0
    3, 3, 3
    1, 1, 3
    etc.
    

    But it cannot print out a decreasing sequence like 3, 2, 1. The intra-thread semantics specified in 17.4.7 severely limit reordering here. If instead of using object three times we changed the example to use three separate static variables, many more outputs would be possible because there would be no restrictions on reordering.
