What are the benefits of Java's type erasure?

夕颜 2020-11-28 01:45

I read a tweet today that said:

It's funny when Java users complain about type erasure, which is the only thing Java got right, while ignoring all the things it got wrong.

11 answers
  • 2020-11-28 02:03

    (Although I already wrote an answer here, revisiting this question two years later I realize there is another, completely different way of answering it, so I'm leaving the previous answer intact and adding this one.)


    It is highly arguable whether the process done on Java Generics deserves the name "type erasure". Since generic types are not erased but replaced with their raw counterparts, a better choice seems to be "type mutilation".

    The quintessential feature of type erasure in its commonly understood sense is forcing the runtime to stay within the boundaries of the static type system by making it "blind" to the structure of the data it accesses. This gives full power to the compiler and allows it to prove theorems based on static types alone. It also helps the programmer by constraining the code's degrees of freedom, giving more power to simple reasoning.

    Java's type erasure does not achieve that—it cripples the compiler, like in this example:

    void doStuff(List<Integer> collection) {
    }

    // ERROR: name clash, both methods have the same erasure doStuff(List).
    // A method cannot have overloads which differ only in type arguments.
    void doStuff(List<String> collection) {
    }
    

    (The above two declarations collapse into the same method signature after erasure.)

    On the flip side, the runtime can still inspect the type of an object and reason about it, but since its insight into the true type is crippled by erasure, static type violations are trivial to achieve and hard to prevent.
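
    As a minimal sketch of how easily such a violation can be produced (the class and method names here are hypothetical, chosen only for illustration), a raw-type alias is enough to put a String into a List<Integer>. The failure surfaces later, at the cast the compiler inserts when the element is read:

```java
import java.util.ArrayList;
import java.util.List;

public class HeapPollutionDemo {

    // Returns the name of the failure observed when reading the polluted list.
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static String pollute() {
        List<Integer> ints = new ArrayList<>();
        List raw = ints;              // raw alias, permitted for backward compatibility
        raw.add("not an integer");    // only an unchecked warning, no runtime check
        try {
            Integer first = ints.get(0);   // compiler-inserted cast to Integer fails here
            return "no failure: " + first;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(pollute());
    }
}
```

    Note that the exception is raised where the value is read, not where the bad value was stored, which is what makes this kind of bug hard to trace.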

    To make things even more convoluted, the original and erased type signatures co-exist and are considered in parallel during compilation. This is because the whole process is not about removing type information from the runtime, but about shoehorning a generic type system into a legacy raw type system to maintain backwards compatibility. This gem is a classic example:

    public static <T extends Object & Comparable<? super T>> T max(Collection<? extends T> coll)
    

    (The redundant extends Object had to be added to preserve backward compatibility of the erased signature.)
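
    The erased signature can be inspected with reflection. The following is a small sketch (ErasedMaxSignature is a hypothetical name) that looks up Collections.max(Collection) and reports its erased return type. Because Object is the leftmost bound of T, the erasure is Object rather than Comparable:

```java
import java.lang.reflect.Method;
import java.util.Collection;
import java.util.Collections;

public class ErasedMaxSignature {

    // Looks up the single-argument overload of Collections.max and
    // returns its erased (runtime) return type.
    public static Class<?> erasedReturnTypeOfMax() throws NoSuchMethodException {
        Method max = Collections.class.getMethod("max", Collection.class);
        return max.getReturnType();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(erasedReturnTypeOfMax());
    }
}
```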

    Now, with that in mind, let us revisit the quote:

    It's funny when Java users complain about type erasure, which is the only thing Java got right

    What exactly did Java get right? Is it the word itself, regardless of meaning? For contrast take a look at the humble int type: no runtime type check is ever performed, or even possible, and the execution is always perfectly type-safe. That's what type erasure looks like when done right: you don't even know it's there.

  • 2020-11-28 02:05

    An additional point none of the other answers seem to have considered: if you really need generics with run-time typing, you can implement it yourself like this:

    public class GenericClass<T>
    {
         private Class<T> targetClass;
         public GenericClass(Class<T> targetClass)
         {
              this.targetClass = targetClass;
         }
    

    This class is then able to do all the things that would be achievable by default if Java did not use erasure: it can allocate new Ts (assuming T has a constructor that matches the pattern it expects to use), or arrays of Ts, it can dynamically test at run time if a particular object is a T and change behaviour depending on that, and so on.

    For example:

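         public T newT() {
             try {
                 // Class.newInstance() is deprecated since Java 9; go through the constructor.
                 return targetClass.getDeclaredConstructor().newInstance();
             } catch (ReflectiveOperationException e) {
                 // Covers NoSuchMethodException, InstantiationException,
                 // IllegalAccessException and InvocationTargetException.
                 throw new IllegalStateException("Cannot instantiate " + targetClass.getName(), e);
             }
         }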
    
         private T value;
         /** @throws ClassCastException if object is not a T */
         public void setValueFromObject (Object object) {
             value = targetClass.cast(object);
         }
    }
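
    The other capabilities mentioned above (creating arrays of T and testing at run time whether an object is a T) can be sketched the same way. This is an illustrative, self-contained variant; TypedHolder, newArray and isT are hypothetical names, not part of the class above:

```java
import java.lang.reflect.Array;

public class TypedHolder<T> {

    private final Class<T> targetClass;

    public TypedHolder(Class<T> targetClass) {
        this.targetClass = targetClass;
    }

    // Creates a real T[] (not an Object[]) using the class token.
    @SuppressWarnings("unchecked")
    public T[] newArray(int length) {
        return (T[]) Array.newInstance(targetClass, length);
    }

    // Runtime equivalent of "candidate instanceof T".
    public boolean isT(Object candidate) {
        return targetClass.isInstance(candidate);
    }
}
```

    The unchecked cast in newArray is safe only because the array really is created with the component type carried by the token.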
    
  • 2020-11-28 02:08

    Most answers are more concerned about programming philosophy than the actual technical details.

    And although this question is more than 5 years old, the question still lingers: Why is type erasure desirable from a technical point of view? In the end, the answer is rather simple (on a higher level): https://en.wikipedia.org/wiki/Type_erasure

    C++ templates don't exist at runtime. The compiler emits a fully optimized version for each instantiation, meaning the execution doesn't depend on type information. But how would a JIT deal with different versions of the same function? Wouldn't it be better to have just one function, so that the JIT doesn't have to optimize all the different versions of it? Well, but then what about type safety? Guess that has to go out of the window.

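    But wait a second: how does .NET do it? It reifies generics: the runtime keeps the type arguments, the JIT shares one compiled body across all reference-type instantiations (value types get their own specializations), and the actual type arguments are looked up at run time where needed. This way they mostly have to optimize one function and also get runtime type information. That lookup is not free, which is why .NET generics used to be slower (though they have gotten much better). I am not arguing that runtime type information isn't convenient! But it is expensive and shouldn't be used when it isn't absolutely necessary (it isn't considered expensive in dynamically typed languages because the compiler / interpreter relies on it anyway).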

    This way generic programming with type erasure is close to zero overhead (some runtime checks / casts are still required): https://docs.oracle.com/javase/tutorial/java/generics/erasure.html
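
    A short sketch of what "one function, plus a few casts" means in practice (SharedRuntimeClass is a hypothetical name): both parameterizations of ArrayList share a single runtime class, and the only per-use cost is the cast the compiler inserts when an element is read.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedRuntimeClass {

    // After erasure there is a single ArrayList class, whatever the type argument.
    public static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // get(0) returns Object at the bytecode level; the compiler adds a cast to String.
    public static String firstElement(List<String> strings) {
        return strings.get(0);
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```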

  • 2020-11-28 02:09

    Types are a construct used for writing programs in a manner that allows the compiler to check the correctness of a program. A type is a proposition on a value - the compiler verifies that this proposition is true.

    During the execution of a program, there should be no need for type information - this has already been verified by the compiler. The compiler should be free to discard this information in order to perform optimisations on the code - make it run faster, generate a smaller binary etc. Erasure of type parameters facilitates this.

    Java breaks static typing by allowing type information to be queried at runtime - reflection, instanceof etc. This allows you to construct programs that cannot be statically verified - they bypass the type system. It also misses opportunities for static optimisation.

    The fact that type parameters are erased prevents some of these incorrect programs from being constructed. However, more incorrect programs would be disallowed if more type information were erased and the reflection and instanceof facilities were removed.

    Erasure is important for upholding the property of "parametricity" of a data type. Say I have a type "List" parameterised over component type T. i.e. List<T>. That type is a proposition that this List type works identically for any type T. The fact that T is an abstract, unbounded type parameter means that we know nothing about this type, therefore are prevented from doing anything special for special cases of T.

    e.g. say I have a List<String> xs = asList("3"). I add an element: xs.add("q"). I end up with ["3","q"]. Since this is parametric, I can assume that List<Integer> xs = asList(7); xs.add(8) ends up with [7,8]. I know from the type that it doesn't do one thing for String and another for Integer.

    Furthermore, I know that the List.add function can not invent values of T out of thin air. I know that if my asList("3") has a "7" added to it, the only possible answers would be constructed out of the values "3" and "7". There is no possibility of a "2" or "z" being added to the list because the function would be unable to construct it. Neither of these other values would be sensible to add, and parametricity prevents these incorrect programs from being constructed.

    Basically, erasure prevents some means of violating parametricity, thus eliminating possibilities of incorrect programs, which is the goal of static typing.
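
    To illustrate the kind of violation that remains possible because instanceof was not removed, here is a hedged sketch (ParametricityBreaker and its method are hypothetical names). Its signature promises the same behaviour for every T, yet it quietly treats String as a special case:

```java
public class ParametricityBreaker {

    // The signature suggests this can only return its argument unchanged,
    // but runtime type inspection lets it special-case String.
    public static <T> T identityExceptForStrings(T value) {
        if (value instanceof String) {
            @SuppressWarnings("unchecked")
            T changed = (T) ((String) value).toUpperCase();
            return changed;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(identityExceptForStrings("q"));
        System.out.println(identityExceptForStrings(7));
    }
}
```

    Nothing in the type tells the caller about the special case, which is exactly the reasoning that parametricity is supposed to protect.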

  • 2020-11-28 02:10

    This is not a direct answer (OP asked "what are the benefits", I am replying "what are the cons")

    Compared to the C# type system, Java's type erasure is a real pain for two reasons:

    You can't implement an interface twice

    In C# you can implement both IEnumerable<T1> and IEnumerable<T2> safely, especially if the two types do not share a common ancestor (i.e. their ancestor is Object).

    Practical example: in Spring Framework, you can't implement ApplicationListener<? extends ApplicationEvent> multiple times. If you need different behaviours based on T, you need to test with instanceof.
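
    The workaround can be sketched without Spring. The types below (Event, Started, Stopped, Listener, Both) are hypothetical stand-ins for illustration only, not the Spring API:

```java
public class MultiEventListener {

    public interface Event {}
    public static final class Started implements Event {}
    public static final class Stopped implements Event {}

    public interface Listener<E extends Event> {
        String onEvent(E event);
    }

    // Rejected by the compiler, because both parameterizations erase to Listener:
    //   class Both implements Listener<Started>, Listener<Stopped> { ... }

    // Workaround: implement the interface once and dispatch on the runtime type.
    public static final class Both implements Listener<Event> {
        @Override
        public String onEvent(Event event) {
            if (event instanceof Started) {
                return "started";
            }
            if (event instanceof Stopped) {
                return "stopped";
            }
            return "ignored";
        }
    }
}
```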

    You can't do new T()

    (and you need a reference to Class to do that)

    As others commented, the equivalent of new T() can only be done via reflection, by calling a method on an instance of Class<T> and making sure you supply the parameters the constructor requires. C# allows you to do new T() only if you constrain T to have a parameterless constructor. If T does not satisfy that constraint, a compile error is raised.

    In Java, you will often be forced to write methods that look like the following

    public <T> T create(/* ...params, */ Class<T> classOfT) throws ReflectiveOperationException
    {
        // ... whatever you do, you will end up with
        T instance = classOfT.newInstance(); // deprecated since Java 9

        // ... or more advanced reflection
        Constructor<T> parameterizedConstructorThatYouKnowAbout = classOfT.getConstructor(...,...);
        ...
    }
    

    The drawbacks in the above code are:

    • Class.newInstance only works with a parameterless constructor. If none is available, an InstantiationException (a subclass of ReflectiveOperationException) is thrown at runtime.
    • A reflected constructor does not surface problems at compile time either. If you refactor, or if you swap arguments, you will find out only at runtime.
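
    One possible way around both drawbacks, offered here as a sketch rather than as what the answer proposes, is to pass a constructor reference instead of a Class<T>. The compiler then checks that a matching constructor exists. SupplierFactory, Greeting, create and createFrom are hypothetical names:

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class SupplierFactory {

    // Example type with a single one-argument constructor.
    public static final class Greeting {
        public final String text;

        public Greeting(String text) {
            this.text = text;
        }
    }

    // The caller supplies a no-arg constructor reference, e.g. ArrayList::new.
    public static <T> T create(Supplier<T> constructor) {
        return constructor.get();
    }

    // The caller supplies a one-argument constructor reference, e.g. Greeting::new.
    public static <T> T createFrom(Function<String, T> constructor, String argument) {
        return constructor.apply(argument);
    }
}
```

    If the constructor's parameter list changes during a refactoring, the call site that passes Greeting::new stops compiling instead of failing at runtime.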

    If I were the author of C#, I would have introduced the ability to specify one or more constructor constraints that are easy to verify at compile time (so I can require, for example, a constructor with string,string params). But that last point is speculation.

  • 2020-11-28 02:13

    Type Erasure Is Good

    Let's stick to the facts

    A lot of the answers thus far are overly concerned with the Twitter user. It's helpful to keep focused on the messages and not the messenger. There is a fairly consistent message with even just the excerpts mentioned thus far:

    It's funny when Java users complain about type erasure, which is the only thing Java got right, while ignoring all the things it got wrong.

    I get huge benefits (e.g. parametricity) and nil cost (alleged cost is a limit of imagination).

    new T is a broken program. It is isomorphic to the claim "all propositions are true." I am not big into this.

    A goal: reasonable programs

    These tweets reflect a perspective that is not interested in whether we can make the machine do something, but more whether we can reason that the machine will do something we actually want. Good reasoning is a proof. Proofs can be specified in formal notation or something less formal. Regardless of the specification language, they must be clear and rigorous. Informal specifications are not impossible to structure correctly, but are often flawed in practical programming. We end up with remediations like automated and exploratory tests to make up for the problems we have with informal reasoning. This is not to say that testing is intrinsically a bad idea, but the quoted Twitter user is suggesting that there is a much better way.

    So our goal is to have correct programs that we can reason about clearly and rigorously in a way that corresponds with how the machine will actually execute the program. This, though, is not the only goal. We also want our logic to have a degree of expressivity. For example, there's only so much we can express with propositional logic. It's nice to have universal (∀) and existential (∃) quantification from something like first-order logic.

    Using type systems for reasoning

    These goals can be very nicely addressed by type systems. This is especially clear because of the Curry-Howard correspondence. This correspondence is often expressed with the following analogy: types are to programs as theorems are to proofs.

    This correspondence is somewhat profound. We can take logical expressions, and translate them through the correspondence to types. Then if we have a program with the same type signature that compiles, we have proven that the logical expression is universally true (a tautology). This is because the correspondence is two-way. The transformations between the type/program and the theorem/proof worlds are mechanical, and can in many cases be automated.

    Curry-Howard plays nicely into what we'd like to do with specifications for a program.

    Are type systems useful in Java?

    Even with an understanding of Curry-Howard, some people find it easy to dismiss the value of a type system, when it

    1. is extremely hard to work with
    2. corresponds (through Curry-Howard) to a logic with limited expressivity
    3. is broken (which gets to the characterization of systems as "weak" or "strong").

    Regarding the first point, perhaps IDEs make Java's type system easy enough to work with (that's highly subjective).

    Regarding the second point, Java happens to almost correspond to a first-order logic. Generics give us the type system equivalent of universal quantification. Unfortunately, wildcards only give us a small fraction of existential quantification. But universal quantification is a pretty good start. It's nice to be able to say that functions for List<A> work universally for all possible lists because A is completely unconstrained. This leads to what the Twitter user is talking about with respect to "parametricity."

    An often-cited paper about parametricity is Philip Wadler's Theorems for free!. What's interesting about this paper is that from just the type signature alone, we can prove some very interesting invariants. If we were to write automated tests for these invariants we would be very much wasting our time. For example, for List<A>, from the type signature alone for flatten

    <A> List<A> flatten(List<List<A>> nestedLists);
    

    we can reason that

    flatten(nestedLists.map(l -> l.map(any_function)))
        ≡ flatten(nestedLists).map(any_function)
    

    That's a simple example, and you can probably reason about it informally, but it's even nicer when we get such proofs formally for free from the type system and checked by the compiler.
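
    For readers who want to see the equation on concrete data, here is a small sketch. It assumes a straightforward stream-based flatten and a helper map; FlattenLaw, map and lawHolds are hypothetical names. Running it only spot-checks the law for one input, whereas the free theorem guarantees it for all of them:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FlattenLaw {

    public static <A> List<A> flatten(List<List<A>> nestedLists) {
        return nestedLists.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static <A, B> List<B> map(List<A> xs, Function<A, B> f) {
        return xs.stream().map(f).collect(Collectors.toList());
    }

    // Compares both sides of the free theorem for one input and one function.
    public static <A, B> boolean lawHolds(List<List<A>> nestedLists, Function<A, B> f) {
        List<List<B>> mappedInner = map(nestedLists, inner -> map(inner, f));
        List<B> left = flatten(mappedInner);           // map inside, then flatten
        List<B> right = map(flatten(nestedLists), f);  // flatten, then map
        return left.equals(right);
    }
}
```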

    Not erasing can lead to abuses

    From the perspective of language implementation, Java's generics (which correspond to universal types) play very heavily into the parametricity used to get proofs about what our programs do. This gets to the third problem mentioned. All these gains of proof and correctness require a sound type system implemented without defects. Java definitely has some language features that allow us to shatter our reasoning. These include but are not limited to:

    • side-effects with an external system
    • reflection

    Non-erased generics are in many ways related to reflection. Without erasure there's runtime information that's carried with the implementation that we can use to design our algorithms. What this means is that statically, when we reason about programs, we don't have the full picture. Reflection severely threatens the correctness of any proofs we reason about statically. It's no coincidence reflection also leads to a variety of tricky defects.

    So what are ways that non-erased generics might be "useful?" Let's consider the usage mentioned in the tweet:

    <T> T broken() { return new T(); }
    

    What happens if T doesn't have a no-arg constructor? In some languages what you get is null. Or perhaps you skip the null value and go straight to raising an exception (which null values seem to lead to anyway). Because our language is Turing complete, it's impossible to reason about which calls to broken will involve "safe" types with no-arg constructors and which ones won't. We've lost the certainty that our program works universally.
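
    Java's closest equivalent, reflective instantiation through a class token, shows the same loss of certainty. In this sketch (NoArgProbe and tryCreate are hypothetical names) the call compiles for every T, but whether it works depends on the type passed at run time:

```java
public class NoArgProbe {

    // Tries to call the no-arg constructor of the given type and reports what happened.
    public static <T> String tryCreate(Class<T> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return "created";
        } catch (ReflectiveOperationException e) {
            // e.g. NoSuchMethodException when there is no no-arg constructor
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate(java.util.ArrayList.class));
        System.out.println(tryCreate(Integer.class));
    }
}
```

    ArrayList has a public no-arg constructor and Integer does not, yet nothing in the signature of tryCreate distinguishes the two calls.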

    Erasing means we've reasoned (so let's erase)

    So if we want to reason about our programs, we're strongly advised not to employ language features that threaten our reasoning. Once we do that, then why not just drop the types at runtime? They're not needed. We can get some efficiency and simplicity, with the satisfaction that no casts will fail and no methods will be missing upon invocation.

    Erasing encourages reasoning.
