How do ValueTypes derive from Object (ReferenceType) and still be ValueTypes?

闹比i 2020-11-22 17:20

C# doesn't allow structs to derive from classes, but all ValueTypes derive from Object. Where is this distinction made?

How does the CLR handle this?

6 Answers
  • 2020-11-22 17:46

    This is a somewhat artificial construct maintained by the CLR in order to allow all types to be treated as a System.Object.

    Value types derive from System.Object through System.ValueType, which is where the special handling occurs (i.e., the CLR handles boxing, unboxing, and so on for any type deriving from ValueType).
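
    As a minimal sketch of what this looks like from C#, the following asks reflection for the declared base types and round-trips a value through a box. The class and method names (BoxingBasics, RoundTrip) are made up for illustration.

```csharp
using System;

public static class BoxingBasics
{
    // Boxes an int into an object, then unboxes it back to an int.
    public static int RoundTrip(int value)
    {
        object boxed = value;   // boxing: the value is copied into a new heap object
        return (int)boxed;      // unboxing: the value is copied back out
    }

    public static void Main()
    {
        // Reflection reports the declared base types.
        Console.WriteLine(typeof(int).BaseType);        // System.ValueType
        Console.WriteLine(typeof(ValueType).BaseType);  // System.Object

        Console.WriteLine(RoundTrip(42));               // 42
    }
}
```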

  • 2020-11-22 17:46

    Rationale

    Of all the answers, @supercat's answer comes closest to the actual answer. Since the other answers don't really answer the question, and downright make incorrect claims (for example that value types inherit from anything), I decided to answer the question.

     

    Prologue

    This answer is based on my own reverse engineering and the CLI specification.

    struct and class are C# keywords. As far as the CLI is concerned, all types (classes, interfaces, structs, etc.) are defined by class definitions.

    For example, an object type (known in C# as a class) is defined as follows:

    .class MyClass
    {
    }
    

     

    An interface is defined by a class definition with the interface semantic attribute:

    .class interface MyInterface
    {
    }
    

     

    What about value types?

    The reason that structs can inherit from System.ValueType and still be value types is that they don't.

    Value types are simple data structures. Value types do not inherit from anything and they cannot implement interfaces. Value types are not subtypes of any type, and they do not have any type information. Given a memory address of a value type, it's not possible to identify what the value type represents, unlike a reference type which has type information in a hidden field.

    If we imagine the following C# struct:

    namespace MyNamespace
    {
        struct MyValueType : ICloneable
        {
            public int A;
            public int B;
            public int C;
    
            public object Clone()
            {
                // body omitted
            }
        }
    }
    

    The following is the IL class definition of that struct:

    .class MyNamespace.MyValueType extends [mscorlib]System.ValueType implements [mscorlib]System.ICloneable
    {
        .field public int32 A;
        .field public int32 B;
        .field public int32 C;
    
        .method public final hidebysig newslot virtual instance object Clone() cil managed
        {
            // body omitted
        }
    }
    

    So what's going on here? It clearly extends System.ValueType, which is an object/reference type, and implements System.ICloneable.

    The explanation is, that when a class definition extends System.ValueType it actually defines 2 things: A value type, and the value type's corresponding boxed type. The members of the class definition define the representation for both the value type and the corresponding boxed type. It is not the value type that extends and implements, it's the corresponding boxed type that does. The extends and implements keywords only apply to the boxed type.

    To clarify, the class definition above does 2 things:

    1. Defines a value type with 3 fields (and one method). It does not inherit from anything, and it does not implement any interfaces (value types can do neither).
    2. Defines an object type (the boxed type) with 3 fields (and implementing one interface method), inheriting from System.ValueType, and implementing the System.ICloneable interface.

    Note also, that any class definition extending System.ValueType is also intrinsically sealed, whether the sealed keyword is specified or not.

    Since value types are just simple structures, don't inherit, don't implement and don't support polymorphism, they can't be used with the rest of the type system. To work around this, on top of the value type, the CLR also defines a corresponding reference type with the same fields, known as the boxed type. So while a value type can't be passed around to methods taking an object, its corresponding boxed type can.
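
    C# reflection does not expose the value type and its boxed type as two separate things, so the following is only an indirect sketch: it shows that the definition is reported as sealed and as implementing ICloneable, and that a box holds its own copy of the fields. The Clone body and the helper name BoxedFieldAfterMutation are assumptions added for the example.

```csharp
using System;

public struct MyValueType : ICloneable
{
    public int A;
    public int B;
    public int C;

    // Returning `this` as object boxes it, which yields an independent copy.
    public object Clone() => this;
}

public static class BoxedTypeDemo
{
    // Returns the A field held by a box after the original struct was changed.
    public static int BoxedFieldAfterMutation()
    {
        var value = new MyValueType { A = 1 };
        object boxed = value;   // copies the three fields into a heap object
        value.A = 99;           // changes only the unboxed original
        return ((MyValueType)boxed).A;
    }

    public static void Main()
    {
        Type t = typeof(MyValueType);
        Console.WriteLine(t.IsValueType);                           // True
        Console.WriteLine(t.IsSealed);                              // True
        Console.WriteLine(typeof(ICloneable).IsAssignableFrom(t));  // True
        Console.WriteLine(BoxedFieldAfterMutation());               // 1
    }
}
```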

     

    Now, if you were to define a method in C# like

    public static void BlaBla(MyNamespace.MyValueType x),

    you know that the method will take the value type MyNamespace.MyValueType.

    Above, we learned that the class definition that results from the struct keyword in C# actually defines both a value type and an object type. We can only refer to the defined value type, though. Even though the CLI specification states that the constraint keyword boxed can be used to refer to a boxed version of a type, this keyword doesn't exist (see ECMA-335, II.13.1 Referencing value types). But let's imagine that it does for a moment.

    When referring to types in IL, a couple of constraints are supported, among which are class and valuetype. If we use valuetype MyNamespace.MyType we're specifying the value type class definition called MyNamespace.MyType. Likewise, we can use class MyNamespace.MyType to specify the object type class definition called MyNamespace.MyType. This means that in IL you can have a value type (struct) and an object type (class) with the same name and still distinguish them. Now, if the boxed keyword noted by the CLI specification were actually implemented, we'd be able to use boxed MyNamespace.MyType to specify the boxed type of the value type class definition called MyNamespace.MyType.

    So, .method static void Print(valuetype MyNamespace.MyType test) cil managed takes the value type defined by a value type class definition named MyNamespace.MyType,

    while .method static void Print(class MyNamespace.MyType test) cil managed takes the object type defined by the object type class definition named MyNamespace.MyType.

    Likewise, if boxed were a keyword, .method static void Print(boxed MyNamespace.MyType test) cil managed would take the boxed type of the value type defined by a class definition named MyNamespace.MyType.

    You'd then be able to instantiate the boxed type like any other object type and pass it around to any method that takes a System.ValueType, object or boxed MyNamespace.MyValueType as an argument, and it would, for all intents and purposes, work like any other reference type. It is NOT a value type, but the corresponding boxed type of a value type.

     

    Summary

    So, in summary, and to answer the question:

    Value types are not reference types and do not inherit from System.ValueType or any other type, and they cannot implement interfaces. The corresponding boxed types that are also defined do inherit from System.ValueType and can implement interfaces.

    A .class definition defines different things depending on circumstance.

    • If the interface semantic attribute is specified, the class definition defines an interface.
    • If the interface semantic attribute is not specified, and the definition does not extend System.ValueType, the class definition defines an object type (class).
    • If the interface semantic attribute is not specified, and the definition does extend System.ValueType, the class definition defines a value type and its corresponding boxed type (struct).
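
    The three cases above can be approximated from C# with reflection. This is only a sketch: the Classify helper and its labels are made up here, and it ignores pointers, arrays and generic parameters.

```csharp
using System;

public static class TypeKinds
{
    // Mirrors the three bullets: interface, value type, or object type.
    public static string Classify(Type t)
    {
        if (t.IsInterface) return "interface";
        if (t.IsValueType) return "value type (plus boxed type)";
        return "object type";
    }

    public static void Main()
    {
        Console.WriteLine(Classify(typeof(IDisposable)));  // interface
        Console.WriteLine(Classify(typeof(int)));          // value type (plus boxed type)
        Console.WriteLine(Classify(typeof(DayOfWeek)));    // value type (plus boxed type)
        Console.WriteLine(Classify(typeof(string)));       // object type

        // System.ValueType and System.Enum are themselves object types.
        Console.WriteLine(Classify(typeof(ValueType)));    // object type
        Console.WriteLine(Classify(typeof(Enum)));         // object type
    }
}
```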

    Memory layout

    This section assumes a 32-bit process

    As already mentioned, value types do not have type information, and thus it's not possible to identify what a value type represents from its memory location. A struct describes a simple data type, and contains just the fields it defines:

    public struct MyStruct
    {
        public int A;
        public short B;
        public int C;
    }
    

    If we imagine that an instance of MyStruct was allocated at address 0x1000, then this is the memory layout:

    0x1000: int A;
    0x1004: short B;
    0x1006: 2 byte padding
    0x1008: int C;
    

    Structs default to sequential layout. Fields are aligned on boundaries of their own size. Padding is added to satisfy this.
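
    One way to check these offsets, sketched here with the Marshal class: note that Marshal reports the marshaled (unmanaged) layout, which for a struct made only of blittable fields like this one is expected to match the layout shown above. The StructLayoutDemo class and its OffsetOf helper are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public struct MyStruct
{
    public int A;
    public short B;
    public int C;
}

public static class StructLayoutDemo
{
    // Byte offset of a named field within MyStruct.
    public static int OffsetOf(string field) =>
        Marshal.OffsetOf<MyStruct>(field).ToInt32();

    public static void Main()
    {
        Console.WriteLine(OffsetOf("A"));                // 0
        Console.WriteLine(OffsetOf("B"));                // 4
        Console.WriteLine(OffsetOf("C"));                // 8 (2 bytes of padding after B)
        Console.WriteLine(Marshal.SizeOf<MyStruct>());   // 12
    }
}
```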

     

    If we define a class in the exact same way, as:

    public class MyClass
    {
        public int A;
        public short B;
        public int C;
    }
    

    Imagining the same address, the memory layout is as follows:

    0x1000: Pointer to object header
    0x1004: int A;
    0x1008: int C;
    0x100C: short B;
    0x100E: 2 byte padding
    0x1010: 4 bytes extra
    

    Classes default to automatic layout, and the JIT compiler will arrange them in the most optimal order. Fields are aligned on boundaries of their own size. Padding is added to satisfy this. I'm not sure why, but every class always has an additional 4 bytes at the end.

    Offset 0 contains the address of the object header, which contains type information, the virtual method table, etc. This allows the runtime to identify what the data at an address represents, unlike value types.

    Thus, value types do not support inheritance, interfaces nor polymorphism.

    Methods

    Value types do not have virtual method tables, and thus do not support polymorphism. However, their corresponding boxed type does.

    When you have an instance of a struct and attempt to call a virtual method like ToString() defined on System.Object, the runtime has to box the struct.

    MyStruct myStruct = new MyStruct();
    Console.WriteLine(myStruct.ToString()); // ToString() call causes boxing of MyStruct.
    

    However, if the struct overrides ToString() then the call will be statically bound and the runtime will call MyStruct.ToString() without boxing and without looking in any virtual method tables (structs don't have any). For this reason, it's also able to inline the ToString() call.

    If the struct overrides ToString() and is boxed, then the call will be resolved using the virtual method table.

    System.ValueType myStruct = new MyStruct(); // Creates a new instance of the boxed type of MyStruct.
    Console.WriteLine(myStruct.ToString()); // ToString() is now called through the virtual method table.
    

    However, remember that ToString() is defined in the struct, and thus operates on the struct value, so it expects a value type. The boxed type, like any other class, has an object header. If the ToString() method defined on the struct was called directly with the boxed type in the this pointer, when trying to access field A in MyStruct, it would access offset 0, which in the boxed type would be the object header pointer. So the boxed type has a hidden method that does the actual overriding of ToString(). This hidden method unboxes (address calculation only, like the unbox IL instruction) the boxed type then statically calls the ToString() defined on the struct.

    Likewise, the boxed type has a hidden method for each implemented interface method that does the same unboxing then statically calls the method defined in the struct.
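
    The hidden methods themselves are not visible from C#, but their effect is. In this sketch (ICounter, Counter and IncrementThroughBox are illustrative names, not from the original), an interface call on a boxed struct operates on the fields inside the box in place, and ToString() gives the same result whether called directly or through the box.

```csharp
using System;

public interface ICounter
{
    void Increment();
}

public struct Counter : ICounter
{
    public int Value;

    public void Increment() => Value++;

    public override string ToString() => "Counter(" + Value + ")";
}

public static class BoxedCallDemo
{
    // Calls Increment through the interface on one box, then reads the box.
    public static int IncrementThroughBox(int times)
    {
        ICounter boxed = new Counter();   // boxes a fresh Counter
        for (int i = 0; i < times; i++)
        {
            boxed.Increment();            // mutates the fields stored in the box
        }
        return ((Counter)boxed).Value;    // unbox and read
    }

    public static void Main()
    {
        var direct = new Counter();
        Console.WriteLine(direct.ToString());       // Counter(0), called on the struct

        ValueType boxed = direct;
        Console.WriteLine(boxed.ToString());        // Counter(0), dispatched via the box

        Console.WriteLine(IncrementThroughBox(3));  // 3
    }
}
```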

     

    CLI specification

    Boxing

    I.8.2.4 For every value type, the CTS defines a corresponding reference type called the boxed type. The reverse is not true: In general, reference types do not have a corresponding value type. The representation of a value of a boxed type (a boxed value) is a location where a value of the value type can be stored. A boxed type is an object type and a boxed value is an object.

    Defining value types

    I.8.9.7 Not all types defined by a class definition are object types (see §I.8.2.3); in particular, value types are not object types, but they are defined using a class definition. A class definition for a value type defines both the (unboxed) value type and the associated boxed type (see §I.8.2.4). The members of the class definition define the representation of both.

    II.10.1.3 The type semantic attributes specify whether an interface, class, or value type shall be defined. The interface attribute specifies an interface. If this attribute is not present and the definition extends (directly or indirectly) System.ValueType, and the definition is not for System.Enum, a value type shall be defined (§II.13). Otherwise, a class shall be defined (§II.11).

    Value types do not inherit

    I.8.9.10 In their unboxed form value types do not inherit from any type. Boxed value types shall inherit directly from System.ValueType unless they are enumerations, in which case, they shall inherit from System.Enum. Boxed value types shall be sealed.

    II.13 Unboxed value types are not considered subtypes of another type and it is not valid to use the isinst instruction (see Partition III) on unboxed value types. The isinst instruction can be used for boxed value types, however.

    I.8.9.10 A value type does not inherit; rather the base type specified in the class definition defines the base type of the boxed type.

    Value types do not implement interfaces

    I.8.9.7 Value types do not support interface contracts, but their associated boxed types do.

    II.13 Value types shall implement zero or more interfaces, but this has meaning only in their boxed form (§II.13.3).

    I.8.2.4 Interfaces and inheritance are defined only on reference types. Thus, while a value type definition (§I.8.9.7) can specify both interfaces that shall be implemented by the value type and the class (System.ValueType or System.Enum) from which it inherits, these apply only to boxed values.

    The non-existent boxed keyword

    II.13.1 The unboxed form of a value type shall be referred to by using the valuetype keyword followed by a type reference. The boxed form of a value type shall be referred to by using the boxed keyword followed by a type reference.

    Note: The specification is wrong here, there is no boxed keyword.

    Epilogue

    I think part of the confusion of how value types seem to inherit, stems from the fact that C# uses casting syntax to perform boxing and unboxing, which makes it seem like you're performing casts, which is not really the case (although, the CLR will throw an InvalidCastException if attempting to unbox the wrong type). (object)myStruct in C# creates a new instance of the boxed type of the value type; it does not perform any casts. Likewise, (MyStruct)obj in C# unboxes a boxed type, copying the value part out; it does not perform any casts.
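
    A small sketch of the difference: unboxing requires the exact value type, so a boxed int cannot be unboxed as long even though int converts to long. The helper name UnboxAsLongThrows is made up for the example.

```csharp
using System;

public static class UnboxDemo
{
    // Returns true if unboxing the object as long throws InvalidCastException.
    public static bool UnboxAsLongThrows(object boxed)
    {
        try
        {
            _ = (long)boxed;   // unbox: succeeds only if the box holds a long
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        object boxedInt = 42;                                // boxes an int
        Console.WriteLine(UnboxAsLongThrows(boxedInt));      // True

        // Unbox as int first, then convert the int to long.
        long converted = (long)(int)boxedInt;
        Console.WriteLine(converted);                        // 42
    }
}
```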

  • 2020-11-22 17:56

    C# doesn't allow structs to derive from classes

    Your statement is incorrect, hence your confusion. C# does allow structs to derive from classes. All structs derive from the same class, System.ValueType, which derives from System.Object. And all enums derive from System.Enum.

    UPDATE: There has been some confusion in some (now deleted) comments, which warrants clarification. I'll ask some additional questions:

    Do structs derive from a base type?

    Plainly yes. We can see this by reading the first page of the specification:

    All C# types, including primitive types such as int and double, inherit from a single root object type.

    Now, I note that the specification overstates the case here. Pointer types do not derive from object, and the derivation relationship for interface types and type parameter types is more complex than this sketch indicates. However, plainly it is the case that all struct types derive from a base type.

    Are there other ways that we know that struct types derive from a base type?

    Sure. A struct type can override ToString. What is it overriding, if not a virtual method of its base type? Therefore it must have a base type. That base type is a class.

    May I derive a user-defined struct from a class of my choice?

    Plainly no. This does not imply that structs do not derive from a class. Structs derive from a class, and thereby inherit the heritable members of that class. In fact, structs are required to derive from a specific class: Enums are required to derive from Enum, structs are required to derive from ValueType. Because these are required, the C# language forbids you from stating the derivation relationship in code.

    Why forbid it?

    When a relationship is required, the language designer has options: (1) require the user to type the required incantation, (2) make it optional, or (3) forbid it. Each has pros and cons, and the C# language designers have chosen differently depending on the specific details of each.

    For example, const fields are required to be static, but it is forbidden to say that they are because doing so is first, pointless verbiage, and second, implies that there are non-static const fields. But overloaded operators are required to be marked as static, even though the developer has no choice; it is too easy for developers to believe that an operator overload is an instance method otherwise. This overrides the concern that a user may come to believe that the "static" implies that, say "virtual" is also a possibility.

    In this case, requiring a user to say that their struct derives from ValueType seems like mere excess verbiage, and it implies that the struct could derive from another type. To eliminate both these problems, C# makes it illegal to state in the code that a struct derives from a base type, though plainly it does.

    Similarly all delegate types derive from MulticastDelegate, but C# requires you to not say that.

    So, now we have established that all structs in C# derive from a class.

    What is the relationship between inheritance and derivation from a class?

    Many people are confused by the inheritance relationship in C#. The inheritance relationship is quite straightforward: if a struct, class or delegate type D derives from a class type B then the heritable members of B are also members of D. It's as simple as that.

    What does it mean with regards to inheritance when we say that a struct derives from ValueType? Simply that all the heritable members of ValueType are also members of the struct. This is how structs obtain their implementation of ToString, for example; it is inherited from the base class of the struct.
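
    To see where a struct's members come from, one can ask reflection which type declares each method. This is a sketch with a hypothetical Point struct that overrides only ToString; the DeclaringTypeOf helper is likewise an assumption for illustration.

```csharp
using System;

public struct Point
{
    public int X;
    public int Y;

    public override string ToString() => $"({X}, {Y})";
}

public static class InheritedMembersDemo
{
    // Finds a public method on Point and reports the type that declares it.
    public static Type DeclaringTypeOf(string name, params Type[] parameterTypes) =>
        typeof(Point).GetMethod(name, parameterTypes).DeclaringType;

    public static void Main()
    {
        Console.WriteLine(DeclaringTypeOf("ToString"));                // Point
        Console.WriteLine(DeclaringTypeOf("Equals", typeof(object)));  // System.ValueType
        Console.WriteLine(DeclaringTypeOf("GetHashCode"));             // System.ValueType
        Console.WriteLine(DeclaringTypeOf("GetType"));                 // System.Object
    }
}
```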

    All heritable members? Surely not. Are private members heritable?

    Yes. All private members of a base class are also members of the derived type. It is illegal to call those members by name of course if the call site is not in the accessibility domain of the member. Just because you have a member does not mean you can use it!

    We now continue with the original answer:


    How does the CLR handle this?

    Extremely well. :-)

    What makes a value type a value type is that its instances are copied by value. What makes a reference type a reference type is that its instances are copied by reference. You seem to have some belief that the inheritance relationship between value types and reference types is somehow special and unusual, but I don't understand what that belief is. Inheritance has nothing to do with how things are copied.
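
    The copy-by-value versus copy-by-reference distinction can be sketched with two otherwise identical types. PointStruct, PointClass and the helper methods are hypothetical names.

```csharp
using System;

public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class CopyDemo
{
    // Assigning a struct copies the value, so the original is unaffected.
    public static int CopyStructThenMutate()
    {
        var a = new PointStruct { X = 1 };
        var b = a;      // b is an independent copy of a
        b.X = 2;
        return a.X;
    }

    // Assigning a class instance copies the reference, so both names see the change.
    public static int CopyClassThenMutate()
    {
        var a = new PointClass { X = 1 };
        var b = a;      // b refers to the same object as a
        b.X = 2;
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(CopyStructThenMutate());  // 1
        Console.WriteLine(CopyClassThenMutate());   // 2
    }
}
```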

    Look at it this way. Suppose I told you the following facts:

    • There are two kinds of boxes, red boxes and blue boxes.

    • Every red box is empty.

    • There are three special blue boxes called O, V and E.

    • O is not inside any box.

    • V is inside O.

    • E is inside V.

    • No other blue box is inside V.

    • No blue box is inside E.

    • Every red box is in either V or E.

    • Every blue box other than O is itself inside a blue box.

    The blue boxes are reference types, the red boxes are value types, O is System.Object, V is System.ValueType, E is System.Enum, and the "inside" relationship is "derives from".

    That's a perfectly consistent and straightforward set of rules which you could easily implement yourself, if you had a lot of cardboard and a lot of patience. Whether a box is red or blue has nothing to do with what it's inside; in the real world it is perfectly possible to put a red box inside a blue box. In the CLR, it is perfectly legal to make a value type that inherits from a reference type, so long as it is either System.ValueType or System.Enum.
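
    The box analogy can be checked by walking the BaseType chain with reflection. This is a sketch; BaseChain is a made-up helper.

```csharp
using System;
using System.Collections.Generic;

public static class BoxesDemo
{
    // Lists a type followed by each of its base types, up to System.Object.
    public static string BaseChain(Type t)
    {
        var names = new List<string>();
        for (Type current = t; current != null; current = current.BaseType)
        {
            names.Add(current.FullName);
        }
        return string.Join(" -> ", names);
    }

    public static void Main()
    {
        // A red box inside V, which is inside O.
        Console.WriteLine(BaseChain(typeof(int)));
        // System.Int32 -> System.ValueType -> System.Object

        // A red box inside E, which is inside V, which is inside O.
        Console.WriteLine(BaseChain(typeof(DayOfWeek)));
        // System.DayOfWeek -> System.Enum -> System.ValueType -> System.Object

        // V and E are themselves blue boxes (reference types).
        Console.WriteLine(typeof(ValueType).IsValueType);  // False
        Console.WriteLine(typeof(Enum).IsValueType);       // False
    }
}
```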

    So let's rephrase your question:

    How do ValueTypes derive from Object (ReferenceType) and still be ValueTypes?

    as

    How is it possible that every red box (value type) is inside (derives from) box O (System.Object), which is a blue box (a reference type), and is still a red box (a value type)?

    When you phrase it like that, I hope it's obvious. There's nothing stopping you from putting a red box inside box V, which is inside box O, which is blue. Why would there be?


    AN ADDITIONAL UPDATE:

    Joan's original question was about how it is possible that a value type derives from a reference type. My original answer did not really explain any of the mechanisms that the CLR uses to account for the fact that we have a derivation relationship between two things that have completely different representations -- namely, whether the referred-to data has an object header, a sync block, whether it owns its own storage for the purposes of garbage collection, and so on. These mechanisms are complicated, too complicated to explain in one answer. The rules of the CLR type system are quite a bit more complex than the somewhat simplified flavour of it that we see in C#, where there is not a strong distinction made between the boxed and unboxed versions of a type, for example. The introduction of generics also caused a great deal of additional complexity to be added to the CLR. Consult the CLI specification for details, paying particular attention to the rules for boxing and constrained virtual calls.
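
    As one observable consequence of constrained calls, here is a hedged sketch: calling an interface method through a generic type parameter acts on the struct in place, while converting to the interface type first boxes a copy. ICounter, Counter and the helper methods are illustrative names, not part of the original answer.

```csharp
using System;

public interface ICounter
{
    void Increment();
}

public struct Counter : ICounter
{
    public int Value;

    public void Increment() => Value++;
}

public static class ConstrainedCallDemo
{
    // The call on T is made against the caller's variable, with no box involved.
    private static void Bump<T>(ref T item) where T : ICounter => item.Increment();

    public static int ViaGeneric()
    {
        var c = new Counter();
        Bump(ref c);                  // mutates c itself
        return c.Value;
    }

    public static int ViaInterfaceCast()
    {
        var c = new Counter();
        ((ICounter)c).Increment();    // boxes a copy and mutates the copy
        return c.Value;
    }

    public static void Main()
    {
        Console.WriteLine(ViaGeneric());        // 1
        Console.WriteLine(ViaInterfaceCast());  // 0
    }
}
```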

  • 2020-11-22 18:05

    Your statement is incorrect, hence your confusion. C# does allow structs to derive from classes. All structs derive from the same class, System.ValueType

    So let's try this:

     struct MyStruct :  System.ValueType
     {
     }
    

    This will not even compile. The compiler reports "Type 'System.ValueType' in interface list is not an interface".

    When you decompile Int32, which is a struct, you will find:

    public struct Int32 : IComparable, IFormattable, IConvertible {}, with no mention that it is derived from System.ValueType. But in the object browser, you do find that Int32 inherits from System.ValueType.

    All of this leads me to agree with the following:

    I think the best way to answer this is that ValueType is special. It is essentially the base class for all value types in the CLR type system. It's hard to know how to answer "how does the CLR handle this" because it's simply a rule of the CLR.

  • 2020-11-22 18:07

    Small correction: C# doesn't allow structs to derive from anything of your choosing, not just classes. All a struct can do is implement an interface, which is very different from derivation.

    I think the best way to answer this is that ValueType is special. It is essentially the base class for all value types in the CLR type system. It's hard to know how to answer "how does the CLR handle this" because it's simply a rule of the CLR.

  • 2020-11-22 18:08

    A boxed value type is effectively a reference type (it walks like one and quacks like one, so effectively it is one). I would suggest that ValueType isn't really the base type of value types, but rather is the base reference type to which value types can be converted when cast to type Object. Non-boxed value types themselves are outside the object hierarchy.
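
    A short sketch of a boxed value behaving like a reference type: each boxing operation produces a distinct object with its own identity, while copying the reference does not. The BoxIdentityDemo class and its helper names are hypothetical.

```csharp
using System;

public static class BoxIdentityDemo
{
    // Two separate boxing operations give two distinct objects.
    public static bool SeparateBoxesShareIdentity()
    {
        object first = 5;
        object second = 5;
        return ReferenceEquals(first, second);
    }

    // Copying the reference gives two names for the same box.
    public static bool AliasSharesIdentity()
    {
        object first = 5;
        object alias = first;
        return ReferenceEquals(first, alias);
    }

    public static void Main()
    {
        Console.WriteLine(SeparateBoxesShareIdentity());      // False
        Console.WriteLine(AliasSharesIdentity());             // True

        // Value equality still holds between separate boxes.
        Console.WriteLine(((object)5).Equals((object)5));     // True
    }
}
```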
