Representing dynamic typing in C

前端未结


I'm writing a dynamically-typed language. Currently, my objects are represented in this way:

struct Class { struct Class* class; struct Object* (*get)(stru


                      
6 Answers

[愿得一人]  2020-12-14 22:40

See Python PEP 3123 (http://www.python.org/dev/peps/pep-3123/) for how Python solves this problem using standard C.  The Python solution can be directly applied to your problem.  Essentially you want to do this:

struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };


You can safely cast Integer* to Object*, and Object* to Integer* if you know that your object is an integer.
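
As a minimal sketch of that cast in both directions: the contents of struct Class, the integer_class descriptor, and the helpers integer_new and as_integer are illustrative assumptions, not part of the answer or of PEP 3123.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical class descriptor; a real one would hold a method table. */
struct Class { const char *name; };

struct Object  { struct Class *class; };
struct Integer { struct Object object; int value; };

static struct Class integer_class = { "Integer" };

/* Allocate an Integer and hand it out as a generic Object*. */
static struct Object *integer_new(int value) {
    struct Integer *i = malloc(sizeof *i);
    if (i == NULL)
        return NULL;
    i->object.class = &integer_class;
    i->value = value;
    return &i->object;   /* same address as (struct Object *)i */
}

/* Checked downcast: only valid when the object really is an Integer. */
static struct Integer *as_integer(struct Object *o) {
    assert(o->class == &integer_class);
    return (struct Integer *)o;
}
```

Calling as_integer(integer_new(42))->value yields 42, and the pointer returned by as_integer compares equal to the Object* it was given, because a pointer to a struct points to its first member.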
                                                                        
                                                        
            
            
              
                

感情败类  2020-12-14 22:50

Section 6.2.5 of ISO 9899:1999 (the C99 standard) says:

  A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.

Section 6.7.2.1 also says:

  As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.
  [...]
  Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

This guarantees what you need.
In the question you say:

  The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works.

This will work on every conforming implementation. It also means that your first alternative, the one you are currently using, is safe enough.

  But on another platform struct Integer might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.

No conforming compiler is allowed to do that. [I replaced String with Integer, assuming you were referring to the first set of declarations. On closer examination, you might have been referring to the structure with an embedded union. The compiler still isn't allowed to reorder class and value.]
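
If you want the guarantee spelled out in code, a small check along these lines may help. The question's declarations are truncated in this copy, so the shape of struct Integer below is an assumption, and layout_is_as_guaranteed is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

struct Class;   /* opaque here; only pointers to it are needed */

/* Assumed shape of the question's first set of declarations. */
struct Integer { struct Class *class; int value; };

/* Returns 1 when the layout matches what 6.7.2.1 promises. */
static int layout_is_as_guaranteed(void) {
    struct Integer i = { NULL, 7 };
    return offsetof(struct Integer, class) == 0          /* no leading padding */
        && offsetof(struct Integer, class)
               < offsetof(struct Integer, value)         /* declaration order */
        && (void *)&i == (void *)&i.class;               /* struct == first member */
}
```

On a conforming implementation this function should always return 1, so it is more useful as documentation than as a runtime test.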
                                                                        
                                                        
            
            
              
                

时光取名叫无心  2020-12-14 22:52

C gives you sufficient guarantees that your first approach will work.  The only modification you need to make is that in order to make the pointer aliasing OK, you must have a union in scope that contains all of the structs that you are casting between:

union allow_aliasing {
    struct Class class;
    struct Object object;
    struct Integer integer;
    struct String string;
};


(You don't need to ever use the union for anything - it just has to be in scope)

I believe the relevant part of the standard is this:


  [#5] With one exception, if the value
  of a member of a union object is used
  when the most recent store to the
  object was to a different member, the
  behavior is implementation-defined.
  One special guarantee is made in order
  to simplify the use of unions: If a
  union contains several structures that
  share a common initial sequence (see
  below), and if the union object
  currently contains one of these
  structures, it is permitted to inspect
  the common initial part of any of them
  anywhere that a declaration of the
  completed type of the union is
  visible. Two structures share a common
  initial sequence if corresponding
  members have compatible types (and,
  for bit-fields, the same widths) for a
  sequence of one or more initial
  members.


(This doesn't directly say it's OK, but I believe that it does guarantee that if two structs have a common initial sequence and are put into a union together, they'll be laid out in memory the same way - it's certainly been idiomatic C for a long time to assume this, anyway).
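
For reference, here is a sketch of the case the quoted paragraph covers directly, where the common initial part is read through the union itself. The contents of struct Class, the union name any_object, the two class descriptors, and class_name are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Class { const char *name; };   /* hypothetical descriptor */

struct Integer { struct Class *class; int value; };
struct String  { struct Class *class; size_t length; char *characters; };

/* Both structs begin with `struct Class *class`: a common initial sequence. */
union any_object {
    struct Integer integer;
    struct String  string;
};

static struct Class integer_class = { "Integer" };
static struct Class string_class  = { "String" };

/* Inspect the shared first member without knowing which struct is stored. */
static const char *class_name(const union any_object *obj) {
    return obj->integer.class->name;
}
```

Whether the guarantee also extends to plain struct pointers that never go through the union is exactly the part that the standard text leaves less explicit.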
                                                                        
                                                        
            
            
              
                

一个人的身影  2020-12-14 23:05

There are three major approaches to implementing dynamic types, and which one is best depends on the situation.

1) C-style inheritance: The first one is shown in Josh Haberman's answer. We create a type-hierarchy using classic C-style inheritance:

struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };


Functions with dynamically typed arguments receive them as Object*, inspect the class member, and cast as appropriate. The cost to check the type is two pointer hops. The cost to get the underlying value is one pointer hop. In approaches like this one, objects are typically allocated on the heap since the size of objects is unknown at compile time. Since many malloc implementations have a minimum allocation size of around 32 bytes, small objects can waste a significant amount of memory with this approach.
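
A function in this style might look roughly like the sketch below. The class descriptors and the object_size operation are made up for illustration; a fuller design would more likely dispatch through function pointers stored in struct Class.

```c
#include <assert.h>
#include <stddef.h>

struct Class { const char *name; };   /* hypothetical descriptor */

struct Object  { struct Class *class; };
struct Integer { struct Object object; int value; };
struct String  { struct Object object; size_t length; char *characters; };

static struct Class integer_class = { "Integer" };
static struct Class string_class  = { "String" };

/* Hypothetical operation: an integer's value, a string's length,
   or -1 for a class this function does not know about. */
static long object_size(const struct Object *o) {
    if (o->class == &integer_class)
        return ((const struct Integer *)o)->value;
    if (o->class == &string_class)
        return (long)((const struct String *)o)->length;
    return -1;
}
```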

2) Tagged union: We can remove a level of indirection for accessing small objects using the "short string optimization"/"small object optimization":

#include <stdbool.h>  /* bool */

struct Object {
    struct Class* class;
    union {  /* anonymous union: requires C11 or a compiler extension */
        // fundamental C types or other small types of interest
        bool as_bool;
        int as_int;
        // [...]
        // object pointer for large types (or actual pointer values)
        void* as_ptr;
    };
};


Functions with dynamically typed arguments receive them as Object, inspect the class member, and read the union as appropriate. The cost to check the type is one pointer hop. If the type is one of the special small types, it is stored directly in the union, and there is no indirection to retrieve the value. Otherwise, one pointer hop is required to retrieve the value. This approach can sometimes avoid allocating objects on the heap. Although the exact size of an object still isn't known at compile time, we now know the size and alignment (our union) needed to accommodate small objects.
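
Here is a small sketch of how such by-value objects could be created and inspected. The bool_class and int_class descriptors and the helpers make_int, make_bool, and is_truthy are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct Class { const char *name; };   /* hypothetical descriptor */

struct Object {
    struct Class *class;
    union {               /* anonymous union: C11 or later */
        bool  as_bool;
        int   as_int;
        void *as_ptr;
    };
};

static struct Class bool_class = { "Bool" };
static struct Class int_class  = { "Int" };

/* Small values live inside the handle, so no heap allocation is needed. */
static struct Object make_int(int v) {
    struct Object o;
    o.class = &int_class;
    o.as_int = v;
    return o;
}

static struct Object make_bool(bool v) {
    struct Object o;
    o.class = &bool_class;
    o.as_bool = v;
    return o;
}

/* Hypothetical truthiness test: false and 0 are falsy; any other
   class is treated as a pointer and is truthy when non-null. */
static bool is_truthy(struct Object o) {
    if (o.class == &bool_class)
        return o.as_bool;
    if (o.class == &int_class)
        return o.as_int != 0;
    return o.as_ptr != NULL;
}
```

Note that only the union member matching the class is ever read, which keeps the access well defined.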

In these first two solutions, if we know all the possible types at compile time, we can encode the type using an integer type instead of a pointer and reduce type check indirection by one pointer hop.
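
With a closed set of types, the integer-tag variant could be sketched like this. The enum Type values, struct Value, and type_name are hypothetical names, and the union is given the member name u so that the snippet also compiles as C99.

```c
#include <assert.h>
#include <stdbool.h>

/* Closed set of types known at compile time (hypothetical names). */
enum Type { TYPE_BOOL, TYPE_INT, TYPE_PTR };

struct Value {
    enum Type type;       /* integer tag instead of a class pointer */
    union {
        bool  as_bool;
        int   as_int;
        void *as_ptr;
    } u;
};

/* The type check reads the tag directly; no class object is dereferenced. */
static const char *type_name(struct Value v) {
    switch (v.type) {
    case TYPE_BOOL: return "bool";
    case TYPE_INT:  return "int";
    case TYPE_PTR:  return "pointer";
    }
    return "unknown";
}
```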

3) NaN-boxing: Finally, there's NaN-boxing, where every object handle is only 64 bits.

double object;


Any value whose bits form a non-NaN double is understood to simply be a double. All other object handles are NaNs. In the commonly used IEEE 754 floating-point standard, a large range of double-precision bit patterns encodes NaN. Within that space, we use a few bits to tag types and the remaining bits for data. Because most current 64-bit platforms use only 48 bits of address space, we can even stash pointers in NaNs. This method incurs no indirection or extra memory use, but it constrains our small object types, is awkward, and is in theory not portable C.
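
One possible encoding is sketched below. The bit assignments (QNAN, TAG_INT, TAG_PTR) and all helper names are assumptions chosen for illustration, and the code assumes a 64-bit IEEE 754 double and pointers that fit in 48 bits. It works on a uint64_t handle and uses memcpy to move bits in and out of a double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;   /* one 64-bit handle per object */

#define QNAN    UINT64_C(0x7FFC000000000000) /* exponent all ones + top two mantissa bits */
#define TAG_INT UINT64_C(0x0001000000000000) /* bit 48 */
#define TAG_PTR UINT64_C(0x0002000000000000) /* bit 49 */
#define PAYLOAD UINT64_C(0x0000FFFFFFFFFFFF) /* low 48 bits */

/* Doubles are stored as their own bit pattern (assumes sizeof(double) == 8). */
static Value box_double(double d) { Value v; memcpy(&v, &d, sizeof v); return v; }
static double unbox_double(Value v) { double d; memcpy(&d, &v, sizeof d); return d; }
static int is_double(Value v) { return (v & QNAN) != QNAN; }

/* 32-bit integers live in the low bits of a tagged NaN. */
static Value box_int(int32_t i) { return QNAN | TAG_INT | (uint32_t)i; }
static int is_int(Value v) {
    return (v & (QNAN | TAG_INT | TAG_PTR)) == (QNAN | TAG_INT);
}
static int32_t unbox_int(Value v) {
    uint32_t u = (uint32_t)(v & PAYLOAD);
    int32_t i;
    memcpy(&i, &u, sizeof i);
    return i;
}

/* Pointers are stored in the 48-bit payload; this is the non-portable part. */
static Value box_ptr(void *p) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & ~PAYLOAD) == 0);   /* assumes the pointer fits in 48 bits */
    return QNAN | TAG_PTR | bits;
}
static int is_ptr(Value v) {
    return (v & (QNAN | TAG_INT | TAG_PTR)) == (QNAN | TAG_PTR);
}
static void *unbox_ptr(Value v) { return (void *)(uintptr_t)(v & PAYLOAD); }
```

In this layout an ordinary NaN produced by arithmetic usually has only the top mantissa bit set, so it fails the QNAN mask test and is still treated as a double.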
                                                                        
                                                        
            
            
              
                

没有蜡笔的小新  2020-12-14 23:07

  The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct String might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.


I believe you're wrong here. First, because your struct String doesn't have a value member. Second, because I believe C does guarantee the layout in memory of your struct's members. That's why the following are different sizes:

struct first {
    short a;
    char  b;
    char  c;
};

struct second {
    char  a;
    short b;
    char  c;
};


If C made no guarantees, then compilers would probably optimize both of those to be the same size. But C guarantees that members are laid out in declaration order, so the natural alignment rules kick in and, on typical platforms, make the second one larger than the first.
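
You can check this on your own platform with something like the following sketch. The tags first and second and the helper print_layouts are names I made up, and the exact numbers printed are implementation-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct first  { short a; char  b; char c; };
struct second { char  a; short b; char c; };

/* Print the size and member offsets of both structs. */
static void print_layouts(void) {
    printf("first:  size %zu, offsets %zu %zu %zu\n",
           sizeof(struct first),
           offsetof(struct first, a),
           offsetof(struct first, b),
           offsetof(struct first, c));
    printf("second: size %zu, offsets %zu %zu %zu\n",
           sizeof(struct second),
           offsetof(struct second, a),
           offsetof(struct second, b),
           offsetof(struct second, c));
}
```

On common ABIs where short is 2 bytes with 2-byte alignment, this typically reports 4 bytes for the first struct and 6 for the second. In every case the offsets increase in declaration order, which is the part the standard guarantees.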
                                                                        
                                                        
            
            
              
                

时光说笑  2020-12-14 23:07

I appreciate the pedantic issues raised by this question and answers, but I just wanted to mention that CPython has used similar tricks "more or less forever" and it's been working for decades across a huge variety of C compilers. Specifically, see object.h, macros like PyObject_HEAD, structs like PyObject: all kinds of Python Objects (down at the C API level) are getting pointers to them forever cast back and forth to/from PyObject* with no harm done. It's been a while since I last played sea lawyer with an ISO C Standard, to the point that I don't have a copy handy (!), but I do believe that there are some constraints there that should make this keep working as it has for nearly 20 years...
                                                                        
                                                        
            
            
              
                