Type-safe generic data structures in plain-old C?

庸人自扰 2020-12-04 08:23

I have done far more C++ programming than "plain old C" programming. One thing I sorely miss when programming in plain C is type-safe generic data structures, which are p

10 answers
  • 2020-12-04 08:43

    An old question, I know, but in case it is still of interest: I was experimenting with option 2) (pre-processor macros) today and came up with the example pasted below. It is slightly clunky, but not terrible. The code is not fully type safe, but it contains sanity checks that provide a reasonable level of safety. Dealing with the compiler error messages while writing it was mild compared to what I have seen when C++ templates come into play. It is probably easiest to start reading at the example usage in the "main" function.

    #include <stdio.h>
    
    #define LIST_ELEMENT(type) \
        struct \
        { \
            void *pvNext; \
            type value; \
        }
    
    #define ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement) \
        do { \
            (void)(&(pElement)->value  == (type *)&(pElement)->value); \
            (void)(sizeof(*(pElement)) == sizeof(LIST_ELEMENT(type))); \
        } while(0)
    
    #define SET_POINTER_TO_LIST_ELEMENT(type, pDest, pSource) \
        do { \
            ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
            ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
            void **pvDest = (void **)&(pDest); \
            *pvDest = ((void *)(pSource)); \
        } while(0)
    
    #define LINK_LIST_ELEMENT(type, pDest, pSource) \
        do { \
            ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
            ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
            (pDest)->pvNext = ((void *)(pSource)); \
        } while(0)
    
    #define TERMINATE_LIST_AT_ELEMENT(type, pDest) \
        do { \
            ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
            (pDest)->pvNext = NULL; \
        } while(0)
    
    #define ADVANCE_POINTER_TO_LIST_ELEMENT(type, pElement) \
        do { \
            ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement); \
            void **pvElement = (void **)&(pElement); \
            *pvElement = (pElement)->pvNext; \
        } while(0)
    
    typedef struct { int a; int b; } mytype;
    
    int main(int argc, char **argv)
    {
        LIST_ELEMENT(mytype) el1;
        LIST_ELEMENT(mytype) el2;
        LIST_ELEMENT(mytype) *pEl;
        el1.value.a = 1;
        el1.value.b = 2;
        el2.value.a = 3;
        el2.value.b = 4;
        LINK_LIST_ELEMENT(mytype, &el1, &el2);
        TERMINATE_LIST_AT_ELEMENT(mytype, &el2);
        printf("Testing.\n");
        SET_POINTER_TO_LIST_ELEMENT(mytype, pEl, &el1);
        if (pEl->value.a != 1)
            printf("pEl->value.a != 1: %d.\n", pEl->value.a);
        ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
        if (pEl->value.a != 3)
            printf("pEl->value.a != 3: %d.\n", pEl->value.a);
        ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
        if (pEl != NULL)
            printf("pEl != NULL.\n");
        printf("Done.\n");
        return 0;
    }
    
  • 2020-12-04 08:46

    C has a different kind of beauty than C++. Type safety, and being able to see what everything is when tracing through code without adding casts in your debugger, are typically not part of it.

    Much of C's beauty comes from its lack of type safety: from working around the type system at the raw level of bits and bytes. Because of that, there are certain things it can do more easily, without fighting the language, such as variable-length structs or using the stack even for arrays whose sizes are determined at runtime. It also tends to be a lot simpler to preserve ABI when you're working at this lower level.

    So there's a different kind of aesthetic involved here as well as different challenges, and I'd recommend a shift in mindset when you work in C. To really appreciate it, I'd suggest doing things many people take for granted these days, like implementing your own memory allocator or device driver. When you're working at such a low level, you can't help but look at everything as memory layouts of bits and bytes as opposed to 'objects' with behaviors attached. Furthermore, there can come a point in such low-level bit/byte manipulation code where C becomes easier to comprehend than the equivalent C++, e.g. C++ code littered with reinterpret_casts.

    As for your linked list example, I would suggest a non-intrusive version of a linked node (one that does not require storing list pointers into the element type, T, itself, allowing the linked list logic and representation to be decoupled from T itself), like so:

    struct ListNode
    {
        struct ListNode* prev;
        struct ListNode* next;
        MAX_ALIGN char element[1]; // Watch out for alignment here. MAX_ALIGN is a
                                   // placeholder: see your compiler's specific
                                   // info on aligning data members.
    };
    

    Now we can create a list node like so:

    struct ListNode* list_new_node(int element_size)
    {
        // Watch out for alignment here.
        return malloc_max_aligned(sizeof(struct ListNode) + element_size - 1);
    }
    
    // create a list node for 'struct Foo'
    void foo_init(struct Foo*);
    struct ListNode* foo_node = list_new_node(sizeof(struct Foo));
    foo_init((struct Foo*)foo_node->element);
    

    To retrieve the element from the list as T*:

    T* element = (T*)list_node->element;
    

    Since it's C, there's no type checking whatsoever when casting pointers in this way, and that will probably also give you an uneasy feeling if you're coming from a C++ background.

    The tricky part here is to make sure that this member, element, is properly aligned for whatever type you want to store. When you can solve that problem as portably as you need it to be, you'll have a powerful solution for creating efficient memory layouts and allocators. Often this will have you just using max alignment for everything which might seem wasteful, but typically isn't if you are using appropriate data structures and allocators which aren't paying this overhead for numerous small elements on an individual basis.
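
    If C11 is available, one way to handle the alignment portably is sketched below. This is not the code from the answer above: it swaps the char element[1] trick for a flexible array member and uses alignas(max_align_t) in place of the MAX_ALIGN placeholder. The names AlignedNode, aligned_node_new, Point and aligned_node_demo are made up for illustration.

```c
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>    /* max_align_t, size_t */
#include <stdlib.h>    /* malloc, free */

/* Hypothetical C11 variant of the list node. */
struct AlignedNode
{
    struct AlignedNode *prev;
    struct AlignedNode *next;
    /* Payload starts at an offset suitable for any standard type. */
    alignas(max_align_t) unsigned char element[];
};

/* Allocate node and payload in a single block; returns NULL on failure. */
struct AlignedNode *aligned_node_new(size_t element_size)
{
    struct AlignedNode *node = malloc(sizeof *node + element_size);
    if (node != NULL)
        node->prev = node->next = NULL;
    return node;
}

struct Point { double x, y; };  /* example element type (assumption) */

/* Usage sketch: store a Point in a node and read it back. */
double aligned_node_demo(void)
{
    struct AlignedNode *node = aligned_node_new(sizeof(struct Point));
    if (node == NULL)
        return -1.0;
    struct Point *p = (struct Point *)node->element;  /* unchecked cast */
    p->x = 1.5;
    p->y = 2.5;
    double sum = p->x + p->y;
    free(node);
    return sum;
}
```

    The cast is still unchecked, so nothing stops a caller from reading the payload as the wrong type.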

    Now this solution still involves the type casting. There's little you can do about that short of having a separate version of code of this list node and the corresponding logic to work with it for every type, T, that you want to support (short of dynamic polymorphism). However, it does not involve an additional level of indirection as you might have thought was needed, and still allocates the entire list node and element in a single allocation.

    And I would recommend this simple way to achieve genericity in C in many cases. Simply replace T with a buffer that has a length matching sizeof(T) and aligned properly. If you have a reasonably portable and safe way you can generalize to ensure proper alignment, you'll have a very powerful way of working with memory in a way that often improves cache hits, reduces the frequency of heap allocations/deallocations, the amount of indirection required, build times, etc.

    If you need more automation, like having list_new_node automatically initialize struct Foo, I would recommend creating a general type table struct that you can pass around. It would contain information like how big T is, a function pointer to a function that creates a default instance of T, others to copy T, clone T and destroy T, a comparator, etc. In C++, you can generate this table automatically using templates and built-in language concepts like copy constructors and destructors. C requires a bit more manual effort, but you can still reduce the boilerplate a bit with macros.
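
    A rough sketch of what such a type table might look like follows. Every name in it (TypeInfo, Name, generic_clone and so on) is invented for illustration, and the set of function pointers is just one plausible choice.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-type descriptor passed to generic code. */
struct TypeInfo
{
    size_t size;                                   /* sizeof(T) */
    void (*init)(void *obj);                       /* default-construct in place */
    void (*copy)(void *dst, const void *src);      /* copy-construct dst from src */
    void (*destroy)(void *obj);                    /* release owned resources */
    int  (*compare)(const void *a, const void *b); /* strcmp-style ordering */
};

/* Example element type that owns heap memory. */
struct Name { char *text; };

static void name_init(void *obj) { ((struct Name *)obj)->text = NULL; }

static void name_copy(void *dst, const void *src)
{
    const struct Name *s = src;
    struct Name *d = dst;
    d->text = NULL;
    if (s->text != NULL) {
        d->text = malloc(strlen(s->text) + 1);
        if (d->text != NULL)
            strcpy(d->text, s->text);
    }
}

static void name_destroy(void *obj)
{
    struct Name *n = obj;
    free(n->text);
    n->text = NULL;
}

static int name_compare(const void *a, const void *b)
{
    const char *x = ((const struct Name *)a)->text;
    const char *y = ((const struct Name *)b)->text;
    return strcmp(x ? x : "", y ? y : "");
}

static const struct TypeInfo name_type = {
    sizeof(struct Name), name_init, name_copy, name_destroy, name_compare
};

/* Generic code sees only the table, never struct Name itself. */
void *generic_clone(const struct TypeInfo *type, const void *src)
{
    void *obj = malloc(type->size);
    if (obj != NULL)
        type->copy(obj, src);
    return obj;
}

/* Returns 1 if the clone compares equal but owns separate storage. */
int type_table_demo(void)
{
    struct Name original;
    name_type.init(&original);
    original.text = malloc(6);
    if (original.text == NULL)
        return -1;
    strcpy(original.text, "hello");

    struct Name *clone = generic_clone(&name_type, &original);
    if (clone == NULL) {
        name_type.destroy(&original);
        return -1;
    }
    int ok = name_type.compare(&original, clone) == 0
          && clone->text != original.text;
    name_type.destroy(clone);
    free(clone);
    name_type.destroy(&original);
    return ok;
}
```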

    Another trick that can be useful if you go with a more macro-oriented code generation route is to cash in on a prefix- or suffix-based naming convention for identifiers. For example, CLONE(Type, ptr) could be defined to return Type##Clone(ptr), so CLONE(Foo, foo) would invoke FooClone(foo). This is kind of a cheat to get something akin to function overloading in C. It is useful when generating code in bulk (when CLONE is used to implement another macro), or even when copying and pasting boilerplate-type code, to at least improve the uniformity of the boilerplate.
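
    The CLONE convention can be tried out in a few lines. The Foo and Bar types and their FooClone and BarClone functions below are placeholders; only the token-pasting macro itself comes from the description above.

```c
#include <stdlib.h>

/* Token pasting turns CLONE(Foo, p) into FooClone(p). */
#define CLONE(Type, ptr) Type##Clone(ptr)

typedef struct { int x; } Foo;     /* placeholder types */
typedef struct { double v; } Bar;

Foo *FooClone(const Foo *src)
{
    Foo *copy = malloc(sizeof *copy);
    if (copy != NULL)
        *copy = *src;
    return copy;
}

Bar *BarClone(const Bar *src)
{
    Bar *copy = malloc(sizeof *copy);
    if (copy != NULL)
        *copy = *src;
    return copy;
}

/* Returns 1 if both clones were made and hold the original values. */
int clone_demo(void)
{
    Foo foo = { 42 };
    Bar bar = { 0.5 };
    Foo *foo_copy = CLONE(Foo, &foo);   /* expands to FooClone(&foo) */
    Bar *bar_copy = CLONE(Bar, &bar);   /* expands to BarClone(&bar) */
    int ok = foo_copy != NULL && bar_copy != NULL
          && foo_copy->x == 42 && bar_copy->v == 0.5;
    free(foo_copy);
    free(bar_copy);
    return ok;
}
```

    Because each expansion calls a function with a real prototype, passing a Bar pointer to CLONE(Foo, ...) is diagnosed by the compiler.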

  • 2020-12-04 08:47

    Option 1 is the approach taken by most C implementations of generic containers that I see. The Windows driver kit and the Linux kernel use a macro to allow links for the containers to be embedded anywhere in a structure, with the macro used to obtain the structure pointer from a pointer to the link field:

    • list_entry() macro in Linux
    • CONTAINING_RECORD() macro in Windows
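
    The idea behind both macros can be sketched with offsetof. The CONTAINER_OF macro and the link and job structs below are simplified stand-ins written for this example, not the kernel or WDK definitions.

```c
#include <stddef.h>  /* offsetof, NULL */

/* Link field that can be embedded anywhere in a structure. */
struct link { struct link *next; };

/* Simplified stand-in for list_entry() / CONTAINING_RECORD():
 * step back from the member's address to the start of the struct. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct job
{
    int id;
    struct link node;  /* the list threads through this field */
};

/* Walk the links and recover each enclosing job. */
int sum_job_ids(struct link *head)
{
    int sum = 0;
    for (struct link *l = head; l != NULL; l = l->next)
        sum += CONTAINER_OF(l, struct job, node)->id;
    return sum;
}

int container_of_demo(void)
{
    struct job a = { 1, { NULL } };
    struct job b = { 2, { &a.node } };
    struct job c = { 4, { &b.node } };
    return sum_job_ids(&c.node);  /* 4 + 2 + 1 */
}
```

    Nothing verifies that a given link really sits inside a struct job; a link embedded in some other struct would be converted just as silently.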

    Option 2 is the tack taken by BSD's tree.h and queue.h container implementation:

    • http://openbsd.su/src/sys/sys/queue.h
    • http://openbsd.su/src/sys/sys/tree.h
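
    To give a feel for the style, here is a cut-down sketch written in the spirit of queue.h. The TLIST_* macros are invented for this example and are not the BSD macros, which are more complete.

```c
#include <stddef.h>

/* Hypothetical macros modelled loosely on the queue.h approach. */
#define TLIST_HEAD(name, type) struct name { struct type *first; }
#define TLIST_ENTRY(type) struct { struct type *next; }

#define TLIST_INSERT_HEAD(head, elm, field) \
    do { \
        (elm)->field.next = (head)->first; \
        (head)->first = (elm); \
    } while (0)

#define TLIST_FOREACH(var, head, field) \
    for ((var) = (head)->first; (var) != NULL; (var) = (var)->field.next)

/* The element type embeds its own, correctly typed, link field. */
struct task
{
    int priority;
    TLIST_ENTRY(task) links;
};

TLIST_HEAD(task_list, task);

int tlist_demo(void)
{
    struct task_list list = { NULL };
    struct task a = { 10, { NULL } };
    struct task b = { 20, { NULL } };
    struct task *it;
    int encoded = 0;

    TLIST_INSERT_HEAD(&list, &a, links);
    TLIST_INSERT_HEAD(&list, &b, links);

    /* Visits b, then a; encode the visiting order in one number. */
    TLIST_FOREACH(it, &list, links)
        encoded = encoded * 100 + it->priority;
    return encoded;  /* 2010 */
}
```

    The pointers are typed here, so no casts are needed, but the macros happily accept any struct that has a matching field name.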

    I don't think I'd consider either of these approaches type safe. Useful, but not type safe.

  • 2020-12-04 08:48

    I am using option 2 for a couple of high performance collections, and it is extremely time-consuming working through the amount of macro logic needed to do anything truly compile-time generic and worth using. I am doing this purely for raw performance (games). An X-macros approach is used.
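
    For readers who have not met X-macros, a toy illustration follows. It is not taken from my library; the stack type, the STACK_TYPES list and all function names are made up to show the mechanism.

```c
#include <stddef.h>

/* One row per element type: X(suffix, type). */
#define STACK_TYPES(X) \
    X(int, int) \
    X(dbl, double)

#define STACK_CAPACITY 16

/* Generates a fixed-capacity stack type plus push/pop for one row. */
#define DEFINE_STACK(suffix, type) \
    typedef struct { type items[STACK_CAPACITY]; size_t count; } stack_##suffix; \
    static int stack_##suffix##_push(stack_##suffix *s, type v) \
    { \
        if (s->count == STACK_CAPACITY) \
            return 0; \
        s->items[s->count++] = v; \
        return 1; \
    } \
    static type stack_##suffix##_pop(stack_##suffix *s) \
    { \
        return s->items[--s->count]; /* caller must not pop when empty */ \
    }

/* Expand the generator once for every row in the list. */
STACK_TYPES(DEFINE_STACK)

double xmacro_demo(void)
{
    stack_int si = { {0}, 0 };
    stack_dbl sd = { {0}, 0 };
    stack_int_push(&si, 3);
    stack_int_push(&si, 4);
    stack_dbl_push(&sd, 0.5);
    return stack_int_pop(&si) + stack_dbl_pop(&sd);  /* 4 + 0.5 */
}
```

    Supporting another element type means adding one row to STACK_TYPES, and passing a stack_dbl to stack_int_push is a compile-time diagnostic.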

    A painful issue that constantly comes up with Option 2 is, "Assuming some finite number of options, such as 8/16/32/64 bit keys, do I make said value a constant and define several functions each with a different element of this set of values that constant can take on, or do I just make it a member variable?" The former means a less performant instruction cache since you have a lot of repeated functions with just one or two numbers different, while the latter means you have to reference allocated variables which in the worst case means a data cache miss. Since Option 1 is purely dynamic, you will make such values member variables without even thinking about it. This truly is micro-optimisation, though.

    Also bear in mind the trade-off between returning pointers vs. values: the latter is most performant when the size of the data item is less than or equal to pointer size; whereas if the data item is larger, it is most likely better to return pointers than to force a copy of a large object by returning value.

    I would strongly suggest going for Option 1 in any scenario where you are not 100% certain that collection performance will be your bottleneck. Even with my use of Option 2, my collections library supplies a "quick setup" which is like Option 1, i.e. use of void * values in my list and map. This is sufficient for 90+% of circumstances.
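
    A void * list of the "quick setup" kind is only a few lines. This is a generic sketch with invented names (vlist_node, vlist_push, vlist_free), not code from my library.

```c
#include <stdlib.h>

/* Node that stores a pointer to the value, not the value itself. */
struct vlist_node
{
    struct vlist_node *next;
    void *value;
};

/* Prepend a value pointer. The list neither copies nor owns the pointee.
 * Returns the new head, or NULL on allocation failure (old head stays valid). */
struct vlist_node *vlist_push(struct vlist_node *head, void *value)
{
    struct vlist_node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->next = head;
    node->value = value;
    return node;
}

/* Free the nodes only; the values belong to the caller. */
void vlist_free(struct vlist_node *head)
{
    while (head != NULL) {
        struct vlist_node *next = head->next;
        free(head);
        head = next;
    }
}

int vlist_demo(void)
{
    int a = 5, b = 7;
    struct vlist_node *head = NULL, *node;

    node = vlist_push(head, &a);
    if (node == NULL)
        return -1;
    head = node;

    node = vlist_push(head, &b);
    if (node == NULL) {
        vlist_free(head);
        return -1;
    }
    head = node;

    int sum = 0;
    for (node = head; node != NULL; node = node->next)
        sum += *(int *)node->value;  /* the caller must know the real type */
    vlist_free(head);
    return sum;  /* 7 + 5 */
}
```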

  • 2020-12-04 08:50

    There's a common variation on option 1 which is more efficient because it uses unions to store the values in the list nodes, i.e. there's no additional indirection. This has the downside that the list only accepts values of certain types, and it potentially wastes some memory if the types are of different sizes.
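
    A minimal sketch of the union variant, assuming the list only needs to hold int and double values. The kind tag is an addition for this example so that readers of the list can tell which member is live; all names are hypothetical.

```c
#include <stdlib.h>

enum uvalue_kind { UVALUE_INT, UVALUE_DOUBLE };

/* The value lives inside the node, so there is no extra indirection. */
struct unode
{
    struct unode *next;
    enum uvalue_kind kind;              /* which union member is live */
    union { int i; double d; } value;   /* sized for the largest member */
};

static struct unode *unode_push(struct unode *head, enum uvalue_kind kind)
{
    struct unode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->next = head;
    node->kind = kind;
    return node;
}

struct unode *unode_push_int(struct unode *head, int i)
{
    struct unode *node = unode_push(head, UVALUE_INT);
    if (node != NULL)
        node->value.i = i;
    return node;
}

struct unode *unode_push_double(struct unode *head, double d)
{
    struct unode *node = unode_push(head, UVALUE_DOUBLE);
    if (node != NULL)
        node->value.d = d;
    return node;
}

/* Sum every value, freeing the nodes along the way. */
double unode_sum_and_free(struct unode *head)
{
    double sum = 0.0;
    while (head != NULL) {
        struct unode *next = head->next;
        sum += (head->kind == UVALUE_INT) ? head->value.i : head->value.d;
        free(head);
        head = next;
    }
    return sum;
}

double unode_demo(void)
{
    struct unode *head = unode_push_int(NULL, 2);
    if (head == NULL)
        return -1.0;
    struct unode *node = unode_push_double(head, 0.25);
    if (node == NULL) {
        unode_sum_and_free(head);
        return -1.0;
    }
    return unode_sum_and_free(node);  /* 0.25 + 2 */
}
```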

    However, it's possible to get rid of the union by using a flexible array member instead, if you're willing to break strict aliasing. C99 example code:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    struct ll_node
    {
        struct ll_node *next;
        long long data[]; // use `long long` for alignment
    };
    
    extern struct ll_node *ll_unshift(
        struct ll_node *head, size_t size, void *value);
    
    extern void *ll_get(struct ll_node *head, size_t index);
    
    #define ll_unshift_value(LIST, TYPE, ...) \
        ll_unshift((LIST), sizeof (TYPE), &(TYPE){ __VA_ARGS__ })
    
    #define ll_get_value(LIST, INDEX, TYPE) \
        (*(TYPE *)ll_get((LIST), (INDEX)))
    
    struct ll_node *ll_unshift(struct ll_node *head, size_t size, void *value)
    {
        struct ll_node *node = malloc(sizeof *node + size);
        if(!node) abort(); // assert() compiles away under NDEBUG, so fail explicitly
    
        memcpy(node->data, value, size);
        node->next = head;
    
        return node;
    }
    
    void *ll_get(struct ll_node *head, size_t index)
    {
        struct ll_node *current = head;
        while(current && index--)
            current = current->next;
        return current ? current->data : NULL;
    }
    
    int main(void)
    {
        struct ll_node *head = NULL;
        head = ll_unshift_value(head, int, 1);
        head = ll_unshift_value(head, int, 2);
        head = ll_unshift_value(head, int, 3);
    
        printf("%i\n", ll_get_value(head, 0, int));
        printf("%i\n", ll_get_value(head, 1, int));
        printf("%i\n", ll_get_value(head, 2, int));
    
        return 0;
    }
    
  • 2020-12-04 08:52

    GLib has a bunch of generic data structures in it: http://www.gtk.org/

    CCAN has a bunch of useful snippets and such: http://ccan.ozlabs.org/
