What is a generic list manipulation function in C? (I saw this when I was going through some materials.)
What is the difference between this function and a function
The Linux kernel has an interesting implementation of a generic linked list in C in its linux/list.h header. It is a doubly-linked list with a head node, used like this:
struct mystruct {
...
/* Contains the next and prev pointers */
struct list_head mylist;
...
/* A single struct can be in several lists */
struct list_head another_list;
...
};
struct list_head mylist_head;
struct list_head another_list_head;
Some interesting things in this small example:
The list node is embedded in the target struct, and a single struct can be on several lists at the same time.
The next and prev pointers within the list point to the struct list_head, not to the target structure (in the example above, they point to &(foo->mylist) for the first list and &(foo->another_list) for the second list).
All the list manipulation functions take pointers to struct list_head (and most do not care at all whether it is the separate head node or one of the embedded nodes). To get from the struct list_head to the target struct, you use the list_entry macro (which is the same as the container_of macro from the linux/kernel.h header), which expands into a simple pointer subtraction.
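The pointer subtraction can be sketched in plain userspace C. This is only an illustration, not the kernel's code: my_container_of, struct item and item_from_node are made-up names, and the struct list_head here is a stand-in that is assumed to have the same two fields as the kernel's.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, NULL */

/* Userspace stand-in for the kernel's struct list_head (assumed layout). */
struct list_head {
    struct list_head *next, *prev;
};

/* Simplified container_of: step back from the member's address to the
   start of the enclosing struct. The real kernel macro adds type checks. */
#define my_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical target struct; the node is deliberately not the first member. */
struct item {
    int value;
    struct list_head node;
};

/* Recover the enclosing item from a pointer to its embedded node. */
static struct item *item_from_node(struct list_head *n)
{
    assert(n != NULL);
    return my_container_of(n, struct item, node);
}
```

Because the offset is known at compile time, the conversion costs a single subtraction and needs no data pointer stored in the node.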
Since it is a doubly-linked list with a head node, you can, in O(1), do things such as check whether the list is empty, or insert or remove a node.
For my teaching I developed this "generic" list module, probably a simplified version of the Linux kernel one, with additional though undiscovered bugs included, and it uses gcc extensions... Any comments are welcome!
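#ifndef LISTE_H
#define LISTE_H
#include <stdlib.h>
/* List cell header; the element is stored right after it in the same allocation. */
typedef struct liste_s {
struct liste_s * suivant ;
} * liste ;
/* Allocate a cell with room for one element of type t (the malloc result is not checked). */
#define newl(t) ( (liste) malloc ( sizeof ( struct liste_s ) + sizeof ( t ) ) )
/* Access the element stored after the header (assumes t needs no stricter alignment than a pointer). */
#define elt(l,t) ( * ( ( t * ) ( (l) + 1 ) ) )
#define liste_vide NULL
#define videp(l) ( (l) == liste_vide )
#define lvide() liste_vide
/* The macros below rely on gcc statement expressions and typeof. */
#define cons(e,l) \
({ liste res = newl(typeof(e)) ; \
res->suivant = (l) ; \
elt(res,typeof(e)) = (e) ; \
res ; })
/* hd and tl exit the program when applied to an empty list. */
#define hd(l,t) ({ liste res = (l) ; if ( videp(res) ) exit ( EXIT_FAILURE ) ; elt(res,t) ; })
#define tl(l) ({ liste res = (l) ; if ( videp(res) ) exit ( EXIT_FAILURE ) ; res->suivant ;})
#endif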
C has no concept of "generic" pointers or objects - the closest you can get is using a void* pointer. If you want one piece of code to be able to handle any data type, you pretty much have to use void* pointers. For integer types no larger than a pointer, you can cast between the type and void* (ideally via intptr_t); for larger data types, you'll have to use dynamic memory and have the void* member point to the dynamic memory. Just watch out for memory leaks!
typedef struct list_node {
struct list_node *next;
void *data;
} list_node;
void list_insert(list_node *node, void *data) {
// ...
}
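As a rough sketch of the dynamic-memory variant, the following copies the caller's data into the node and frees it again later. The names list_insert_copy and list_free_after, and the extra size parameter, are assumptions made for this illustration and are not part of the snippet above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    struct list_node *next;
    void *data;
} list_node;

/* Insert a private copy of `size` bytes from `data` right after `node`.
   Returns the new node, or NULL if an allocation fails. */
list_node *list_insert_copy(list_node *node, const void *data, size_t size)
{
    assert(node != NULL && data != NULL && size > 0);
    list_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = malloc(size);
    if (n->data == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->data, data, size);
    n->next = node->next;
    node->next = n;
    return n;
}

/* Free every node after `head`, including each node's copied payload. */
void list_free_after(list_node *head)
{
    list_node *cur = head->next;
    while (cur != NULL) {
        list_node *next = cur->next;
        free(cur->data);
        free(cur);
        cur = next;
    }
    head->next = NULL;
}
```

Pairing every insert with a matching free routine is what keeps the leaks mentioned above under control; the caller still has to remember which type each data pointer refers to.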
On the other hand, if you want to generate code for each possible data type, you'll have to do it with macros, and then instantiate the macros for each data type you might use. For example:
#define DEFINE_LIST(type) \
typedef struct list_node_##type { \
struct list_node_##type *next; \
type data; \
} list_node_##type
#define IMPLEMENT_LIST_INSERT(type) \
void list_##type##_insert(list_node_##type *node, type data) { \
... \
}
DEFINE_LIST(int); // defines struct list_node_int
DEFINE_LIST(double); // defines struct list_node_double
IMPLEMENT_LIST_INSERT(int); // defines list_int_insert
IMPLEMENT_LIST_INSERT(double); // defines list_double_insert
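To make the macro approach concrete, here is one possible complete version. The macro names DEFINE_LIST_TYPE and DEFINE_LIST_PUSH and the generated list_<type>_push functions are hypothetical, and prepending is just one choice of insert behaviour.

```c
#include <assert.h>
#include <stdlib.h>

/* Declares the node type list_node_<type>. */
#define DEFINE_LIST_TYPE(type) \
    typedef struct list_node_##type { \
        struct list_node_##type *next; \
        type data; \
    } list_node_##type

/* Defines list_<type>_push: prepend `data` and return the new head,
   or NULL on allocation failure (the old head is then left untouched).
   The trailing struct declaration lets the use site end with a semicolon. */
#define DEFINE_LIST_PUSH(type) \
    list_node_##type *list_##type##_push(list_node_##type *head, type data) \
    { \
        list_node_##type *n = malloc(sizeof *n); \
        if (n == NULL) \
            return NULL; \
        n->data = data; \
        n->next = head; \
        return n; \
    } \
    struct list_node_##type

DEFINE_LIST_TYPE(int);     /* defines list_node_int */
DEFINE_LIST_PUSH(int);     /* defines list_int_push */
DEFINE_LIST_TYPE(double);  /* defines list_node_double */
DEFINE_LIST_PUSH(double);  /* defines list_double_push */
```

The price of this approach is one copy of the code per instantiated type, and type names that are not a single identifier (such as char *) need a typedef first so that the ## pasting works.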
A generic list is likely to be singly-linked, and probably assumes that the items in the list have a structure like this:
typedef struct list_item list_item;
struct list_item
{
list_item *next;
...data for node...
};
Using this layout, you can write functions to manipulate lists using just the next pointers.
Sometimes, the '...data for node...' will be a simple 'void *'; that is, the list items will contain pointers to the next node in the list (or NULL if there is no next node) and pointers to the data.
typedef struct list list;
struct list
{
list *next;
void *data;
};
Since you can cast any pointer to 'void *', you can have any mix of data types in the list - but your code must know how to handle them.
You ask about 'a' generic list function, but there probably isn't a single one-function-does-all design, and certainly not a simple one. There are a number of possible sets of functions that could make generic list functions. One set, inspired by Lisp, would consist of:
void *car(list *lp); // Return the data for the first item on the list
list *cdr(list *lp); // Return the tail of the list
list *cons(list *lp1, list *lp2); // Construct a list from lists lp1 and lp2
list *cond(list *lp, void *data); // Append data item to list
You probably want to provide the ability to test whether the list is empty, and a few other items.
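A minimal sketch of part of such a set might look like this. It assumes an empty list is represented by NULL, and list_is_empty and list_prepend are hypothetical helpers that do not appear in the prototypes above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct list list;
struct list
{
    list *next;
    void *data;
};

/* An empty list is represented by NULL (an assumption of this sketch). */
int list_is_empty(const list *lp)
{
    return lp == NULL;
}

/* Data of the first item; the list must not be empty. */
void *car(list *lp)
{
    assert(lp != NULL);
    return lp->data;
}

/* Everything after the first item; the list must not be empty. */
list *cdr(list *lp)
{
    assert(lp != NULL);
    return lp->next;
}

/* Hypothetical helper: a new list with `data` in front of `tail`.
   Returns NULL on allocation failure. The data itself is not copied. */
list *list_prepend(void *data, list *tail)
{
    list *lp = malloc(sizeof *lp);
    if (lp == NULL)
        return NULL;
    lp->data = data;
    lp->next = tail;
    return lp;
}
```

Since the nodes only borrow the data pointers, the caller remains responsible for the lifetime of whatever they point to.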
One good exposition, admittedly in C++, is found in Koenig's "Ruminations on C++". The ideas can be adapted into C quite easily - it isn't dreadfully hard (though the storage management in C is harder than in C++).
As mentioned above, I tried using the macro approach to create the list manipulation functions. It is easy to create the INSERT routine but difficult to create the delete and traverse operations. The following are the list structure and the INSERT routine signature:
#define LIST_DEFINE(type) \
struct list_node_##type \
{ \
type *data; \
struct list_node_##type *next; \
};
LIST_INSERT(&ListHead,&Data, DataType);
Where:
ListHead - head of the linked list
Data - the data for which a new node is created and into which the data is copied
DataType - the data type of the data passed
FYI, I allocate memory in the function, copy all the data passed into the newly created node, and then append the node to the linked list.
Now, when a LIST_DELETE routine is created, the node that needs to be deleted is identified by a unique identifier within the data. That identifier is also passed to the macro as the key, which is substituted during macro expansion. The routine signature could be:
LIST_DELETE(&ListHead, DataType, myvar->data->str, char*);
Where:
ListHead - head of the linked list
DataType - the data type of the data
myvar->data->str - unique key
char* - key type
Now, when the key is expanded, that same key cannot be used for the comparison. If we write
if((keytype)ListHead->data->key == (keytype)key)
it expands to
ListHead->data->myvar->data->str == myvar->data->str
and there is no such member as ListHead->data->myvar->data->str.
So this approach cannot be used to write delete routines, and since the traversal and search routines also use the unique key, they face the same problem.
And, on an unrelated note, how should the matching logic for the unique key be determined, given that the unique key could be anything?
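One way around both problems, sketched here with plain functions instead of macros, is to let the caller pass a matching callback, in the spirit of the comparator taken by qsort. Everything in this example (struct node, match_fn, list_delete, struct record and record_name_matches) is hypothetical and only illustrates the idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node {
    void *data;
    struct node *next;
};

/* Returns nonzero when `data` matches `key`; supplied by the caller per data type. */
typedef int (*match_fn)(const void *data, const void *key);

/* Unlink and free the first node whose data matches `key`.
   Returns the removed node's data (so the caller can free it if needed),
   or NULL if no node matched. */
void *list_delete(struct node **head, const void *key, match_fn match)
{
    for (struct node **link = head; *link != NULL; link = &(*link)->next) {
        if (match((*link)->data, key)) {
            struct node *victim = *link;
            void *data = victim->data;
            *link = victim->next;
            free(victim);
            return data;
        }
    }
    return NULL;
}

/* Hypothetical record type and matcher: records are keyed by their name string. */
struct record {
    const char *name;
    int value;
};

int record_name_matches(const void *data, const void *key)
{
    const struct record *r = data;
    return strcmp(r->name, (const char *)key) == 0;
}
```

The list code never needs to know where the key lives inside the data or how two keys compare; that knowledge stays in the callback, and the same callback can serve search and traversal routines.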
I have been trying something different. This is another way to approach the problem.
If we have the following structure:
typedef struct token {
int id;
char *name;
struct token *next;
} Token;
and we need to create a function that returns the tail of a linked list, but the function should be generic for any linked list, so:
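void* tail(void* list, void* (*f)(void *)) {
void *head = list;
/* An empty list has no tail; avoid calling f on NULL. */
if (head == NULL) {
return NULL;
}
/* f returns the successor of a node, or NULL at the end of the list. */
while(f(head) != NULL) {
head = f(head);
}
return head;
}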
Now it is necessary to create a function that bridges our custom struct and the generic tail function. That gives us:
void* nextToken(void *a) {
Token *t = (Token *) a;
return (void *) (t->next);
}
Finally we can simply use:
Token *listTokens;
(...)
Token *lastToken = tail(listTokens, nextToken);