I am trying to design a program that reads data from a file and assigns a number to each unique item; the linked list should also contain parent and child lists.
(Image: Data)
Is it possible to create a linked list whose nodes have more than one next node or more than one previous node? If so, what would the struct look like?
Yes, it is possible. The question you must ask yourself is "how do I store an arbitrarily large amount of data?", and the brief answer is "you must use an ADT". Recall that an ADT (abstract data type) is a mathematical model for a collection of data.
You can implement it with any ADT; the choice of the specific ADT depends on the operations you plan to use most frequently. For my example, I will use a dynamic array. The structure would be declared as follows (omitting the other fields of the node):
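struct llnode {
    int item;
    struct llnode **children; /* dynamic array of pointers to the child nodes */
    int length;               /* number of children currently stored */
    int capacity;             /* number of slots allocated */
};
... where item is the ASCII code for 'A', 'B', 'C', etc., and children points to an array of pointers to struct llnode (pointers rather than copies, so that a node with several parents, such as E, is shared). You can create a separate structure for the dynamic array to keep things tidier, but that is entirely up to you. The same idea applies to the parent nodes.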
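A minimal sketch of how such a child array might be managed, assuming it stores pointers to the children and grows by doubling. The names llnode_new, llnode_add_child and llnode_free are hypothetical; a parents array would simply be a second pointer/length/capacity triple handled the same way.

```c
#include <assert.h>
#include <stdlib.h>

struct llnode {
    int item;                  /* e.g. the ASCII code of 'A', 'B', ... */
    struct llnode **children;  /* dynamic array of pointers to children */
    int length;                /* number of children stored */
    int capacity;              /* number of slots allocated */
};

/* Create a node with no children; returns NULL if malloc fails. */
struct llnode *llnode_new(int item)
{
    struct llnode *n = malloc(sizeof *n);
    if (n != NULL) {
        n->item = item;
        n->children = NULL;    /* realloc(NULL, ...) behaves like malloc */
        n->length = 0;
        n->capacity = 0;
    }
    return n;
}

/* Append child to parent->children, doubling the capacity when full.
 * Returns 0 on success, -1 if realloc fails (parent is left unchanged). */
int llnode_add_child(struct llnode *parent, struct llnode *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->length == parent->capacity) {
        int newcap = parent->capacity ? parent->capacity * 2 : 4;
        struct llnode **p = realloc(parent->children, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        parent->children = p;
        parent->capacity = newcap;
    }
    parent->children[parent->length++] = child;
    return 0;
}

/* Free a node and its child array (the child nodes themselves are not freed). */
void llnode_free(struct llnode *n)
{
    if (n != NULL) {
        free(n->children);
        free(n);
    }
}
```

Doubling keeps appends amortised O(1); because only pointers are stored, E can appear in both B's and C's children without being duplicated.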
What you are describing is a graph.
A (doubly) linked list is really just a one-dimensional list, so it is an inappropriate term for what you want.
There are two main ways of implementing a graph:
- Adjacency lists: every node/vertex keeps a list of the nodes it has an edge to.
- An adjacency matrix: an $n \times n$ matrix (where $n$ is the number of nodes/vertices) with an entry at [a][b] if node a has an edge to b.
Which of these to use depends on your use case. As a rule of thumb: if you have very many vertices (tens of thousands) and the average number of edges per vertex is bounded by a constant, you should use lists. In the other cases you are probably better off with a matrix (mainly because it is easier to implement).
I assume that your use case is limited to ASCII letters, so I would actually use a matrix here. With proper optimisations (bitfields and the like) you can scan it very quickly.
Your implementation could look like:
#include <string.h> // for memset
char adj_matrix[0x80][0x80]; // I am assuming that you only have ASCII letters
memset(adj_matrix, 0, sizeof(adj_matrix)); // initialise empty
Inserting elements would go like:
adj_matrix['A']['C'] = 1; // edge from A --> C
To determine all incoming edges for 'A', you have to iterate through the matrix:
int i;
for (i = 'A'; i <= 'Z'; i++) {
    if (adj_matrix[i]['A']) {
        // A has an incoming edge from i
    }
}
For outgoing edges it is the other way round:
for (i = 'A'; i <= 'Z'; i++) {
    if (adj_matrix['E'][i]) {
        // E has an outgoing edge to i
    }
}
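Putting the matrix snippets together, here is a sketch that fills the matrix from a string of edge pairs and collects the parents and children of a letter. The input format ("AB" meaning an edge A --> B, pairs separated by spaces) and the names load_edges, parents_of and children_of are assumptions for illustration; in your program the pairs would come from the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* adj_matrix[a][b] != 0 means there is an edge a --> b (ASCII codes only). */
static char adj_matrix[0x80][0x80];

/* Hypothetical input format: "AB AC BE" = edges A-->B, A-->C, B-->E. */
void load_edges(const char *pairs)
{
    memset(adj_matrix, 0, sizeof adj_matrix);   /* start with no edges */
    for (;;) {
        while (*pairs == ' ')                   /* skip separators */
            pairs++;
        if (pairs[0] == '\0' || pairs[1] == '\0')
            break;
        unsigned char from = (unsigned char)pairs[0];
        unsigned char to = (unsigned char)pairs[1];
        if (from < 0x80 && to < 0x80)           /* ignore non-ASCII bytes */
            adj_matrix[from][to] = 1;
        pairs += 2;
    }
}

/* Parents of v (incoming edges): scan column v for 'A'..'Z'. */
size_t parents_of(char v, char out[27])
{
    unsigned char col = (unsigned char)v;
    size_t n = 0;
    assert(col < 0x80);
    for (int i = 'A'; i <= 'Z'; i++)
        if (adj_matrix[i][col])
            out[n++] = (char)i;
    out[n] = '\0';
    return n;
}

/* Children of v (outgoing edges): scan row v for 'A'..'Z'. */
size_t children_of(char v, char out[27])
{
    unsigned char row = (unsigned char)v;
    size_t n = 0;
    assert(row < 0x80);
    for (int i = 'A'; i <= 'Z'; i++)
        if (adj_matrix[row][i])
            out[n++] = (char)i;
    out[n] = '\0';
    return n;
}
```

For example, after load_edges("AB AC BE CE CG EI EF FJ GK"), parents_of('E', buf) fills buf with "BC" and children_of('E', buf) fills it with "FI".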
As said, you can significantly improve both space and time performance with bitfields and bit-scan instructions (e.g. GCC's __builtin_clzll, ICC's _bit_scan_reverse).
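A sketch of the bitset idea, assuming only the 26 uppercase letters are used so that each row of the matrix fits in one 32-bit word. It swaps in GCC/Clang's __builtin_ctz (count trailing zeros) instead of __builtin_clzll so that children come out in alphabetical order; add_edge, children_of and parents_of are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit (c - 'A') of rows[v - 'A'] is set when there is an edge v --> c.
 * Assumes nodes are only 'A'..'Z' (26 bits fit in a uint32_t). */
static uint32_t rows[26];

void add_edge(char from, char to)
{
    assert(from >= 'A' && from <= 'Z' && to >= 'A' && to <= 'Z');
    rows[from - 'A'] |= UINT32_C(1) << (to - 'A');
}

/* Children of v in alphabetical order: visit only the set bits. */
int children_of(char v, char *out)
{
    assert(v >= 'A' && v <= 'Z');
    uint32_t bits = rows[v - 'A'];
    int n = 0;
    while (bits != 0) {
        out[n++] = (char)('A' + __builtin_ctz(bits)); /* lowest set bit */
        bits &= bits - 1;                             /* clear that bit */
    }
    out[n] = '\0';
    return n;
}

/* Parents of v: test one bit in each of the 26 rows. */
int parents_of(char v, char *out)
{
    assert(v >= 'A' && v <= 'Z');
    uint32_t mask = UINT32_C(1) << (v - 'A');
    int n = 0;
    for (int i = 0; i < 26; i++)
        if (rows[i] & mask)
            out[n++] = (char)('A' + i);
    out[n] = '\0';
    return n;
}
```

The whole matrix is now 26 words instead of 16 KiB, and finding children costs one iteration per actual child rather than one per letter.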
You can have a structure like this:
struct abcd {
    char data;
    struct abcd *next[10]; //array of next nodes
    struct abcd *prev[10]; //array of previous nodes
};
When accessing next nodes you use node->next[i] instead of node->next, where 0 <= i < 10. When allocating/creating a node, set all array elements to NULL so that unused slots don't contain garbage.
So let's suppose you have added a node for 'A'; then you can add nodes for 'B' and 'C' as follows:
int idx;
//find the index of the first free element (check the bound before reading)
for (idx = 0; idx < 10 && nodeA->next[idx]; idx++)
    ;
if (idx == 10) {
    //no free space: handle the error
} else {
    nodeA->next[idx] = nodeB;
    nodeB->prev[0] = nodeA; //nodeB is new, so prev[0] is free
}
//similarly add C; the next free slot is idx + 1, if it is still in range
if (idx + 1 < 10) {
    nodeA->next[idx + 1] = nodeC;
    nodeC->prev[0] = nodeA;
}
With this, each node can have at most 10 next and at most 10 previous nodes.
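The allocation and slot-finding steps can be wrapped in small helpers. This is a sketch rather than the answer's own code: abcd_new, abcd_link, free_slot and MAX_LINKS are hypothetical names, and unlike the snippet it also searches for a free prev slot, so a node can have several parents (E in the question has two).

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LINKS 10   /* same fixed limit as the arrays in the struct */

struct abcd {
    char data;
    struct abcd *next[MAX_LINKS]; /* array of next nodes */
    struct abcd *prev[MAX_LINKS]; /* array of previous nodes */
};

/* Allocate a node with every next/prev slot set to NULL. */
struct abcd *abcd_new(char data)
{
    struct abcd *n = malloc(sizeof *n);
    if (n != NULL) {
        n->data = data;
        for (int i = 0; i < MAX_LINKS; i++)
            n->next[i] = n->prev[i] = NULL;
    }
    return n;
}

/* Index of the first NULL slot in arr, or -1 if every slot is used. */
static int free_slot(struct abcd **arr)
{
    for (int i = 0; i < MAX_LINKS; i++)
        if (arr[i] == NULL)
            return i;
    return -1;
}

/* Link parent --> child in both directions.
 * Returns 0 on success, -1 if either array is already full. */
int abcd_link(struct abcd *parent, struct abcd *child)
{
    assert(parent != NULL && child != NULL);
    int i = free_slot(parent->next);
    int j = free_slot(child->prev);
    if (i < 0 || j < 0)
        return -1;
    parent->next[i] = child;
    child->prev[j] = parent;
    return 0;
}
```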
The fixed-size array is just for simplicity; you can also declare struct abcd **next;
so that the number of next/prev nodes is dynamic. You will have to allocate (and grow) the memory appropriately, though.
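A sketch of that dynamic variant, assuming each array grows by one slot per link with realloc and each node keeps its own counts (nnext, nprev). All names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct abcd {
    char data;
    struct abcd **next;  /* dynamic array of next nodes */
    size_t nnext;        /* entries in next */
    struct abcd **prev;  /* dynamic array of previous nodes */
    size_t nprev;        /* entries in prev */
};

struct abcd *abcd_new(char data)
{
    struct abcd *n = malloc(sizeof *n);
    if (n != NULL) {
        n->data = data;
        n->next = n->prev = NULL;  /* realloc(NULL, ...) acts like malloc */
        n->nnext = n->nprev = 0;
    }
    return n;
}

/* Grow *arr by one slot and store p there; 0 on success, -1 if realloc fails. */
static int push(struct abcd ***arr, size_t *len, struct abcd *p)
{
    struct abcd **grown = realloc(*arr, (*len + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;                 /* *arr is still valid and unchanged */
    grown[(*len)++] = p;
    *arr = grown;
    return 0;
}

/* Link parent --> child on both ends; 0 on success, -1 on allocation failure. */
int abcd_link(struct abcd *parent, struct abcd *child)
{
    assert(parent != NULL && child != NULL);
    if (push(&parent->next, &parent->nnext, child) != 0)
        return -1;
    if (push(&child->prev, &child->nprev, parent) != 0) {
        parent->nnext--;           /* roll back so both ends stay consistent */
        return -1;
    }
    return 0;
}

void abcd_free(struct abcd *n)
{
    if (n != NULL) {
        free(n->next);
        free(n->prev);
        free(n);
    }
}
```

Growing by one slot keeps the code short; for nodes with many links, a separate capacity field that doubles would avoid repeated copying.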
Yes, this is called a directed graph, and there are about a thousand ways you could implement it. The "right" way depends entirely on how you will use it, which you haven't described. Since you seem to limit this to linked lists or doubly linked lists, I'll use nothing but doubly linked lists.
Forward declare your data types:
typedef struct ItemInfo_s ItemInfo;
typedef struct DoubleLinkedListNode_s DoubleLinkedListNode;
Create a ListNode like you always do:
struct DoubleLinkedListNode_s {
DoubleLinkedListNode *next;
DoubleLinkedListNode *prev;
ItemInfo *data;
};
Then create your ItemInfo:
struct ItemInfo_s {
    DoubleLinkedListNode *children;
    DoubleLinkedListNode *parents;
    /* ... other item data ... */
};
Also, for sanity's sake create a list of all created nodes:
DoubleLinkedListNode *items;
Now, I'm not going to write all of the linked list management functions, but I'm sure you can figure it out. By convention I'll write (B) as a node pointing to item B (node.data = &B). I'll also indicate any two nodes linked together with an '=', and a '-' as an unlinked (null valued) node linkage. I'll write a chain of elements [ -(1)=(2)=(3)- ] and by convention pointers into a chain of items will always point to the first node in the chain (the (1) in this example). Your given graph looks like this in memory:
items = [ -(A)=(B)=(C)=(E)=(F)=(G)=(I)=(J)=(K)- ]
A.children = [ -(B)=(C)- ]
A.parents = []
B.children = [ -(E)- ]
B.parents = [ -(A)- ]
C.children = [ -(E)=(G)- ]
C.parents = [ -(A)- ]
E.children = [ -(I)=(F)- ]
E.parents = [ -(B)=(C)- ]
F.children = [ -(J)- ]
F.parents = [ -(E)- ]
G.children = [ -(K)- ]
G.parents = [ -(C)- ]
I.children = []
I.parents = [ -(E)- ]
J.children = []
J.parents = [ -(F)- ]
K.children = []
K.parents = [ -(G)- ]
In total that is 9 ItemInfos and 27 DoubleLinkedListNodes. I can think of almost no reason I would ever implement this in practice, but it's implemented only using double linked lists. It might make the list management easier to do doubly linked rings (connecting the head and tail of the list together) but that's harder to show in text form. :)
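The list management functions left out above could look roughly like this. It is a sketch assuming a single char payload (name) in ItemInfo; node_new, list_append, link_items, list_length and list_free are hypothetical helpers. Each edge costs two nodes (one in the parent's children chain, one in the child's parents chain), which is where the 27 comes from: 9 nodes in items plus 2 for each of the 9 edges.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ItemInfo_s ItemInfo;
typedef struct DoubleLinkedListNode_s DoubleLinkedListNode;

struct DoubleLinkedListNode_s {
    DoubleLinkedListNode *next;
    DoubleLinkedListNode *prev;
    ItemInfo *data;
};

struct ItemInfo_s {
    DoubleLinkedListNode *children;  /* first node of the children chain, or NULL */
    DoubleLinkedListNode *parents;   /* first node of the parents chain, or NULL */
    char name;                       /* hypothetical payload: the letter */
};

/* Allocate an unlinked node pointing at item. */
static DoubleLinkedListNode *node_new(ItemInfo *item)
{
    DoubleLinkedListNode *n = malloc(sizeof *n);
    if (n != NULL) {
        n->next = n->prev = NULL;
        n->data = item;
    }
    return n;
}

/* Append n at the tail of the chain whose first node is *list. */
static void list_append(DoubleLinkedListNode **list, DoubleLinkedListNode *n)
{
    if (*list == NULL) {
        *list = n;
        return;
    }
    DoubleLinkedListNode *tail = *list;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = n;
    n->prev = tail;
}

/* Record parent --> child in both chains. Returns 0, or -1 if malloc
 * fails (in which case neither chain is modified). */
int link_items(ItemInfo *parent, ItemInfo *child)
{
    assert(parent != NULL && child != NULL);
    DoubleLinkedListNode *c = node_new(child);
    DoubleLinkedListNode *p = node_new(parent);
    if (c == NULL || p == NULL) {
        free(c);
        free(p);
        return -1;
    }
    list_append(&parent->children, c);
    list_append(&child->parents, p);
    return 0;
}

size_t list_length(const DoubleLinkedListNode *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->next)
        n++;
    return n;
}

void list_free(DoubleLinkedListNode *list)
{
    while (list != NULL) {
        DoubleLinkedListNode *next = list->next;
        free(list);
        list = next;
    }
}
```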
Linked and doubly-linked lists are a specific variety of directed graph which can be optimized into the head/tail, data/next/prev structure you're familiar with. Since you're broadening its capabilities, you lose that specificity, and want to go back to the generic directed graph structure and work from there.
A directed graph is most easily described with an adjacency list:
You can implement that as a list of lists, an array of lists, a jagged array, or however you like. Now, on the right, I've drawn a doubly-linked list in directed graph form. Since the next pointers are different from the prev pointers, your adjacency list needs to keep them separate, so it will actually be a list of dual lists:
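typedef struct BPNode { // "Back-Pointing Node"
    void *data;
    struct BPNode **nexts; // dynamic array of successors
    size_t nnexts;
    struct BPNode **prevs; // dynamic array of predecessors
    size_t nprevs;
} Node;
typedef struct BPGraph { // "Back-Pointing Graph"
    Node **allNodes;
    size_t nodeCount;
} BPGraph;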
Or something like that. Disclaimer: I didn't test this in a compiler. And just in case, here's a guide on how to read some of the declarations in there.
Alternatively, you can create two directed graphs, one running forward, and one running backward. However, that would take more memory than this "back-pointing" graph. It would also run slower (more cpu cache misses), would be less intuitive, and would be more troublesome to free memory for.
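One benefit of keeping allNodes is that the whole graph can be freed in a single pass without walking any edges. A sketch of that, assuming nexts/prevs are realloc-grown arrays with counts (C allows only one flexible array member per struct, and only as the last member); bp_add_node, bp_add_edge, bp_free and append are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BPNode {       // "Back-Pointing Node"
    void *data;
    struct BPNode **nexts;    // successors (realloc-grown)
    size_t nnexts;
    struct BPNode **prevs;    // predecessors (realloc-grown)
    size_t nprevs;
} Node;

typedef struct BPGraph {      // "Back-Pointing Graph"
    Node **allNodes;          // owns every node
    size_t nodeCount;
} BPGraph;

/* Append p to the array *arr of length *len; 0 on success, -1 on failure. */
static int append(Node ***arr, size_t *len, Node *p)
{
    Node **grown = realloc(*arr, (*len + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    grown[(*len)++] = p;
    *arr = grown;
    return 0;
}

/* Create a node and register it in g->allNodes; NULL on failure. */
Node *bp_add_node(BPGraph *g, void *data)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = data;
    n->nexts = n->prevs = NULL;
    n->nnexts = n->nprevs = 0;
    if (append(&g->allNodes, &g->nodeCount, n) != 0) {
        free(n);
        return NULL;
    }
    return n;
}

/* Add the edge from --> to, recorded on both ends. */
int bp_add_edge(Node *from, Node *to)
{
    assert(from != NULL && to != NULL);
    if (append(&from->nexts, &from->nnexts, to) != 0)
        return -1;
    if (append(&to->prevs, &to->nprevs, from) != 0) {
        from->nnexts--;       /* roll back so both ends stay consistent */
        return -1;
    }
    return 0;
}

/* Free every node through allNodes; no edge traversal is needed. */
void bp_free(BPGraph *g)
{
    for (size_t i = 0; i < g->nodeCount; i++) {
        free(g->allNodes[i]->nexts);
        free(g->allNodes[i]->prevs);
        free(g->allNodes[i]);
    }
    free(g->allNodes);
    g->allNodes = NULL;
    g->nodeCount = 0;
}
```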
You could try to separate the data from the data structure by implementing lists of pointers to data objects:
struct data_item {
unsigned char data;
unsigned char id;
unsigned int count;
// Whatever other data you want.
};
struct list_node {
    struct data_item *item;
    struct list_node *next;
};
Now, as we encounter characters in the file, we insert them into a "repository" data structure. For this example I'll use a simple table, but you can use a list if you want to save space or a tree if you want to save space while maintaining fast search speeds, etc.
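#include <limits.h> /* UCHAR_MAX */
struct data_item *data_table[UCHAR_MAX + 1] = {0}; /* NULL until a value is first seen */
...
unsigned char data = read_character_from_file();
struct data_item *di = data_table[data];
if (di == NULL)
    di = data_table[data] = new_data_item(data); /* store it so later lookups find it */
else
    ++di->count;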
And attach them to the current list:
struct list_node *list;
if (first_item_in_list())
    list = new_list(di);
else
    list = add_list(list, di);
Now you can have as many such lists as you want (even a list-of-lists if you don't know the number of lists in advance).
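Putting the repository and the list together, here is a sketch that numbers each distinct character in order of first appearance (the "numbering of unique data" the question asks for) and builds a list of pointers into the repository. It reads from a string instead of a file to stay self-contained; lookup_or_add, build_list and next_id are hypothetical names, and a file version would loop over fgetc until EOF.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

struct data_item {
    unsigned char data;
    unsigned char id;    /* number assigned on first appearance: 0, 1, 2, ... */
    unsigned int count;  /* how many times data has been seen */
};

struct list_node {
    struct data_item *item;
    struct list_node *next;
};

/* Repository: one slot per byte value, NULL until that value is first seen. */
static struct data_item *data_table[UCHAR_MAX + 1];
static unsigned char next_id;  /* next number to hand out */

/* Return the entry for c, creating it with a fresh id on first sight. */
static struct data_item *lookup_or_add(unsigned char c)
{
    struct data_item *di = data_table[c];
    if (di == NULL) {
        di = malloc(sizeof *di);
        if (di == NULL)
            return NULL;
        di->data = c;
        di->id = next_id++;
        di->count = 0;
        data_table[c] = di;
    }
    di->count++;
    return di;
}

/* Build a list, in input order, of pointers into the repository. */
struct list_node *build_list(const char *text)
{
    assert(text != NULL);
    struct list_node *head = NULL;
    struct list_node **tail = &head;   /* where the next node gets linked */
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
        struct data_item *di = lookup_or_add(*p);
        struct list_node *n = (di != NULL) ? malloc(sizeof *n) : NULL;
        if (n == NULL)
            break;                     /* out of memory: keep what was built */
        n->item = di;
        n->next = NULL;
        *tail = n;
        tail = &n->next;
    }
    return head;
}
```

Because every list node points at the shared data_item, repeated characters cost one pointer each, and the id and count live in exactly one place.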