Question
I have been working on a templated implementation of a linked list, deliberately reinventing the wheel so that I stumble into exactly this type of problem and learn the subtle nuances of handling pointers to class instances. The problem involves merging sublists: the second merge (the first merge where a sublist can have multiple nodes) fails, and a prior class instance (either from split or mergesorted) appears to go out of scope. That should not have any effect on the merge, since the pointers being assigned come from a prior list that remains in scope until after the assignment to the original list node has taken place.
The key here is that all class instances hold pointers to the original nodes from the original list, so everything should work as long as the sublist instance remains in scope until the beginning node of the sublist is returned and assigned to the list in the previous recursion. I am trying to port a perfectly good, 100% working C implementation. So the issue is my understanding of why I cannot treat class instances as I would a struct in C, but I cannot put my finger on documentation that explains why.
The class list_t contains the struct node_t to form the list.
/* linked list node */
template <class T>
struct node_t {
T data;
node_t<T> *next;
};
template <class T>
class list_t {
node_t<T> *head, *tail;
int (*cmp)(const node_t<T>*, const node_t<T>*);
public:
list_t (void); /* constructors */
list_t (int(*f)(const node_t<T>*, const node_t<T>*));
~list_t (void); /* destructor */
list_t (const list_t&); /* copy constructor */
/* setter for compare function */
...
list_t split (void); /* split list ~ 1/2 */
...
/* merge lists after mergesort_start */
node_t<T> *mergesorted (node_t<T> *a, node_t<T> *b);
void mergesort_run (list_t<T> *l); /* mergesort function */
void mergesort (void); /* wrapper for mergesort */
};
(yes, I know: no _t suffix; that's not the point here)
The split function is working fine and is:
/* split list l into lists a & b */
template <class T>
list_t<T> list_t<T>::split (void)
{
list_t<T> s; /* new instance of class */
node_t<T> *pa = head, /* pointer to current head */
*pb = pa->next; /* 2nd pointer to double-advance */
while (pb) { /* while not end of list */
pb = pb->next; /* advance 2nd ptr */
if (pb) { /* if not nullptr */
pa = pa->next; /* advance current ptr */
pb = pb->next; /* advance 2nd ptr again */
}
}
s.tail = tail; /* 2nd half tail will be current tail */
tail = pa; /* current tail is at pa */
s.head = pa->next; /* 2nd half head is next ptr */
pa->next = nullptr; /* set next ptr NULL to end 1st 1/2 */
return s; /* return new instance */
}
For the mergesort, I have a wrapper that calls the actual mergesort function, mergesort_run. This was done so that the tail pointer is updated only once, after the sort completes, e.g.
/* wrapper to the actual mergesort routine in mergesort_run */
template <class T>
void list_t<T>::mergesort(void)
{
mergesort_run (this);
/* set tail pointer to last node after sort */
for (node_t<T> *pn = head; pn; pn = pn->next)
tail = pn;
}
mergesort_run is as follows:
/* split and merge splits in sort order */
template <class T>
void list_t<T>::mergesort_run (list_t<T> *l)
{
/* Base case -- length 0 or 1 */
if (!l->head || !l->head->next) {
return;
}
/* Split head into 'a' and 'b' sublists */
list_t<T> la = l->split();
/* Recursively sort the sublists */
mergesort_run(l);
mergesort_run(&la);
/* merge the two sorted lists together */
l->head = mergesorted (l->head, la.head);
}
The merge function, mergesorted, merges the sublists in sort order:
template <class T>
node_t<T> *list_t<T>::mergesorted (node_t<T> *a, node_t<T> *b)
{
node_t<T> *result = nullptr;
/* Base cases */
if (!a)
return (b);
else if (!b)
return (a);
/* Pick either a or b, and recur */
if (cmp (a, b) <= 0) {
result = a;
result->next = mergesorted (a->next, b);
}
else {
result = b;
result->next = mergesorted (a, b->next);
}
return result;
}
Working C Implementation I am Moving From
Each of the above (other than my splitting out the initial wrapper) is a port of the following working C split/mergesort:
/* split list l into lists a & b */
void split (list_t *l, list_t *a)
{
node_t *pa = l->head,
*pb = pa->next;
while (pb) {
pb = pb->next;
if (pb) {
pa = pa->next;
pb = pb->next;
}
}
a->tail = l->tail;
l->tail = pa;
a->head = pa->next;
pa->next = NULL;
}
/* merge splits in sort order */
node_t *mergesorted (node_t *a, node_t *b)
{
node_t *res = NULL;
/* base cases */
if (!a)
return (b);
else if (!b)
return (a);
/* Pick either a or b, and recurse */
if (a->data <= b->data) {
res = a;
res->next = mergesorted (a->next, b);
}
else {
res = b;
res->next = mergesorted (a, b->next);
}
return res;
}
/* sorts the linked list by changing next pointers (not data) */
void mergesort (list_t *l)
{
list_t la;
node_t *head = l->head;
/* Base case -- length 0 or 1 */
if (!head || !head->next) {
return;
}
/* Split head into 'a' and 'b' sublists */
split (l, &la);
/* Recursively sort the sublists */
mergesort(l);
mergesort(&la);
/* answer = merge the two sorted lists together */
l->head = mergesorted (l->head, la.head);
/* set tail pointer to last node after sort */
for (head = l->head; head; head = head->next)
l->tail = head;
}
On 2nd Merge The Nodes From The 1st Merge Vanish
I have stepped through the C++ implementation with gdb and valgrind. In gdb the code completes without error, but valgrind reports invalid reads of 4 and 8 bytes after a block has been freed. That suggests the destructor is freeing memory (which it should), but that the pointer assignments done as the recursion unwinds depend on the address of the pointer from the nested recursive call instead of just using the values at the address from the original (as the above C code does perfectly).
What is happening is that after the list is split down to sublists with a single node and the first merge takes place, we are still good. When the next unwind happens, where you would merge the combined nodes with another sublist, the values of the 2-node sublist are lost. So after picking through the C and C++ implementations, I am feeling like an idiot: problems I could simply debug and correct in C elude me here, because I am missing some critical understanding that would let me do the same with a C++ class implementation of the same code.
Test Code
int main (void) {
list_t<int> l;
int arr[] = {12, 11, 10, 7, 4, 14, 8, 16, 20, 19,
2, 9, 1, 13, 17, 6, 15, 5, 3, 18};
unsigned asz = sizeof arr / sizeof *arr;
for (unsigned i = 0; i < asz; i++)
l.addnode (arr[i]);
l.prnlist();
#ifdef ISORT
l.insertionsort();
#else
l.mergesort();
#endif
l.prnlist();
}
The first merge of the left sublist, after it is split down to nodes 12 and 11, goes fine. As soon as I go to merge the 11, 12 sublist with 10, the 11, 12 sublist values are gone.
MCVE
#include <iostream>
/* linked list node */
template <class T>
struct node_t {
T data;
node_t<T> *next;
};
/* default compare function for types w/overload (ascending) */
template <typename T>
int compare_asc (const node_t<T> *a, const node_t<T> *b)
{
return (a->data > b->data) - (a->data < b->data);
}
/* compare function for types w/overload (descending) */
template <typename T>
int compare_desc (const node_t<T> *a, const node_t<T> *b)
{
return (a->data < b->data) - (a->data > b->data);
}
template <class T>
class list_t {
node_t<T> *head, *tail;
int (*cmp)(const node_t<T>*, const node_t<T>*);
public:
list_t (void); /* constructors */
list_t (int(*f)(const node_t<T>*, const node_t<T>*));
~list_t (void); /* destructor */
list_t (const list_t&); /* copy constructor */
/* setter for compare function */
void setcmp (int (*f)(const node_t<T>*, const node_t<T>*));
node_t<T> *addnode (T data); /* simple add at end */
node_t<T> *addinorder (T data); /* add in order */
void delnode (T data); /* delete node */
void prnlist (void); /* print space separated */
list_t split (void); /* split list ~ 1/2 */
void insertionsort (void); /* insertion sort list */
/* merge lists after mergesort_start */
node_t<T> *mergesorted (node_t<T> *a, node_t<T> *b);
void mergesort_run (list_t<T> *l); /* mergesort function */
void mergesort (void); /* wrapper for mergesort */
};
/* constructor (default) */
template <class T>
list_t<T>::list_t (void)
{
head = tail = nullptr;
cmp = compare_asc;
}
/* constructor taking compare function as argument */
template <class T>
list_t<T>::list_t (int(*f)(const node_t<T>*, const node_t<T>*))
{
head = tail = nullptr;
cmp = f;
}
/* destructor free all list memory */
template <class T>
list_t<T>::~list_t (void)
{
node_t<T> *pn = head;
while (pn) {
node_t<T> *victim = pn;
pn = pn->next;
delete victim;
}
}
/* copy ctor - copy existing list */
template <class T>
list_t<T>::list_t (const list_t& l)
{
cmp = l.cmp; /* assign compare function ptr */
head = tail = nullptr; /* initialize head/tail */
/* copy data to new list */
for (node_t<T> *pn = l.head; pn; pn = pn->next)
this->addnode (pn->data);
}
/* setter compare function */
template <class T>
void list_t<T>::setcmp (int(*f)(const node_t<T>*, const node_t<T>*))
{
cmp = f;
}
/* add using tail ptr */
template <class T>
node_t<T> *list_t<T>::addnode (T data)
{
node_t<T> *node = new node_t<T>; /* allocate/initialize node */
node->data = data;
node->next = nullptr;
if (!head)
head = tail = node;
else {
tail->next = node;
tail = node;
}
return node;
}
template <class T>
node_t<T> *list_t<T>::addinorder (T data)
{
if (!cmp) { /* validate compare function not nullptr */
std::cerr << "error: compare is nullptr.\n";
return nullptr;
}
node_t<T> *node = new node_t<T>; /* allocate/initialize node */
node->data = data;
node->next = nullptr;
node_t<T> **ppn = &head, /* ptr-to-ptr to head */
*pn = head; /* ptr to head */
while (pn && cmp (node, pn) > 0) { /* node sorts after current */
ppn = &pn->next; /* ppn to address of next */
pn = pn->next; /* advance pointer to next */
}
node->next = pn; /* set node->next to next */
if (pn == nullptr)
tail = node;
*ppn = node; /* set current to node */
return node; /* return node */
}
template <class T>
void list_t<T>::delnode (T data)
{
node_t<T> **ppn = &head; /* pointer to pointer to node */
node_t<T> *pn = head; /* pointer to node */
for (; pn; ppn = &pn->next, pn = pn->next) {
if (pn->data == data) {
*ppn = pn->next; /* set address to next */
delete pn;
break;
}
}
}
template <class T>
void list_t<T>::prnlist (void)
{
if (!head) {
std::cout << "empty-list\n";
return;
}
for (node_t<T> *pn = head; pn; pn = pn->next)
std::cout << " " << pn->data;
std::cout << '\n';
}
/* split list l into lists a & b */
template <class T>
list_t<T> list_t<T>::split (void)
{
list_t<T> s; /* new instance of class */
node_t<T> *pa = head, /* pointer to current head */
*pb = pa->next; /* 2nd pointer to double-advance */
while (pb) { /* while not end of list */
pb = pb->next; /* advance 2nd ptr */
if (pb) { /* if not nullptr */
pa = pa->next; /* advance current ptr */
pb = pb->next; /* advance 2nd ptr again */
}
}
s.tail = tail; /* 2nd half tail will be current tail */
tail = pa; /* current tail is at pa */
s.head = pa->next; /* 2nd half head is next ptr */
pa->next = nullptr; /* set next ptr NULL to end 1st 1/2 */
return s; /* return new instance */
}
/** insertion sort of linked list.
* re-orders list in sorted order.
*/
template <class T>
void list_t<T>::insertionsort (void)
{
node_t<T> *sorted = head, /* initialize sorted list to 1st node */
*pn = head->next; /* advance original list node to next */
sorted->next = NULL; /* initialize sorted->next to NULL */
while (pn) { /* iterate over existing from 2nd node */
node_t<T> **pps = &sorted, /* ptr-to-ptr to sorted list */
*ps = *pps, /* ptr to sorted list */
*next = pn->next; /* save list next as separate pointer */
while (ps && cmp(ps, pn) < 0) { /* loop until sorted */
pps = &ps->next; /* get address of next node */
ps = ps->next; /* get next node pointer */
}
*pps = pn; /* insert existing in sort order as current */
pn->next = ps; /* set next as sorted next */
pn = next; /* reinitialize existing pointer to next */
}
head = sorted; /* update head to sorted head */
/* set tail pointer to last node after sort */
for (pn = head; pn; pn = pn->next)
tail = pn;
}
/* FIXME mergesort recursion not working */
template <class T>
node_t<T> *list_t<T>::mergesorted (node_t<T> *a, node_t<T> *b)
{
node_t<T> *result = nullptr;
/* Base cases */
if (!a)
return (b);
else if (!b)
return (a);
/* Pick either a or b, and recur */
if (cmp (a, b) <= 0) {
result = a;
result->next = mergesorted (a->next, b);
}
else {
result = b;
result->next = mergesorted (a, b->next);
}
return result;
}
/* split and merge splits in sort order */
template <class T>
void list_t<T>::mergesort_run (list_t<T> *l)
{
/* Base case -- length 0 or 1 */
if (!l->head || !l->head->next) {
return;
}
/* Split head into 'a' and 'b' sublists */
list_t<T> la = l->split();
/* Recursively sort the sublists */
mergesort_run(l);
mergesort_run(&la);
/* merge the two sorted lists together */
l->head = mergesorted (l->head, la.head);
}
/* wrapper to the actual mergesort routine in mergesort_run */
template <class T>
void list_t<T>::mergesort(void)
{
mergesort_run (this);
/* set tail pointer to last node after sort */
for (node_t<T> *pn = head; pn; pn = pn->next)
tail = pn;
}
int main (void) {
list_t<int> l;
int arr[] = {12, 11, 10, 7, 4, 14, 8, 16, 20, 19,
2, 9, 1, 13, 17, 6, 15, 5, 3, 18};
unsigned asz = sizeof arr / sizeof *arr;
for (unsigned i = 0; i < asz; i++)
l.addnode (arr[i]);
l.prnlist();
#ifdef ISORT
l.insertionsort();
#else
l.mergesort();
#endif
l.prnlist();
}
Result of Insertion Sort -- Expected Results
Compile with -DISORT to test insertion sort (working):
$ ./bin/ll_merge_post
12 11 10 7 4 14 8 16 20 19 2 9 1 13 17 6 15 5 3 18
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Result of Mergesort -- Not Good
$ ./bin/ll_merge_post
12 11 10 7 4 14 8 16 20 19 2 9 1 13 17 6 15 5 3 18
0 16108560 16108656 16108688 16108560 16108816 16108784 16108848 16108752 16108720 16109072 16108976 16108944 16109008 16108880 16108912 16109136 16109104 16109168 16109040
So I'm stuck (and it is probably something simple I should see but don't). Why is the merging of the sublists failing? What is the critical piece of understanding about class instances in C++ versus C struct handling that I'm missing?
Answer 1:
In mergesort_run, you have a local list la that contains half of your source list. At the end of the function you merge the contents of la back into l, but la itself still points at the nodes you merged. When the destructor for la runs at the end of the function, those nodes are deleted, even though l still links to them.
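The effect can be reproduced in isolation. The following is a minimal sketch, not the question's code: node, owner, live_nodes and alive_after_scope are hypothetical names introduced here for illustration. It shows that a local object with a freeing destructor deletes its nodes at scope exit, regardless of who else still points at them, unless ownership is given up first.

```cpp
static int live_nodes = 0;      /* hypothetical counter: nodes currently allocated */

struct node {
    int data;
    node *next;
    explicit node (int d) : data (d), next (nullptr) { ++live_nodes; }
    ~node () { --live_nodes; }
};

/* owns every node reachable from head, as list_t does in the question */
struct owner {
    node *head = nullptr;
    ~owner () {
        while (head) {
            node *victim = head;
            head = head->next;
            delete victim;
        }
    }
};

/* returns how many nodes are still allocated once the local owner is gone */
int alive_after_scope (bool release)
{
    node *kept = nullptr;
    {
        owner tmp;
        tmp.head = new node (1);
        tmp.head->next = new node (2);
        kept = tmp.head;            /* second pointer to the same nodes */
        if (release)
            tmp.head = nullptr;     /* give up ownership before scope exit */
    }                               /* ~owner() runs here */
    int alive = live_nodes;
    if (release) {                  /* kept is still valid only in this case */
        while (kept) {
            node *victim = kept;
            kept = kept->next;
            delete victim;
        }
    }
    return alive;
}
```

With release set to false the function returns 0: the nodes are gone and kept is left dangling, which is the situation l->head ends up in after the merge. With release set to true it returns 2. A C struct has no destructor, so the equivalent C code never hits the first case.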
If you set the head node of la to a null pointer (la.head = nullptr) after doing the merge, then when the destructor runs there aren't any nodes for it to delete.
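Applied to the mergesort, the fix might look like the sketch below. It is a simplified, non-template stand-in for the question's classes, under a few assumptions made here: split takes an out-parameter as the C version does (so no list is ever copied), copying is disabled, and live_nodes is a hypothetical counter added only to make the ownership visible.

```cpp
#include <cassert>

static int live_nodes = 0;      /* hypothetical counter: nodes currently allocated */

struct node_t {
    int data;
    node_t *next;
    explicit node_t (int d) : data (d), next (nullptr) { ++live_nodes; }
    ~node_t () { --live_nodes; }
};

struct list_t {
    node_t *head = nullptr, *tail = nullptr;

    list_t () = default;
    list_t (const list_t&) = delete;            /* no copies in this sketch */
    list_t& operator= (const list_t&) = delete;

    ~list_t () {                                /* frees every node still owned */
        while (head) {
            node_t *victim = head;
            head = head->next;
            delete victim;
        }
    }

    void addnode (int d) {
        node_t *n = new node_t (d);
        if (!head) head = tail = n;
        else { tail->next = n; tail = n; }
    }

    /* move the second half of this list into s; needs at least two nodes */
    void split (list_t& s) {
        assert (head && head->next);
        node_t *pa = head, *pb = pa->next;
        while (pb) {
            pb = pb->next;
            if (pb) { pa = pa->next; pb = pb->next; }
        }
        s.tail = tail;
        tail = pa;
        s.head = pa->next;
        pa->next = nullptr;
    }
};

/* merge two sorted chains by relinking next pointers */
node_t *mergesorted (node_t *a, node_t *b)
{
    if (!a) return b;
    if (!b) return a;
    if (a->data <= b->data) {
        a->next = mergesorted (a->next, b);
        return a;
    }
    b->next = mergesorted (a, b->next);
    return b;
}

void mergesort_run (list_t *l)
{
    if (!l->head || !l->head->next)
        return;
    list_t la;
    l->split (la);
    mergesort_run (l);
    mergesort_run (&la);
    l->head = mergesorted (l->head, la.head);
    la.head = la.tail = nullptr;    /* the fix: la no longer owns these nodes */
}

void mergesort (list_t& l)
{
    mergesort_run (&l);
    for (node_t *pn = l.head; pn; pn = pn->next)    /* refresh tail after sort */
        l.tail = pn;
}
```

After mergesort returns, every node added to the list is still allocated and reachable from head. Removing the line marked as the fix brings back the behavior from the question, because each la deletes its half as mergesort_run returns.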
One unrelated issue is that you don't copy cmp in some places when creating a new list (like split).
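One way to carry cmp along is to construct the new list from the current comparison function inside split. The sketch below is a trimmed-down variant of the question's list_t, not the original class: the cmp_fn alias, the getcmp and size accessors, and the move constructor are assumptions added here for illustration. The move constructor lets split return by value without depending on copy elision.

```cpp
template <class T>
struct node_t {
    T data;
    node_t<T> *next;
};

template <class T>
int compare_asc (const node_t<T> *a, const node_t<T> *b)
{
    return (a->data > b->data) - (a->data < b->data);
}

template <class T>
int compare_desc (const node_t<T> *a, const node_t<T> *b)
{
    return (a->data < b->data) - (a->data > b->data);
}

template <class T>
class list_t {
public:
    using cmp_fn = int (*)(const node_t<T>*, const node_t<T>*);
private:
    node_t<T> *head = nullptr, *tail = nullptr;
    cmp_fn cmp = compare_asc<T>;
public:
    list_t () = default;
    explicit list_t (cmp_fn f) : cmp (f) {}
    list_t (const list_t&) = delete;
    list_t& operator= (const list_t&) = delete;
    /* move ctor (assumption): steal the nodes, leave the source empty */
    list_t (list_t&& o) noexcept : head (o.head), tail (o.tail), cmp (o.cmp)
    {
        o.head = o.tail = nullptr;
    }
    ~list_t ()
    {
        while (head) {
            node_t<T> *victim = head;
            head = head->next;
            delete victim;
        }
    }
    void addnode (T data)
    {
        node_t<T> *node = new node_t<T>{data, nullptr};
        if (!head) head = tail = node;
        else { tail->next = node; tail = node; }
    }
    /* split off the second half; needs at least two nodes */
    list_t split ()
    {
        list_t s (cmp);             /* new list inherits the compare function */
        node_t<T> *pa = head, *pb = pa->next;
        while (pb) {
            pb = pb->next;
            if (pb) { pa = pa->next; pb = pb->next; }
        }
        s.tail = tail;
        tail = pa;
        s.head = pa->next;
        pa->next = nullptr;
        return s;
    }
    cmp_fn getcmp () const { return cmp; }      /* hypothetical accessor */
    int size () const                           /* hypothetical accessor */
    {
        int n = 0;
        for (node_t<T> *pn = head; pn; pn = pn->next)
            ++n;
        return n;
    }
};
```

A list built with compare_desc<int> and then split yields a second half whose getcmp () also returns compare_desc<int>, instead of silently falling back to the ascending default.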
Source: https://stackoverflow.com/questions/57735047/c-implementation-of-mergesort-of-linked-list-fails-on-joining-sublists-of-more