问题
I have some C++ code (written by someone else) which appears to be calling the wrong function. Here's the situation:
UTF8InputStreamFromBuffer* cstream = foo();
wstring fn = L"foo";
DocumentReader* reader;
if (a_condition_true_for_some_files_false_for_others) {
reader = (DocumentReader*) _new GoodDocumentReader();
} else {
reader = (DocumentReader*) _new BadDocumentReader();
}
// the crash happens inside the following call
// when a BadDocumentReader is used
doc = reader->readDocument(*cstream, fn);
The files for which the condition is true are processed fine; the ones for which it is false crash. The class hierarchy for DocumentReader looks like this:
class GenericDocumentReader {
virtual Document* readDocument(InputStream &strm, const wchar_t * filename) = 0;
}
class DocumentReader : public GenericDocumentReader {
virtual Document* readDocument(InputStream &strm, const wchar_t * filename) {
// some stuff
}
};
class GoodDocumentReader : public DocumentReader {
Document* readDocument(InputStream & strm, const wchar_t * filename);
}
class BadDocumentReader : public DocumentReader {
virtual Document* readDocument(InputStream &stream, const wchar_t * filename);
virtual Document* readDocument(const LocatedString *source, const wchar_t * filename);
virtual Document* readDocument(const LocatedString *source, const wchar_t * filename, Symbol inputType);
}
The following are also relevant:
class UTF8InputStreamFromBuffer : public wistringstream {
// foo
};
typedef std::basic_istream<wchar_t> InputStream;
Running in a Visual C++ debugger, it shows that the readDocument call on a BadDocumentReader is calling not
readDocument(InputStream&, const wchar_t*)
but rather
readDocument(const LocatedString* source, const wchar_t *, Symbol)
This is confirmed by sticking cout statements in all the readDocuments. After the call the source argument is of course full of garbage, which shortly causes a crash. LocatedString does have a one-argument implicit constructor from InputStream, but checking with a cout shows it's not getting called. Any idea what could explain this?
Edit: other possibly relevant details: The DocumentReader classes are in a different library than the calling code. I also have done a complete rebuild of all the code and the problem remained.
Edit 2: I'm using Visual C++ 2008.
Edit 3: I tried making a "minimally compilable example" with the same behavior, but was unable to replicate the problem.
Edit 4:
At Billy ONeal's suggestion, I tried changing the order of the readDocument methods in the BadDocumentReader header. Sure enough, when I change the order, it changes which of the functions gets called. This seems to me to confirm my suspicion there's something weird going on with indexing into the vtable, but I'm not sure what's causing it.
Edit 5: Here's the disassembly for the few lines before the function call:
00559728 mov edx,dword ptr [reader]
0055972E mov eax,dword ptr [edx]
00559730 mov ecx,dword ptr [reader]
00559736 mov edx,dword ptr [eax]
00559738 call edx
I don't know much assembly, but it looks to me like it's dereferencing the reader variable pointer. The first thing stored in this part of memory should be the pointer to the vtable, so it dereferences that into eax. Then it places the first thing in the vtable in edx and calls it. Recompiling with different orders of the methods doesn't seem to change this. It always wants to call the first thing in the vtable. (I could have completely misunderstood this, having no knowledge of assembly at all...)
Thanks for your help.
Edit 6: I found the problem, and I apologize for wasting everyone's time. The problem was that GoodDocumentReader was supposed to be declared as a subclass of DocumentReader, but in fact was not. The C-style casts were suppressing the compiler error (should have listened to you, @sellibitze, If you'd like to submit your comment as an answer, I'll mark it as correct). The tricky thing is that the code had been working for several months by pure accident until a revision when someone added two more virtual functions to GoodDocumentReader so it was no longer calling the right function by luck.
回答1:
I would try removing the C-cast first.
- It is completely unnecessary, casts from a Derived to a Base is natural in the language
- It may, actually, cause a bug (though it's not supposed to)
It looks like a compiler bug... it would certainly not be the first either in VS.
I unfortunately don't have VS 2008 at hand, in gcc the casts occur correctly:
struct Base1
{
virtual void foo() {}
};
struct Base2
{
virtual void bar() {}
};
struct Derived: Base1, Base2
{
};
int main(int argc, char* argv[])
{
Derived d;
Base1* b1 = (Base1*) &d;
Base2* b2 = (Base2*) &d;
std::cout << "Derived: " << &d << ", Base1: " << b1
<< ", Base2: " << b2 << "\n";
return 0;
}
> Derived: 0x7ffff1377e00, Base1: 0x7ffff1377e00, Base2: 0x7ffff1377e08
回答2:
This is happening because different source files disagree on the vtable layout of the class. The code calling the function thinks that readDocument(InputStream &, const wchar_t *)
is at a particular offset, whilst the actual vtable has it at a different offset.
This usually occurs when you change the vtable, such as by adding or removing a virtual method in that class or any of its parent classes, and then you recompile one source file but not another source file. Then, you get incompatible object files, and when you link them, things go boom.
To fix this, do a full clean and rebuild of all of your code: both the library code and your code that uses the library. If you don't have the source code to the library but you do have the header files for it with the class definitions, then that is not an option. In that case, you cannot modify the class definition -- you should revert it to how it was given to you and recompile all of your code.
回答3:
Based on the assembly, it seems pretty clear that the binding is dynamic and from the first entry of the vtable. The question is which virtual table!?! I would suggest you use a static_cast
instead of a C-style cast (of course, @VJo: dynamic_cast
is not needed in this case!). There is nothing in the standard that requires a pointer BadDocumentReader* ptr
to have the same actual value (address) as its cast static_cast<DocumentReader*>(ptr)
. This would explain why it binds the call to the first entry of the vtable of BadDocumentReader
and not to the vtable of its base class. And, btw, you shouldn't need a cast at all in this case.
One possibility that doesn't really agree with the asm, but still good to know.
Because you create the BadDocumentReader
in the same scope as you are calling the reader->readDocument
, the compiler gets a little too clever and decides that it can resolve the call without having to look it up in the vtable dynamically. This is because it knows that the "real" type of the reader pointer is actually BadDocumentReader
. So it bipasses the vtable and binds the call statically. At least, that is one possibility which I had happen to me in an almost identical situation. Based on the asm, however, I'm pretty sure the first possibility is the one occurring in your case.
回答4:
I had this issue and the problem for me was that I was storing it in a class member variable. When I changed it to a pointer and involved new/delete, it successfully registered the child class and its function.
来源:https://stackoverflow.com/questions/4798966/c-calling-completely-wrong-virtual-method-of-an-object