Question
I'm receiving multiple loadFinished signals when I attempt to load a QWebPage, and I'm not sure what's causing the issue. There were a couple of other questions that seemed to allude to the same problem, but the solutions didn't work for me:
- QtWebPage - loadFinished() called multiple times
- Signal QWebPage::loadFinished(bool) returns twice?
In the first question, the answer was to "connect signals to slots only once," but I already do that. The answer to the second question suggests that I should connect to the frame's loadFinished signal, but when I do that I simply don't get the data I need.
I attempt to load multiple pages:
    #include <QApplication>
    #include <QList>
    #include <QUrl>
    #include <QWebFrame>
    #include <QWebPage>

    // UA is the QObject subclass shown below; it must be declared before main().
    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);

        QList<QUrl> urls;
        urls.append(QUrl("http://www.useragentstring.com/pages/Chrome/"));
        urls.append(QUrl("http://www.useragentstring.com/pages/Firefox/"));
        urls.append(QUrl("http://www.useragentstring.com/pages/Opera/"));
        urls.append(QUrl("http://www.useragentstring.com/pages/Internet Explorer/"));
        urls.append(QUrl("http://www.useragentstring.com/pages/Safari/"));

        // One UA receiver and one QWebPage per URL.
        foreach (const QUrl &url, urls)
        {
            UA* ua = new UA();
            QWebPage* page = new QWebPage();
            //QObject::connect(page, SIGNAL(loadFinished(bool)), ua, SLOT(pageLoadFinished(bool)));
            QObject::connect(page->mainFrame(), SIGNAL(loadFinished(bool)), ua, SLOT(frameLoadFinished(bool)));
            // Load the page
            page->mainFrame()->load(url);
        }

        return app.exec();
    }
The class that processes the signals looks like this:
    #include <QDebug>
    #include <QObject>
    #include <QWebElement>
    #include <QWebFrame>
    #include <QWebPage>

    class UA : public QObject
    {
        Q_OBJECT
    private:
        int _numPageLoadSignals;   // loadFinished signals seen from the page
        int _numFrameLoadSignals;  // loadFinished signals seen from the main frame
    public:
        UA()
        {
            _numPageLoadSignals = 0;
            _numFrameLoadSignals = 0;
        }
        ~UA(){}
    public slots:
        // Connected to QWebPage::loadFinished(bool).
        void pageLoadFinished(bool ok)
        {
            _numPageLoadSignals++;
            QWebPage * page = qobject_cast<QWebPage *>(sender());
            if(ok && page)
            {
                qDebug() << _numPageLoadSignals << " loads "
                         << page->mainFrame()->documentElement().findAll("div#liste ul li a").count()
                         << " elements found on: " << page->mainFrame()->requestedUrl().toString();
            }
        }
        // Connected to QWebFrame::loadFinished(bool) of the main frame.
        void frameLoadFinished(bool ok)
        {
            _numFrameLoadSignals++;
            QWebFrame * frame = qobject_cast<QWebFrame *>(sender());
            if(ok && frame)
            {
                qDebug() << _numFrameLoadSignals << " loads "
                         << frame->documentElement().findAll("div#liste ul li a").count()
                         << " elements found on: " << frame->requestedUrl().toString();
            }
        }
    };
Here is the result of connecting only to the frame's loadFinished signal:
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Safari/"
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Chrome/"
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Opera/"
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Firefox/"
1 loads 241 elements found on: "http://www.useragentstring.com/pages/Internet Explorer/"
Here are the results when I connect to the page's loadFinished signal:
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Safari/"
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Chrome/"
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Firefox/"
1 loads 0 elements found on: "http://www.useragentstring.com/pages/Internet Explorer/"
2 loads 576 elements found on: "http://www.useragentstring.com/pages/Safari/"
2 loads 782 elements found on: "http://www.useragentstring.com/pages/Chrome/"
2 loads 241 elements found on: "http://www.useragentstring.com/pages/Internet Explorer/"
2 loads 1946 elements found on: "http://www.useragentstring.com/pages/Firefox/"
3 loads 241 elements found on: "http://www.useragentstring.com/pages/Internet Explorer/"
3 loads 1946 elements found on: "http://www.useragentstring.com/pages/Firefox/"
3 loads 782 elements found on: "http://www.useragentstring.com/pages/Chrome/"
1 loads 964 elements found on: "http://www.useragentstring.com/pages/Opera/"
3 loads 576 elements found on: "http://www.useragentstring.com/pages/Safari/"
I don't understand this behavior: why do I sometimes get the relevant content and other times not? If I connect to the page's loadFinished signal, I eventually get the content, but I don't know when that will actually happen. How do I know when my page has actually finished loading?
Update
I'm assuming that most of my content will arrive in less than 3 seconds, so I've come up with a workaround: I set a timer to signal UA::loadFinished 3 seconds after the first loadFinished signal is received from the QWebPage. That's not very pretty, nor is it efficient, but it works for this situation.
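One possible variation on this workaround is to wait for a quiet period after the most recent loadFinished signal rather than a fixed delay after the first one. The sketch below shows only the bookkeeping, in plain C++ so it can be read without Qt. SettleTracker and its members are hypothetical names, not part of Qt or of the code above, and the 3000 ms value in the usage note is just the figure from the workaround. In a Qt program the same effect could probably be had by restarting a single-shot QTimer from the loadFinished slot.

```cpp
#include <cassert>

// Hypothetical helper (not Qt API): reports whether loadFinished signals
// have stopped arriving for at least a given quiet period.
class SettleTracker
{
public:
    explicit SettleTracker(long quietMs)
        : _quietMs(quietMs), _lastSignalMs(-1), _signals(0)
    {
        assert(quietMs > 0);
    }

    // Call from the loadFinished slot with the current time in milliseconds.
    void onLoadFinished(long nowMs)
    {
        _lastSignalMs = nowMs;
        ++_signals;
    }

    // True once at least one signal was seen and none arrived for quietMs.
    bool isSettled(long nowMs) const
    {
        return _signals > 0 && nowMs - _lastSignalMs >= _quietMs;
    }

    // Number of loadFinished signals recorded so far.
    int signalCount() const { return _signals; }

private:
    long _quietMs;       // required silence before the page counts as settled
    long _lastSignalMs;  // time of the most recent signal, -1 if none yet
    int _signals;
};
```

With SettleTracker tracker(3000), each loadFinished pushes the deadline back, so a page that fires three signals within a second is reported as settled about 3 seconds after the third one. This is still a heuristic: it cannot tell whether more content will arrive later.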
Answer 1:
Quoting QWebPage documentation:
Finally, the loadFinished() signal is emitted when the page contents are loaded completely, independent of script execution or page rendering.
The catch is that last phrase. Some people in the following thread point toward what I believe is the problem:
Why is QWebView.loadFinished called several times on some sites e.g. youtube?
I have been struggling to code a crawler that deals with pages that load content using JavaScript behind the scenes. Multiple loadFinished signals are a problem (I wish the signal fired only after everything had settled down), but I noticed that the essential problem is that the page content may still not be rendered or prepared even after the last loadFinished has triggered a slot.
So I experimented with many signals of the QWebPage class to see whether any of them is consistently triggered after the loadFinished signal.
I found one: repaintRequested(QRect)
I don't know whether this works all the time, but if any content affects the look of a web page, I believe this signal has to be emitted before the page can be assumed complete. I am neither displaying the pages nor using a view widget, yet the signal is triggered consistently. The only problem is that it is triggered many times (much more often than loadFinished), so you need to check that mainFrame->requestedUrl() is the same as mainFrame->url() AND that a keyword from the content you are interested in exists. This matters especially if you are reusing the QWebPage, as I am: a subsequent request changes requestedUrl while the mainFrame content from the previous load is still there.
A trick to cut the number of signals to check might be to connect repaintRequested only after receiving a loadFinished signal from the QWebPage (and possibly after checking extra conditions).
This may not solve the problem of indefinitely nested loads, since one cannot know whether any given signal is the last one, but if you are searching for specific content, a signal is bound to be triggered after that content is loaded (I mean, integrated into the DOM).
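The checks described above can be sketched as a small gate object. This is only an illustration in plain C++: ReadyGate and its member names are hypothetical, not Qt API. It assumes the caller passes in strings, which in a Qt program could presumably come from frame->requestedUrl().toString(), frame->url().toString() and frame->toHtml(). Comparing the two URLs as plain strings is also an assumption and may be too strict for pages that redirect.

```cpp
#include <string>

// Hypothetical helper (not Qt API): decides whether a repaintRequested
// notification should be treated as "the content I want is ready".
class ReadyGate
{
public:
    explicit ReadyGate(const std::string &keyword)
        : _keyword(keyword), _armed(false), _done(false) {}

    // Call from the loadFinished slot; repaints are ignored until then.
    void onLoadFinished(bool ok)
    {
        if (ok)
            _armed = true;
    }

    // Call from the repaintRequested slot. Returns true exactly once: the
    // first time the URLs match and the keyword is present in the HTML.
    bool onRepaint(const std::string &requestedUrl,
                   const std::string &currentUrl,
                   const std::string &html)
    {
        if (!_armed || _done)
            return false;
        if (requestedUrl != currentUrl)
            return false;  // frame still holds content from a previous load
        if (html.find(_keyword) == std::string::npos)
            return false;  // content of interest is not in the DOM yet
        _done = true;
        return true;
    }

    // Call before reusing the same page for another request.
    void reset()
    {
        _armed = false;
        _done = false;
    }

private:
    std::string _keyword;  // text that marks the content of interest
    bool _armed;           // set after a successful loadFinished
    bool _done;            // set once readiness has been reported
};
```

Arming the gate only after loadFinished mirrors the trick of connecting repaintRequested late, and the one-shot flag keeps the many later repaints from reporting the same page twice.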
Answer 2:
I solved this problem by specifying the capacities of the memory cache for dead objects; in other words, I simply disabled the QtWebKit memory cache using:
QWebSettings::setObjectCacheCapacities(0, 0, 0);
To learn more, see the documentation:
http://qt-project.org/doc/qt-4.8/qwebsettings.html#setObjectCacheCapacities
Source: https://stackoverflow.com/questions/14780261/receiving-multiple-loadfinished-signals-for-a-requested-web-page