Best way to parse HTML in Qt?

[愿得一人] 2021-02-05 10:16

How would I go about extracting the "href" attribute of every "a" tag on a page full of BAD HTML, in Qt?

2 answers
  • 2021-02-05 10:27

    I would use the built-in QtWebKit. I don't know how well it performs, but it parses pages the way a browser does, so I think it should cope with "bad" HTML. Something like:

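    #include <QObject>
    #include <QUrl>
    #include <QWebElement>
    #include <QWebFrame>
    #include <QWebPage>
    #include <QWebView>
    
    // Loads a page in an off-screen QWebView and inspects its <a> elements.
    class MyPageLoader : public QObject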
    {
      Q_OBJECT
    
    public:
      MyPageLoader();
      void loadPage(const QUrl&);
    
    public slots:
      void replyFinished(bool);
    
    private:
      QWebView* m_view;
    };
    
    MyPageLoader::MyPageLoader()
    {
      m_view = new QWebView();
    
      connect(m_view, SIGNAL(loadFinished(bool)),
              this, SLOT(replyFinished(bool)));
    }
    
    void MyPageLoader::loadPage(const QUrl& url)
    {
      m_view->load(url);
    }
    
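    void MyPageLoader::replyFinished(bool ok)
    {
      // loadFinished(false) means the load failed, so there is nothing to parse.
      if (!ok)
        return;
    
      QWebElementCollection elements = m_view->page()->mainFrame()->findAllElements("a");
    
      foreach (QWebElement e, elements) {
        // Process element e, for example by reading its link target
        QString href = e.attribute("href");
      }
    }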
    

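    To use the class:

    // QWebView is a widget, so a QApplication must exist, and the load is
    // asynchronous: replyFinished() runs later from the event loop.
    MyPageLoader loader;
    loader.loadPage(QUrl("http://www.example.com"));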
    

    and then do whatever you like with the collection.
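
    If linking QtWebKit is not an option, a hand-rolled scanner is a possible fallback. This is not what the answer above proposes; it is a minimal sketch in standard C++, and the function name extractHrefs is hypothetical. It assumes that tolerating missing quotes, mixed case and unclosed tags is enough; it does not decode entities or skip comments and scripts.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Qt): collects the href value of every
// <a ...> tag, tolerating missing quotes, mixed case and unclosed tags.
std::vector<std::string> extractHrefs(const std::string& html)
{
    std::vector<std::string> hrefs;
    const std::size_t n = html.size();
    auto lower = [&](std::size_t i) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(html[i])));
    };
    auto isSpace = [&](std::size_t i) {
        return std::isspace(static_cast<unsigned char>(html[i])) != 0;
    };

    for (std::size_t i = 0; i + 2 < n; ++i) {
        // An anchor tag starts with "<a" followed by whitespace.
        if (html[i] != '<' || lower(i + 1) != 'a' || !isSpace(i + 2))
            continue;

        std::size_t p = i + 2;
        // Walk the attributes until the tag ends or the input runs out.
        while (p < n && html[p] != '>') {
            while (p < n && (isSpace(p) || html[p] == '/')) ++p;

            // Attribute name, lowercased for comparison.
            std::string name;
            while (p < n && !isSpace(p) && html[p] != '=' &&
                   html[p] != '>' && html[p] != '/')
                name += lower(p++);
            while (p < n && isSpace(p)) ++p;

            // Optional value: quoted with ' or ", or unquoted.
            std::string value;
            if (p < n && html[p] == '=') {
                ++p;
                while (p < n && isSpace(p)) ++p;
                if (p < n && (html[p] == '"' || html[p] == '\'')) {
                    const char quote = html[p++];
                    while (p < n && html[p] != quote) value += html[p++];
                    if (p < n) ++p;  // skip the closing quote
                } else {
                    while (p < n && !isSpace(p) && html[p] != '>')
                        value += html[p++];
                }
            }
            if (name == "href" && !value.empty())
                hrefs.push_back(value);
        }
        i = p;  // resume scanning after this tag
    }
    return hrefs;
}
```

    In a Qt program, each result could be converted with QString::fromStdString() and resolved against the page address with QUrl::resolved().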

  • 2021-02-05 10:32


    This question is already quite old. Nevertheless, I hope this will help someone:

    I wrote two small classes for Qt, which I published on SourceForge. They let you access an HTML file in much the same way you are used to with XML.

    Here you'll find the project:
    http://sourceforge.net/projects/sgml-for-qt/
    The project's wiki contains a help system.

    Drewle
