Best way to parse HTML in Qt?

[愿得一人] 2021-02-05 10:16

How would I go about extracting the "href" attribute of every "a" tag on a page full of BAD HTML, in Qt?

2 answers
  •  借酒劲吻你
    2021-02-05 10:27

    I would use the built-in QtWebKit module. I don't know how it performs, but since it is a full browser engine it should cope with "bad" HTML. Something like:

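    #include <QDebug>
    #include <QObject>
    #include <QUrl>
    #include <QWebElement>
    #include <QWebFrame>
    #include <QWebView>

    class MyPageLoader : public QObject
    {
      Q_OBJECT

    public:
      MyPageLoader();
      ~MyPageLoader() { delete m_view; }  // the view has no parent, so free it here
      void loadPage(const QUrl&);

    public slots:
      void replyFinished(bool);

    private:
      QWebView* m_view;
    };

    MyPageLoader::MyPageLoader()
    {
      m_view = new QWebView();

      connect(m_view, SIGNAL(loadFinished(bool)),
              this, SLOT(replyFinished(bool)));
    }

    void MyPageLoader::loadPage(const QUrl& url)
    {
      m_view->load(url);  // asynchronous; loadFinished(bool) fires when done
    }

    void MyPageLoader::replyFinished(bool ok)
    {
      if (!ok)
        return;  // the page failed to load, so there is nothing to parse

      QWebElementCollection elements = m_view->page()->mainFrame()->findAllElements("a");

      foreach (QWebElement e, elements) {
        // Process element e, e.g. read its href attribute
        qDebug() << e.attribute("href");
      }
    }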
    

    To use the class:

    // Requires a QApplication with a running event loop
    MyPageLoader loader;
    loader.loadPage(QUrl("http://www.example.com"));
    

    and then do whatever you like with the collection.
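
    As one illustration of "whatever you like", here is a minimal sketch of cleaning up the collected links. It is written in plain standard C++ (no Qt) so it compiles on its own, and it assumes the raw values were already read with something like e.attribute("href") and converted to std::string. The function name usableLinks, the two helpers, and the filtering rules (drop empty values, same-page "#" anchors, and "javascript:" pseudo-links, then de-duplicate) are assumptions made for this example, not part of the answer above.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: lower-cases ASCII so the scheme check is case-insensitive.
static std::string toLowerAscii(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Hypothetical helper: strips leading and trailing ASCII whitespace.
static std::string trimmed(const std::string& s)
{
  const char* ws = " \t\r\n";
  const std::string::size_type first = s.find_first_not_of(ws);
  if (first == std::string::npos)
    return std::string();
  const std::string::size_type last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Returns the hrefs worth following, in first-seen order, without duplicates.
std::vector<std::string> usableLinks(const std::vector<std::string>& hrefs)
{
  std::vector<std::string> result;
  std::set<std::string> seen;

  for (const std::string& raw : hrefs) {
    const std::string href = trimmed(raw);
    if (href.empty() || href[0] == '#')
      continue;  // nothing to follow, or a same-page anchor
    if (toLowerAscii(href).compare(0, 11, "javascript:") == 0)
      continue;  // script pseudo-link, not a URL
    if (seen.insert(href).second)
      result.push_back(href);  // first time this exact href is seen
  }
  return result;
}
```

    Relative links such as "/a" are kept as they are here; if absolute URLs are needed, Qt's QUrl::resolved() against the page URL is one way to get them.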
