PHP: Get plain text from HTML - simplehtmldom or PHP strip_tags?

有刺的猬 2021-01-15 11:26

I want to extract the plain text from HTML. Which should I choose: PHP's strip_tags or simplehtmldom's plaintext extraction?

One pro for simplehtmldom is suppo

5 answers
  • 2021-01-15 11:45

    You may also want to remove backslashes with stripslashes().

  • 2021-01-15 11:48

    strip_tags is sufficient for that.

  • 2021-01-15 11:49

    You should probably use simplehtmldom, both for the reason you mentioned and because strip_tags removes only the tags themselves, so it can leave non-text content behind, such as the JavaScript or CSS inside script/style blocks.

    You would also be able to filter out text from elements that aren't displayed (for example, those with an inline style="display:none").

    That said, if the HTML is simple enough, strip_tags may be faster and will accomplish the same task.
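
To make the script/style point concrete, here is a minimal sketch of the leftover content and one way to avoid it while still using strip_tags. The helper name html_to_plain_text is hypothetical, and the regular expression assumes reasonably well-formed script/style blocks; it is not a general HTML parser.

```php
<?php
// Hypothetical helper (not part of PHP): remove <script>/<style> blocks
// together with their contents, then strip the remaining tags.
function html_to_plain_text(string $html): string
{
    // Drop script and style elements including everything inside them.
    // Fall back to the original input if the regex engine reports an error.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Remove the remaining tags and decode entities such as &quot;.
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

$html = '<p>Hello</p><script>alert("x");</script><style>p{color:red}</style><p>world</p>';

// strip_tags alone keeps the script and style contents.
echo strip_tags($html), "\n";          // Helloalert("x");p{color:red}world

// With the blocks removed first, only the visible text remains.
echo html_to_plain_text($html), "\n";  // Hello world
```

This keeps the speed and simplicity of strip_tags for simple input, but it does nothing about elements hidden with CSS; that case needs a real parser.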

  • 2021-01-15 11:53

    If you just want a plain text rendering of a page then strip_tags is faster and simpler. If you want to do any manipulation of the text during that process, however, simplehtmldom is going to serve you better in the long run.
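
As an illustration of manipulating the text during extraction, the sketch below uses PHP's built-in DOMDocument and DOMXPath as a stand-in for simplehtmldom, so that it runs without extra libraries. The function name text_with_link_targets is hypothetical, and the hidden-element check only looks at inline style attributes; it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper: extract text, skip script/style and inline-hidden
// elements, and append each link's URL after its anchor text.
function text_with_link_targets(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    // The XML declaration prefix is a common way to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Remove nodes that should not contribute text. Only inline styles are checked.
    $unwanted = '//script | //style | //*[contains(translate(@style, " ", ""), "display:none")]';
    foreach (iterator_to_array($xpath->query($unwanted)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // The manipulation step: keep link targets visible in the plain text.
    foreach (iterator_to_array($xpath->query('//a[@href]')) as $a) {
        $a->appendChild($doc->createTextNode(' (' . $a->getAttribute('href') . ')'));
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body !== null ? $body->textContent : '';

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

$html = '<p>See <a href="https://example.com">the docs</a>.</p>'
      . '<p style="display: none">hidden</p>';

echo text_with_link_targets($html), "\n";  // See the docs (https://example.com).
```

The same steps could be written with simplehtmldom's own API; the point is only that a parsed tree lets you select, drop, or rewrite nodes before the text is flattened, which strip_tags cannot do.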

  • 2021-01-15 11:55

    Extracting text from HTML is tricky, so the best option is to use a library like Html2Text. It was built specifically for this purpose.

    https://github.com/mtibben/html2text

    Install using composer:

    composer require html2text/html2text
    

    Basic usage:

    $html = new \Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');
    
    echo $html->getText();  // Hello, "WORLD"
    