PHP: Get plain text from HTML - simplehtmldom or PHP strip_tags?

I am looking to get the plain text from HTML. Which one should I choose: PHP's strip_tags or simplehtmldom's plaintext extraction?

One pro for simplehtmldom is suppo

5 Answers

谎友^

2021-01-15 11:45

You may also want to remove backslashes with stripslashes().
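
A minimal sketch of that combination, assuming the input really does contain backslash-escaped quotes (for example from addslashes() or the legacy magic_quotes_gpc setting, removed in PHP 5.4). If the HTML was never escaped, stripslashes() would instead delete legitimate backslashes, so only apply it when you know the escaping is there.

```php
<?php
// Hypothetical input: HTML whose quotes were backslash-escaped upstream.
$escaped = addslashes('<p>It\'s a "test"</p>'); // <p>It\'s a \"test\"</p>

// Undo the escaping first, then drop the tags.
$text = strip_tags(stripslashes($escaped));
echo $text, PHP_EOL; // It's a "test"
```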

情歌与酒

2021-01-15 11:48

strip_tags is sufficient for that.
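
A short sketch of the plain strip_tags route. Two things worth knowing: strip_tags does not decode entities such as `&amp;`, so html_entity_decode() is applied afterwards (stripping first, so that an encoded `&lt;script&gt;` is not turned into a real tag before the strip), and block elements are not turned into line breaks, so adjacent paragraphs run together. The sample HTML is made up for illustration.

```php
<?php
$html = '<p>Fish &amp; <b>chips</b></p><p>Second paragraph</p>';

// Remove tags, then decode entities such as &amp; into plain characters.
$text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $text, PHP_EOL; // Fish & chipsSecond paragraph
```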

面向向阳花

2021-01-15 11:49

You should probably use simplehtmldom, both for the reason you mentioned and because strip_tags may leave you with non-text content, such as JavaScript or CSS inside <script>/<style> blocks.
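
To see the script/style problem concretely: strip_tags removes the tags but keeps whatever sits between them. The page below is a made-up example.

```php
<?php
// Sample page with an inline stylesheet and script (hypothetical input).
$html = '<html><head><style>body { color: red; }</style></head>'
      . '<body><p>Visible text</p><script>alert("hi");</script></body></html>';

// The CSS rule and the JavaScript call survive as if they were page text.
$text = strip_tags($html);
echo $text, PHP_EOL;
```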

You would also be able to filter out text from elements that aren't displayed (e.g. an inline style="display:none").
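
A rough sketch of that filtering. Note that it swaps in PHP's built-in DOM extension (DOMDocument plus DOMXPath) instead of simplehtmldom; the helper name visibleText() is hypothetical. It drops script/style elements and anything whose inline style contains display:none, then reads the remaining text. It only checks inline styles, not CSS classes or external stylesheets, and for non-ASCII input you may need to declare the charset in the HTML, since loadHTML() otherwise assumes ISO-8859-1.

```php
<?php
// Hypothetical helper using the DOM extension (not simplehtmldom).
function visibleText(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about malformed/HTML5 markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // script/style blocks, plus elements whose inline style contains display:none
    // (spaces are removed first so "display: none" also matches).
    $nodes = $xpath->query(
        '//script | //style | //*[contains(translate(@style, " ", ""), "display:none")]'
    );
    // The XPath result is a fixed list, so removing nodes while looping is safe.
    foreach ($nodes as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }
    return trim($doc->documentElement->textContent ?? '');
}

$visible = visibleText('<p>Shown</p><div style="display: none">Hidden</div>'
    . '<script>var x = 1;</script><style>p{}</style><p>Also shown</p>');
echo $visible, PHP_EOL;
```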

That said, if the HTML is simple enough, strip_tags may be faster and will accomplish the same task.

面向向阳花

2021-01-15 11:53

If you just want a plain text rendering of a page then strip_tags is faster and simpler. If you want to do any manipulation of the text during that process, however, simplehtmldom is going to serve you better in the long run.
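
As an illustration of "manipulating the text during that process", here is a sketch that walks a parsed tree and transforms it while building the text: link targets are appended in brackets and common block elements end with a newline. It again uses PHP's DOM extension as a stand-in for simplehtmldom; the function toText() and the chosen element list are hypothetical. Text nodes come back with entities already decoded, and comments are skipped because they are neither text nor elements.

```php
<?php
// Hypothetical helper: builds plain text from a DOM tree while transforming it.
function toText(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue; // entities are already decoded by the parser
    }
    $out = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $out .= toText($child);
        }
    }
    if ($node instanceof DOMElement) {
        $name = strtolower($node->nodeName);
        if ($name === 'a' && $node->hasAttribute('href')) {
            $out .= ' [' . $node->getAttribute('href') . ']';
        } elseif (in_array($name, ['p', 'div', 'li', 'br', 'h1', 'h2', 'h3'], true)) {
            $out .= "\n";
        }
    }
    return $out;
}

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$doc->loadHTML('<p>See <a href="https://example.com">the docs</a>.</p><p>Bye</p>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

$text = toText($doc->documentElement);
echo $text;
```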

迷失自我

2021-01-15 11:55

Extracting text from HTML is tricky, so the best option is to use a library like Html2Text. It was built specifically for this purpose.

https://github.com/mtibben/html2text

Install using composer:
```
composer require html2text/html2text
```
Basic usage:
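```php
<?php
// Composer's autoloader makes the Html2Text class available.
require __DIR__ . '/vendor/autoload.php';

$html = new \Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');

echo $html->getText();  // Hello, "WORLD"
```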