Why is this A0 character appearing in my HTML::Element output?

予麋鹿 2021-01-07 17:23

I'm parsing an HTML document with a couple of Perl modules: HTML::TreeBuilder and HTML::Element. For some reason whenever the content of a tag is just &nbsp;,

2 Answers
  • 2021-01-07 17:35

    The character is a non-breaking space, which is what &nbsp; stands for:

    In word processing and digital typesetting, a non-breaking space (" ") (also called no-break space, non-breakable space (NBSP), hard space, or fixed space) is a space character that prevents an automatic line break at its position. In some formats, including HTML, it also prevents consecutive whitespace characters from collapsing into a single space.

    In HTML, the common non-breaking space, which is the same width as the ordinary space character, is encoded as &nbsp; or &#160;. In Unicode, it is encoded as U+00A0.

  • 2021-01-07 17:53

    The character is "\xa0" (i.e. code point 160), which is the standard Unicode translation for &nbsp;. (That is, it's Unicode's non-breaking space.) You should be able to replace each one with an ordinary space using s/\xa0/ /g if you like.
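
The substitution suggested above can be tried on its own with core Perl. This is only a minimal sketch: the sample string and the variable names are made up for illustration, and it assumes the text is a decoded character string in which the entity has already become a single U+00A0 character. If the data were raw UTF-8 bytes instead, the non-breaking space would be the two bytes \xC2\xA0 and the string would need decoding first.

```perl
use strict;
use warnings;

# Hypothetical sample: text of the kind you might get back for a tag
# whose content was "a&nbsp;b", with the entity decoded to U+00A0.
my $text = "a\x{a0}b";

# Confirm the mystery character's code point: 0xA0 is 160 in decimal.
my $code = ord(substr($text, 1, 1));
printf "code point: %d (0x%X)\n", $code, $code;

# Replace every non-breaking space with an ordinary space,
# leaving the original string untouched.
(my $clean = $text) =~ s/\xa0/ /g;
print "cleaned: '$clean'\n";
```

Using a plain space as the replacement keeps word boundaries intact; an empty replacement (s/\xa0//g) would delete the character entirely.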
