Remove `=\n` from html

喜夏-厌秋 submitted on 2019-12-25 12:42:11

Question


I have a RoundCube plugin that writes the message body to the database, after which I need to parse the data into another table. Using certain RoundCube functions I can remove all HTML tags, with each </td> replaced by '\n' and each </tr> replaced by '\n\n'. This makes parsing my data very easy and robust. There is only one drawback: the HTML is broken into fixed-length lines with an = at the end, e.g.:

<td valign=3D"bottom" style=3D"color:#444444;padding:5px 10px 5=
px 0px;font-size:12px;border-bottom:1px solid #eeeeee;"><b>Discount</b></td=
><td valign=3D"bottom" align=3D"right" style=3D"color:#444444;padding:5px 0=
px 5px 0px;font-size:12px;border-bottom:1px solid #eeeeee;text-align:right;=
"><b>Price after discount</b></td>

Now the </td= tags are not recognised, so Discount is joined to Price after discount as DiscountPrice after discount\n instead of Discount\n Price after discount\n. This happens all the way through the message and is causing me severe problems.

I tried to remove the = and break with things like:

$msg_body = str_replace('=', '', $msg_body);
$msg_body = str_replace('=\n', '', $msg_body);
$msg_body = str_replace('= ', '', $msg_body);

with no real success. I do not know which type of break comes after the = sign, whether it is a line break or a paragraph break. I tried to find out, and even looked at the RoundCube code, but in vain. Echoing out the HTML did not reveal anything to me either.

I am posting this here as a general PHP and HTML question in the hope that someone can help me simply remove these = signs and the mysterious (to me) breaks, so that

</td=
>

becomes

</td>

, etc.


Answer 1:


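The =XY notation is part of the (old-school but still used!) quoted-printable encoding, which represents 8-bit data using only 7-bit ASCII characters. Every byte with a value above 127, and the = sign itself, is encoded as = followed by two hexadecimal digits, e.g. =F3. Encoded lines are also limited to 76 characters, so longer lines are split with a "soft line break": an = at the very end of the line. That is the trailing = you see in your data.

For example, if you take a closer look at your HTML tags, the = is encoded as =3D.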

Read more at Wikipedia on quoted-printable

To decode the message back to normal HTML, you must apply quoted_printable_decode() to the string.

$msg_body = quoted_printable_decode($msg_body);
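
As a minimal sketch of what this does to markup like the sample in the question: the shortened $encoded string below is made up for illustration, and the \r\n after the = is an assumption about the line ending in the stored message (a bare \n is handled the same way).

```php
<?php
// Hypothetical, shortened stand-in for the message body from the question.
// "=3D" is an encoded "=", and a "=" directly before a line break is a soft line break.
$encoded = "<td valign=3D\"bottom\"><b>Discount</b></td=\r\n"
         . "><td align=3D\"right\"><b>Price after discount</b></td>";

// Removes the soft line break and turns each =XX sequence back into its byte.
$decoded = quoted_printable_decode($encoded);

echo $decoded, PHP_EOL;
```

The decoding has to run before the HTML tags are stripped, so that </td> is whole again by the time it is replaced with '\n'.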



Answer 2:


For escape sequences such as \n to be interpreted, you have to use double quotes (") in PHP:

$msg_body = str_replace("=\n", '', $msg_body);

Otherwise, PHP will look for the literal characters =\n (an equals sign, a backslash and the letter n).
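
The difference can be checked with a short sketch; the variable names and the $sample string are only illustrative, and the line break is assumed to be a bare \n:

```php
<?php
// Single quotes: the backslash and the n stay as two separate literal characters.
$single = '=\n';
// Double quotes: \n is interpreted as a single line feed character.
$double = "=\n";

echo strlen($single), PHP_EOL; // 3 characters: "=", "\", "n"
echo strlen($double), PHP_EOL; // 2 characters: "=", line feed

// Hypothetical broken tag, assuming the break after "=" is a bare line feed.
$sample = "</td=\n>";

$withSingle = str_replace($single, '', $sample); // nothing matches, string is unchanged
$withDouble = str_replace($double, '', $sample); // soft break removed

echo $withDouble, PHP_EOL;
```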




Answer 3:


Depending on the system you're using, the line break can be any of:

\n
\r
\r\n

So check for those as well.
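
One way to cover all three variants in a single call is to pass an array of search strings to str_replace(). This is only a sketch: remove_soft_breaks is a hypothetical helper name, and the type declarations assume PHP 7 or later. Note that it only removes the = plus line break; sequences such as =3D are left untouched.

```php
<?php
// Hypothetical helper: strips "=" followed by any of the three line break styles.
// "=\r\n" must come first, otherwise "=\r" would match and leave a stray "\n" behind.
function remove_soft_breaks(string $body): string
{
    return str_replace(["=\r\n", "=\n", "=\r"], '', $body);
}

// Try the helper against each line break style.
foreach (["\r\n", "\n", "\r"] as $eol) {
    $broken = "</td=" . $eol . ">";
    echo remove_soft_breaks($broken), PHP_EOL;
}
```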

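You can also use a regular expression to strip the = together with whichever line break follows it:

$msg_body = preg_replace('/=(?:\r\n|\r|\n)/', '', $msg_body);

The pattern requires a line break directly after the =, so an = inside an attribute such as valign=3D"bottom" is left alone. In your case, it should transform the </td= ...> into </td>.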



Source: https://stackoverflow.com/questions/9860243/remove-n-from-html
