Question
I have a RoundCube plugin that writes the message body to the database, after which I need to parse the data into another table. Using certain RoundCube functions I can strip all HTML tags, with each </td> replaced by '\n' and each </tr> replaced by '\n\n'. This makes parsing my data very easy and robust. There is only one drawback: the HTML is broken into fixed-length lines, each ending with an =, e.g.:
<td valign=3D"bottom" style=3D"color:#444444;padding:5px 10px 5=
px 0px;font-size:12px;border-bottom:1px solid #eeeeee;"><b>Discount</b></td=
><td valign=3D"bottom" align=3D"right" style=3D"color:#444444;padding:5px 0=
px 5px 0px;font-size:12px;border-bottom:1px solid #eeeeee;text-align:right;=
"><b>Price after discount</b></td>
Now the </td= fragments are not recognised, so "Discount" is joined to "Price after discount" as DiscountPrice after discount\n instead of Discount\nPrice after discount\n. This happens throughout the message and is causing me severe problems.
I tried to remove the = and the break with things like:
$msg_body = str_replace('=', '', $msg_body);
$msg_body = str_replace('=\n', '', $msg_body);
$msg_body = str_replace('= ', '', $msg_body);
with no real success. I do not know which type of break follows the = sign, whether it is a line break or a paragraph break. I tried to find out, even by reading the RoundCube code, but in vain, and echoing the HTML did not reveal anything either.
I am posting this as a general PHP and HTML question in the hope that someone can help me simply remove these = signs and the (to me) mysterious breaks, so that
</td=
>
becomes
</td>
and so on.
Answer 1:
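The =XY notation is part of the (old-school but still widely used!) quoted-printable encoding, which represents 8-bit data using only 7-bit ASCII characters. Bytes above 127, as well as a few others such as = itself, are encoded in the form =F3, where the two digits are the byte's hexadecimal value. Encoded lines are also limited to 76 characters, so longer lines are wrapped with a "soft" line break: an = at the end of the line, followed by the line break. That is the trailing = you are seeing.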
For example, if you take a closer look at your HTML tags, you will see that = is encoded as =3D.
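As a rough illustration (a sketch, not part of the original answer), PHP's built-in quoted_printable_encode() shows both effects: the = in an attribute taken from the question's markup becomes =3D, and a long line is wrapped with a soft line break, that is, = followed by CRLF. The 100-character filler string is made up for the example.

```php
<?php
// "=" itself must be escaped in quoted-printable, so it is written as "=3D".
$attr = quoted_printable_encode('valign="bottom"');
echo $attr, PHP_EOL; // valign=3D"bottom"

// Lines longer than 76 characters are wrapped with a soft line break:
// an "=" at the end of the line, followed by CRLF ("\r\n").
$long = quoted_printable_encode(str_repeat('a', 100));

// addcslashes() prints the invisible \r and \n as visible escape sequences.
echo addcslashes($long, "\r\n"), PHP_EOL;

// Decoding removes the soft line break and restores the original text.
var_dump(quoted_printable_decode($long) === str_repeat('a', 100)); // bool(true)
```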
Read more about quoted-printable on Wikipedia.
To decode the message back to normal HTML, apply quoted_printable_decode() to the string. It converts the =XY sequences back to the original characters and also removes the soft line breaks:
$msg_body = quoted_printable_decode($msg_body);
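Applied to a shortened fragment of the question's markup, the call might look like the sketch below. The fragment is trimmed for brevity, and the CRLF after the = is an assumption; quoted_printable_decode() also accepts a bare LF or CR there.

```php
<?php
// Shortened fragment of the question's markup as it arrives: "=" encoded
// as "=3D" and a soft line break ("=" + CRLF, assumed) inside </td>.
$msg_body = "<td valign=3D\"bottom\"><b>Discount</b></td=\r\n>"
    . "<td valign=3D\"bottom\" align=3D\"right\"><b>Price after discount</b></td>";

$decoded = quoted_printable_decode($msg_body);
echo $decoded, PHP_EOL;
// <td valign="bottom"><b>Discount</b></td><td valign="bottom" align="right"><b>Price after discount</b></td>

// A soft line break ending in a bare LF is handled the same way.
var_dump(quoted_printable_decode("</td=\n>")); // string(5) "</td>"
```

Once decoded, the </td> tags are intact again, so the tag-to-newline replacement described in the question can run on $decoded.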
Answer 2:
For escape sequences such as \n to be interpreted, you have to use double quotes (") in PHP:
$msg_body = str_replace("=\n", '', $msg_body);
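Otherwise, PHP looks for the three literal characters =, \ and n instead of = followed by a newline. Also note that quoted-printable mail normally uses CRLF line endings, in which case the search string has to be "=\r\n".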
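The difference is easy to see by measuring the strings. This small sketch also shows why a single-quoted search string never matches a CRLF-terminated soft line break; the sample string is made up for illustration.

```php
<?php
// In single quotes, \n is two literal characters: a backslash and "n".
$single = '=\n';
// In double quotes, \n is a single line-feed character (ASCII 10).
$double = "=\n";

var_dump(strlen($single)); // int(3)
var_dump(strlen($double)); // int(2)

// The same applies to \r: a CRLF soft line break ("=\r\n") is only
// matched by a double-quoted search string.
$wrapped = "</td=\r\n>";
var_dump(str_replace('=\r\n', '', $wrapped) === $wrapped); // bool(true): nothing replaced
var_dump(str_replace("=\r\n", '', $wrapped));              // string(5) "</td>"
```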
Answer 3:
Depending on the system that produced or stored the message, the line break can be:
\n
\r
\r\n
So check for all three.
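Because these characters are invisible when the HTML is echoed, one way to check which line ending a given message actually uses is to make them visible with addcslashes(), or to test for each variant in turn. detect_soft_break() below is a hypothetical helper written for this illustration, and the sample string is a shortened piece of the question's markup with an assumed CRLF ending.

```php
<?php
/**
 * Hypothetical helper: report which line ending the "=" soft line breaks
 * in $body use, or null if there are none. CRLF is tested first because
 * "=\r\n" also contains "=\r".
 */
function detect_soft_break(string $body): ?string
{
    if (strpos($body, "=\r\n") !== false) {
        return 'CRLF';
    }
    if (strpos($body, "=\n") !== false) {
        return 'LF';
    }
    if (strpos($body, "=\r") !== false) {
        return 'CR';
    }
    return null;
}

$sample = "px 0px;font-size:12px;\"><b>Discount</b></td=\r\n><td valign=3D\"bottom\">";

// Shows the otherwise invisible \r and \n as escape sequences.
echo addcslashes($sample, "\r\n"), PHP_EOL;
echo detect_soft_break($sample), PHP_EOL; // CRLF
```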
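You can also use a regular expression that removes only an = immediately followed by a line break (a soft line break), whichever of these line endings is used:
$msg_body = preg_replace('/=(?:\r\n|\r|\n)/', '', $msg_body);
In your case, it turns </td= followed by a line break and > into </td>. It only joins the wrapped lines, though: sequences such as =3D remain encoded.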
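A quick check on a fragment of the question's markup (assuming CRLF line endings) shows what a pattern restricted to = directly before a line break does and does not fix, compared with quoted_printable_decode(). This is a sketch for illustration, not a replacement for proper MIME decoding.

```php
<?php
// Fragment of the question's markup with two soft line breaks (CRLF assumed).
$msg_body = "<td valign=3D\"bottom\" style=3D\"padding:5px 10px 5=\r\npx 0px;\">"
    . "<b>Discount</b></td=\r\n>";

// Remove only soft line breaks: "=" immediately followed by CRLF, CR or LF.
// "=3D" is untouched because "3" is not a line break.
$joined = preg_replace('/=(?:\r\n|\r|\n)/', '', $msg_body);
echo $joined, PHP_EOL;
// <td valign=3D"bottom" style=3D"padding:5px 10px 5px 0px;"><b>Discount</b></td>

// Full decoding also turns "=3D" back into "=".
echo quoted_printable_decode($msg_body), PHP_EOL;
// <td valign="bottom" style="padding:5px 10px 5px 0px;"><b>Discount</b></td>
```

The regex is enough if only the broken tags matter for parsing; if the attribute values or any non-ASCII text are needed, decoding the whole body is the more complete option.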
Source: https://stackoverflow.com/questions/9860243/remove-n-from-html