Regex to match all HTML tags except <p> and </p>

I need to match and remove all tags using a regular expression in Perl. I have the following:

<\\\\??(?!p).+?>

But this still matches…

13 Answers

囚心锁ツ

2020-11-30 07:00
Not sure why you want to do this; regex isn't always the best method for HTML sanitisation (you need to remember to sanitise attributes and such, remove javascript: hrefs and the like). But here is a regex to match HTML tags that aren't <p></p>:

(<[^pP].*?>|</[^pP]>)

Verbose:
```
(
    <               # < opening tag
        [^pP].*?    # a non-p character, then non-greedy anything
    >               # > closing tag
|                   #   ....or....
    </              # </
        [^pP]       # a single non-p character (one-letter tag name)
    >               # >
)
```
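A quick way to see what this pattern does is to run it on a sample. The sketch below uses Python's `re` module as a stand-in for Perl (character classes, lazy `.*?` and `re.VERBOSE`, Python's counterpart of Perl's `/x`, behave the same for this pattern); the sample string is made up. It shows that `/` is itself a non-p character, so the first alternative also matches `</p>` and `</pre>`, while `<pre>` survives because its name starts with p.

```python
import re

# The verbose pattern from the answer; re.VERBOSE plays the role of Perl's /x.
NOT_P_TAG = re.compile(r"""
    (
        <               # < opening tag
            [^pP].*?    # a non-p character, then non-greedy anything
        >               # > closing tag
    |                   #   ....or....
        </              # </
            [^pP]       # a single non-p character
        >               # >
    )
""", re.VERBOSE)

sample = "<p>Hi <b>there</b></p><pre>x</pre>"
print(NOT_P_TAG.sub("", sample))
# prints: <p>Hi there<pre>x
```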
我在风中等你

2020-11-30 07:00

Since HTML is not a regular language, I would not expect a regular expression to do a very good job of matching it. Regexes might be up to this particular task (though I'm not convinced), but I would look elsewhere; I'm sure Perl has off-the-shelf libraries for manipulating HTML.

Anyway, I would think that what you want to match is </?(p.+|.*)(\s*.*)> non-greedily (I don't know the vagaries of Perl's regex syntax, so I can't help further). Here \s means whitespace: you want something that will match attributes separated from the tag name by whitespace. But it's more difficult than that, because people often put unescaped angle brackets inside scripts, comments, and perhaps even quoted attribute values, and you don't want to match against those.

So as I say, I don't really think regexps are the right tool for the job.
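Perl does have such libraries on CPAN (HTML::Parser, for one). To keep the examples here in one language, this sketch shows the same parser-based idea with Python's standard `html.parser` instead; the class `KeepOnlyP` and the function `keep_only_p` are hypothetical names. It keeps bare `<p>` and `</p>`, drops every other tag together with all attributes, comments and `<script>`/`<style>` contents, and re-escapes the text, so a `>` inside a quoted attribute or a `<` inside a script cannot confuse it.

```python
from html import escape
from html.parser import HTMLParser


class KeepOnlyP(HTMLParser):
    """Hypothetical helper: re-emit text and bare <p>/</p>, drop everything else."""

    SKIP_CONTENT = {"script", "style"}  # elements whose text should vanish too

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names; attributes are discarded on purpose.
        if tag in self.SKIP_CONTENT:
            self._skip_depth += 1
        elif tag == "p":
            self.out.append("<p>")

    def handle_endtag(self, tag):
        if tag in self.SKIP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p":
            self.out.append("</p>")

    def handle_data(self, data):
        # Entities were decoded (convert_charrefs=True), so re-escape the text.
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to the
    # base class's no-op handlers, so they are dropped.


def keep_only_p(html_text: str) -> str:
    parser = KeepOnlyP()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(keep_only_p('<p class="x">a &amp; b <a title="1>0">link</a></p>'
                  '<script>if (a < b) {}</script><!-- c -->'))
# prints: <p>a &amp; b link</p>
```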

执念已碎

2020-11-30 07:00
Assuming this works in Perl as it does in languages that claim Perl-compatible syntax:

/<\/?[^p][^>]*>/

EDIT:

But that won't match a <pre> or <param> tag, unfortunately.

This, perhaps?
```
/<\/?(?!p>|p )[^>]+>/
```
That should cover <p> tags that have attributes, too.
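Testing both patterns with Python's `re` (a stand-in for Perl here; optional quantifiers and negative lookahead backtrack the same way) on a made-up sample shows a catch with `</p>`: once `[^p]` or the lookahead rejects the `p`, the engine backtracks so that `\/?` matches nothing, and the `/` itself then satisfies `[^p]` or the lookahead, so `</p>` is still matched. The `fixed` pattern is a variant not taken from the answer: it moves the optional slash inside the lookahead, which removes that escape route.

```python
import re

first = re.compile(r"<\/?[^p][^>]*>")
second = re.compile(r"<\/?(?!p>|p )[^>]+>")
# Variant (not from the answer): the optional slash lives inside the lookahead,
# so backtracking over \/? can no longer sneak </p> past the check.
fixed = re.compile(r"<(?!\/?p[\s>])[^>]+>")

sample = '<p class="x">a <b>b</b> <pre>c</pre></p>'
for name, pattern in [("first", first), ("second", second), ("fixed", fixed)]:
    print(f"{name}: {pattern.sub('', sample)}")
# first: <p class="x">a b <pre>c
# second: <p class="x">a b c
# fixed: <p class="x">a b c</p>
```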
猫巷女王i

2020-11-30 07:07

> Since HTML is not a regular language

HTML isn't, but HTML tags are, and they can be adequately described by regular expressions.
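To illustrate the point, here is a sketch of a single-tag pattern that allows quoted attribute values containing `>`, written with Python's `re` (its syntax for these constructs matches Perl's). The pattern `TAG_RE` and the helper `strip_tags_except` are hypothetical; they only tokenize tags, do not handle comments, `<script>` contents or malformed markup, and say nothing about how tags nest, which is the part of HTML a regex cannot describe.

```python
import re

# Hypothetical tag pattern: '<', an optional '/', a tag name, then any run of
# double-quoted strings, single-quoted strings, or characters that are neither
# quotes nor '>', and finally '>'. A '>' inside a quoted value is therefore
# part of the tag, not its end.
TAG_RE = re.compile(r"""<(/?)([A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>""")


def strip_tags_except(html_text: str, keep: set) -> str:
    """Remove every tag whose lower-cased name is not in `keep`; text is untouched."""
    def repl(m: re.Match) -> str:
        return m.group(0) if m.group(2).lower() in keep else ""
    return TAG_RE.sub(repl, html_text)


print(strip_tags_except('<p class="x">a <a title="1>0">b</a></p><pre>c</pre>', {"p"}))
# prints: <p class="x">a b</p>c
```

Doing the "except p" check in Python on the captured name, rather than inside the regex, keeps the tag pattern itself simple and makes the list of kept tags easy to change.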

野的像风

2020-11-30 07:07
Try this, it should work:
```
/<\/?([^p](\s.+?)?|..+?)>/
```
Explanation: after the optional slash, it matches either a single character other than “p”, optionally followed by whitespace and more characters, or any two or more characters.

/EDIT: I've added the ability to handle attributes in p tags.
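Running the pattern through Python's `re` (standing in for Perl; the constructs involved backtrack the same way) on a made-up sample shows two things. The second alternative's `.+?` can match any character, including `>` and `<`, so on `<p>y</p>` it keeps extending until it reaches the final `>` and removes the whole run. And because the slash is optional, backtracking lets `</p>` match as `/p` through that same alternative.

```python
import re

# The answer's pattern, with Perl's /.../ delimiters dropped.
pattern = re.compile(r"<\/?([^p](\s.+?)?|..+?)>")

print(repr(pattern.sub("", "<b>x</b> <p>y</p>")))
# prints: 'x '
print(repr(pattern.sub("", "a</p>")))
# prints: 'a'
```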
长情又很酷

2020-11-30 07:09
I used Xetius's regex and it works fine, except for some Flex-generated tags, which can be: … with no spaces inside. I tried to fix it with a simple ? after \s, and it looks like it's working:
```
<(?!\/?p(?=>|\s?.*>))\/?.*?>
```
I'm using it to clear tags from Flex-generated HTML text, so I also added more excepted tags:
```
<(?!\/?(p|a|b|i|u|br)(?=>|\s?.*>))\/?.*?>
```
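A quick check of these patterns with Python's `re` (a stand-in for Perl; lookaheads and lazy quantifiers behave the same here), on a made-up sample. Because `\s?` may match nothing, the inner lookahead `(?=>|\s?.*>)` accepts `p` followed by anything up to a later `>`, so tags such as `<pre>` are kept along with `<p>`; in the longer list, the single letters `a`, `b`, `i` and `u` likewise keep tags such as `<img>` or `<ul>`. The `strict` pattern is just the first one without the added `?`, shown for comparison.

```python
import re

only_p = re.compile(r"<(?!\/?p(?=>|\s?.*>))\/?.*?>")
several = re.compile(r"<(?!\/?(p|a|b|i|u|br)(?=>|\s?.*>))\/?.*?>")
# For comparison: the first pattern without the added '?' after \s.
strict = re.compile(r"<(?!\/?p(?=>|\s.*>))\/?.*?>")

sample = "<p>a</p><pre>b</pre><img src='x.png'><div>c</div>"
for name, rx in [("only_p", only_p), ("several", several), ("strict", strict)]:
    print(f"{name}: {rx.sub('', sample)}")
# only_p: <p>a</p><pre>b</pre>c
# several: <p>a</p><pre>b</pre><img src='x.png'>c
# strict: <p>a</p>bc
```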