Regex non-capturing group is capturing


I have this regex

(?:\<a[^*]href="(http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?)>

The point of this regex is to capture every closing ta

5 Answers

别那么骄傲

2021-01-24 12:06
If I'm understanding your request correctly...
```
\<a[^*]href="(?:http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?(>)
```

后悔当初

2021-01-24 12:23
You're conflating two distinct concepts: capturing and consuming. Regexes normally consume whatever they match; that's just how they work. Additionally, most regex flavors let you use capturing groups to pluck out specific parts of the overall match. (The overall match is often referred to as the zeroth capturing group, but that's just a figure of speech.)
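
The distinction can be illustrated with a minimal Python sketch. The sample string and the simplified pattern are assumptions made for this example, not the asker's actual input: the non-capturing group is still consumed and therefore appears in the overall match, while only the parenthesized `>` is captured.

```python
import re

# Hypothetical sample input, chosen only for illustration.
html = '<a href="http://example.com/page">link</a>'

# The tag body sits in a non-capturing group; only the final ">" is captured.
pattern = re.compile(r'(?:<a\s+href="[^"]+"[^>]*)(>)')

m = pattern.search(html)
whole_match = m.group(0)   # everything the regex consumed
captured = m.group(1)      # only the parenthesized ">"

print(whole_match)  # <a href="http://example.com/page">
print(captured)     # >
print(m.groups())   # ('>',)
```

The non-capturing group changes what `groups()` reports, not what the match consumes.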

It sounds like you're trying to match a whole <A> tag, but only consume the final >. That's not possible in most regex flavors, JavaScript included. But if you're using Perl or PHP, you could use \K to spoof the match start position:
```
(?i)<a\s+[^>]+?href="http://[^"]+"[^>]*\K>
```
And in .NET you could use a lookbehind (which, like a lookahead, matches without consuming):
```
(?i)(?<=<a\s+[^>]+?href="http://[^"]+"[^>]*)>
```
Of the other flavors that support lookbehinds, most place restrictions on them that render them unusable for this task.
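
Python's built-in `re` module is one flavor with such a restriction: its lookbehinds must be fixed-width. The sketch below, using a hypothetical sample string, shows the .NET-style pattern being rejected at compile time, followed by one possible workaround: match the whole tag, capture the `>`, and use only the position of that group.

```python
import re

# Hypothetical sample input, chosen only for illustration.
html = '<a class="x" href="http://example.com/">link</a>'

# The .NET-style pattern uses a variable-width lookbehind, which `re` rejects.
dotnet_style = r'(?i)(?<=<a\s+[^>]+?href="http://[^"]+"[^>]*)>'
try:
    re.compile(dotnet_style)
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

# Workaround: let the regex consume the whole tag, but read only group 1.
workaround = re.compile(r'(?i)<a\s+[^>]+?href="http://[^"]+"[^>]*(>)')
m = workaround.search(html)
start, end = m.span(1)   # position of the final ">" inside the input

print(lookbehind_supported)  # False
print(html[start:end])       # >
```

With the group's span in hand, the `>` can be replaced or wrapped by slicing the string at `start` and `end`, even though the match itself covers the whole tag.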

萌比男神i

2021-01-24 12:23

Your parentheses are around the tag itself and the href's contents, so that's what will be captured. If you need to capture the closing > then put the parentheses around it.

后悔当初

2021-01-24 12:26
If I'm understanding correctly that you want to match just the greater-than sign (>) that's part of the closing anchor tag, this should do it:
```
\<a[^*]href="(http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?(>)
```
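
Note that in this version the href alternation is itself a capturing group, which affects group numbering. A small Python sketch (the sample string is an assumption) compares it with the same pattern using `(?:...)` for the alternation:

```python
import re

# Hypothetical sample input, chosen only for illustration.
html = '<a href="http://example.com/doc">Doc</a>'

# Pattern from the answer above, unchanged: the href alternation is a
# capturing group, so it becomes group 1 and the final ">" becomes group 2.
capturing = re.compile(r'\<a[^*]href="(http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?(>)')

# Same pattern with the alternation made non-capturing: ">" is now group 1.
non_capturing = re.compile(r'\<a[^*]href="(?:http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?(>)')

groups_a = capturing.search(html).groups()
groups_b = non_capturing.search(html).groups()

print(groups_a)  # ('http://example.com/doc', '>')
print(groups_b)  # ('>',)
```

Either form captures the `>`; the choice only decides whether it is read as group 2 or group 1.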

情歌与酒

2021-01-24 12:31

Rewrite your regex as:

(?:\<a[^*]href="(?:http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?)(>)
   non capture __^^                                    ^ ^
                                             capture __|_|

As Tony Lukasavage said, there is an unnecessary non-capture group, and, moreover, there is no need to escape <, so it becomes:

  <a[^*]href="(?:http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?(>)
non capture __^^                                    ^ ^
                                          capture __|_|
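
As a quick check that dropping the outer non-capturing group and the backslash changes nothing, the following Python sketch runs both patterns over two hypothetical sample links (the inputs are assumptions made for this example) and compares the overall match and the span of the captured `>`:

```python
import re

# Hypothetical sample inputs: one absolute link and one relative PDF link.
samples = [
    '<a href="http://example.com/">home</a>',
    '<a href="files/report.pdf" class="doc">report</a>',
]

with_outer_group = re.compile(r'(?:\<a[^*]href="(?:http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?)(>)')
simplified = re.compile(r'<a[^*]href="(?:http://[^"]+?|[^"]+?\.pdf)"+?[^>]*?(>)')

results = []
for s in samples:
    m1 = with_outer_group.search(s)
    m2 = simplified.search(s)
    # Compare the whole match and the span of the captured ">".
    results.append((m1.group(0) == m2.group(0), m1.span(1) == m2.span(1), m2.group(1)))

print(results)  # [(True, True, '>'), (True, True, '>')]
```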
