Question
I have the current regular expression:
/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)/g
Which I'm testing against the string:
Here's a #hashtag and here is #not_a_tag; which should be different. Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>
For my purposes, only two hashtags should be detected in this string. How can I alter the expression so that it doesn't match hashtags that are followed by a ; (in my example, #not_a_tag;)?
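To make the problem concrete, here is a minimal JavaScript sketch that runs the expression above against the test string. It assumes an engine with lookbehind support (for example a recent Node.js); the variable names are illustrative only.

```javascript
// Test string and pattern copied from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";
const current = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)/g;

// matchAll yields one match object per hit; index 0 holds the full match.
const found = [...text.matchAll(current)].map((m) => m[0]);
console.log(found); // [ '#hashtag', '#not_a_tag', '#hash' ]
```

That is three matches, one more than wanted, because nothing in the pattern looks at the character that follows the tag.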
Cheers.
Answer 1:
How about the following:
\B(\#[a-zA-Z]+\b)(?!;)
- \B -> Not a word boundary, so the # cannot directly follow a word character (this rejects Mid#hash)
- (\#[a-zA-Z]+\b) -> Capturing group: a # followed by one or more letters (a-z or A-Z), with a word boundary at the end
- (?!;) -> Not followed by ;
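The breakdown above can be checked with a short JavaScript sketch. The g flag and the findTags helper are additions for illustration and are not part of the pattern as posted.

```javascript
// Test string copied from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer; the g flag is added so every match is returned.
const pattern = /\B(\#[a-zA-Z]+\b)(?!;)/g;

// Hypothetical helper: collect the full text of every match.
const findTags = (input) => [...input.matchAll(pattern)].map((m) => m[0]);

console.log(findTags(text));             // [ '#hashtag', '#hash' ]
console.log(findTags("see #tag; now"));  // []  (rejected by the lookahead)
console.log(findTags("see #tag_2 now")); // []  (rejected by \b, no semicolon involved)
```

Note the last case: because the character class is letters only, any tag containing a digit or an underscore is dropped. In the question's string, #not_a_tag; is rejected because of its underscores rather than its trailing semicolon.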
Answer 2:
You can use a negative lookahead regex:
/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/
- \b -> word boundary, ensures that we are at the end of the word
- (?!;) -> asserts that we don't have a semicolon at the next position
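A small JavaScript sketch of why both pieces are needed. The g flag, the comparison pattern without \b, and the findTags helper are assumptions added for illustration.

```javascript
// Test string copied from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer, with the g flag added to collect every match.
const withBoundary = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/g;
// The same pattern minus \b, for comparison only.
const withoutBoundary = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)(?!;)/g;

// Hypothetical helper: collect the full text of every match.
const findTags = (re, input) => [...input.matchAll(re)].map((m) => m[0]);

console.log(findTags(withBoundary, text));    // [ '#hashtag', '#hash' ]
console.log(findTags(withoutBoundary, text)); // [ '#hashtag', '#not_a_ta', '#hash' ]
```

Without \b the engine backtracks by one character, finds that the next character is g rather than ;, and reports the truncated #not_a_ta. The word boundary forces the lookahead to be tested only at the true end of the tag.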
Answer 3:
Similar to anubhava's answer, but swap the two instances of \w* with \d*, as the only difference between \w and [A-Za-z_] is the 0-9 characters. This has the effect of reducing the number of steps from 588 to 90:
(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)
Regex101 demo
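The following JavaScript sketch compares this pattern with the \w* pattern it starts from. The g flag, the variable names, and the edgeCases string are assumptions added for illustration.

```javascript
// The \w* pattern this answer starts from, and the \d* variant proposed here.
const wordVersion = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/g;
const digitVersion = /(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)/g;

// Hypothetical helper: collect the full text of every match.
const findTags = (re, input) => [...input.matchAll(re)].map((m) => m[0]);

// Test string copied from the question: both versions agree here.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";
console.log(findTags(digitVersion, text)); // [ '#hashtag', '#hash' ]

// Made-up edge cases: a tag at the start of the string, and letters after digits.
const edgeCases = "#start and #tag2go";
console.log(findTags(wordVersion, edgeCases));  // [ '#start', '#tag2go' ]
console.log(findTags(digitVersion, edgeCases)); // []
```

So the two patterns agree on the question's string but are not equivalent in general. The \d* variant only accepts digits, then letters or underscores, then digits, and because the |^ alternative was dropped from the lookbehind it also misses a hashtag at the very start of the input.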
Answer 4:
/(#(?:[^\x00-\x7F]|\w)+)/g
Starts with #, then at least one (+) non-ASCII character ([^\x00-\x7F], a range excluding ASCII characters) or word character (\w).
This one should cover cases that include non-ASCII characters, like "#їжак".
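A quick JavaScript check of this pattern on a made-up sample string (the sample and variable names are illustrative assumptions):

```javascript
// Pattern from this answer.
const pattern = /(#(?:[^\x00-\x7F]|\w)+)/g;

// Made-up sample mixing a Cyrillic tag, a tag followed by ;, and a numeric tag.
const sample = "#їжак and #not_a_tag; and #123";

const found = [...sample.matchAll(pattern)].map((m) => m[0]);
console.log(found); // [ '#їжак', '#not_a_tag', '#123' ]
```

The Cyrillic tag is picked up, but so are #not_a_tag and #123, since this pattern has no lookahead for the semicolon and does not require a letter.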
Answer 5:
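This is the best practice, in my view: match one or more # followed by one or more letters, digits, underscores, or literal parentheses.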
(#+[a-zA-Z0-9(_)]{1,})
Source: https://stackoverflow.com/questions/38506598/regular-expression-to-match-hashtag-but-not-hashtag-with-semicolon