Regex for a (twitter-like) hashtag that allows non-ASCII characters


I want a regex that matches a simple hashtag like those on Twitter (e.g. #someword). It should also recognize non-ASCII characters (like those in Spanish, Hebrew, or Chinese).

3 Answers

我寻月下人不归

2020-12-03 18:28

Eventually I found twitter-text.js, which is basically how Twitter solves this problem.

死守一世寂寞

2020-12-03 18:29
With native JS regexes that don't support Unicode, your only option is to explicitly enumerate the characters that can end the tag and match everything up to them, for example:
```
> s = "foo #הַתִּקְוָה. bar"
"foo #הַתִּקְוָה. bar"
> s.match(/#(.+?)(?=[\s.,:]|$)/)
["#הַתִּקְוָה", "הַתִּקְוָה"]
```
The character class [\s.,:] should include whitespace, punctuation, and anything else you consider a terminating symbol.
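For runtimes that do support Unicode property escapes (ES2018 and later, with the `u` flag), the tag body can be described positively instead of by its terminators. The following is a minimal sketch under that assumption; the helper name `extractHashtags` and the choice of letters, combining marks, digits, and underscore as valid tag characters are illustrative, not Twitter's actual rules.

```javascript
// Assumes a runtime with Unicode property escapes (ES2018+, `u` flag).
// \p{L} = letters, \p{M} = combining marks (e.g. Hebrew niqqud), \p{N} = digits.
const HASHTAG = /#([\p{L}\p{M}\p{N}_]+)/gu;

// Hypothetical helper: returns the tag bodies (without the leading '#').
function extractHashtags(text) {
  const tags = [];
  for (const match of text.matchAll(HASHTAG)) {
    tags.push(match[1]);
  }
  return tags;
}

console.log(extractHashtags("hola #mañana, #日本語. #abc_1!"));
// → [ 'mañana', '日本語', 'abc_1' ]
```

Because the character class lists what a tag may contain, trailing punctuation such as `.` or `,` ends the match without having to be enumerated.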
别那么骄傲

2020-12-03 18:31
#([^#]+)[\s,;]*

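Explanation: this regular expression matches a # followed by one or more non-# characters, followed by zero or more whitespace characters, commas, or semicolons. Because [^#] also matches whitespace, each match runs up to the next #, which is why the results below include trailing spaces.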
```
var input = "#hasta #mañana #babהַ";
var matches = input.match(/#([^#]+)[\s,;]*/g);
```
Result:
```
["#hasta ", "#mañana ", "#babהַ"]
```
EDIT: replaced the \b word boundary with [\s,;]*.
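Since `[^#]+` also consumes spaces, an input like `#foo bar #baz` would yield `#foo bar ` as a single match. A possible variant, sketched here as an assumption rather than as part of the original answer, excludes whitespace, commas, and semicolons from the tag body and reads the capture group directly. The helper name `extractTags` is hypothetical.

```javascript
// Hypothetical variant: the tag body excludes '#', whitespace, ',' and ';',
// so each tag stops at the first separator instead of running to the next '#'.
function extractTags(input) {
  const pattern = /#([^#\s,;]+)/g;
  // matchAll yields one match object per tag; m[1] is the captured tag body.
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

console.log(extractTags("#hasta #mañana, see you #later; bye"));
// → [ 'hasta', 'mañana', 'later' ]
```

Other punctuation, such as a trailing period, is still treated as part of the tag here; add it to the excluded class if it should terminate a tag.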