Question
I'm looking for a Regular Expression for JavaScript that will identify word boundaries in English, while accepting hyphens and apostrophes that appear inside words, but excluding those that appear alone or at the beginning or end of a word.
For example, for the sentence ...
She said - 'That'll be all, Two-Fry.'
... I want the characters shown in square brackets below to be detected:
She[ ]said[ - ']That'll[ ]be[ ]all[, ]Two-Fry[.']
If I use the regex /[^A-Za-z'-]/g, then "loose" hyphens and apostrophes are not detected (detected characters are shown in square brackets):
She[ ]said[ ]-[ ]'That'll[ ]be[ ]all[, ]Two-Fry[.]'
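For reference, here is a minimal sketch of that behaviour in JavaScript. The variable names are only illustrative; the regex and the sentence are the ones given above.

```javascript
// Sample sentence from the question.
const sentence = "She said - 'That'll be all, Two-Fry.'";

// Current attempt: anything that is not a letter, apostrophe or hyphen.
const boundary = /[^A-Za-z'-]/g;

// Every match is a single character; hyphens and apostrophes never match.
const delimiters = sentence.match(boundary);

// Splitting leaves the loose "-" and "'" behind as (or inside) tokens.
const tokens = sentence.split(boundary).filter((t) => t.length > 0);

console.log(delimiters); // [" ", " ", " ", " ", " ", ",", " ", "."]
console.log(tokens);     // ["She", "said", "-", "'That'll", "be", "all", "Two-Fry", "'"]
```

The loose hyphen survives as a token of its own, and the opening and closing apostrophes stay attached to or next to the words, which is exactly what I want to avoid.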
How can I alter my regex so that it detects apostrophes and hyphens that don't have a word character on both sides?
You can test my regex here: https://regex101.com/r/bR8sV1/2
Note: the text I will be working on may contain other writing scripts, like руский and ไทอ so it will not be feasible to simply include all the characters that are not part of any English word.
Answer 1:
You can organize your word-boundary characters into two groups.
- Characters that cannot be alone.
- Characters that can be alone.
A regex that works with your example would be:
[\s.,'-]{2,}|[\s.]
Now all that's left is to keep adding all non-word characters into those two groups until it fits all of your needs. So you might start adding symbols and more punctuation to those character classes.
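As a rough sketch, this is how the two-group regex could be applied to the example sentence in JavaScript. The variable names are assumptions made for illustration; only the pattern itself comes from this answer.

```javascript
// Sample sentence from the question.
const sentence = "She said - 'That'll be all, Two-Fry.'";

// First alternative: runs of two or more boundary characters.
// Second alternative: characters that count as a boundary on their own.
const boundary = /[\s.,'-]{2,}|[\s.]/g;

// The detected boundary runs.
const delimiters = sentence.match(boundary);

// The words left over once the boundaries are removed.
const words = sentence.split(boundary).filter((w) => w.length > 0);

console.log(delimiters); // [" ", " - '", " ", " ", ", ", ".'"]
console.log(words);      // ["She", "said", "That'll", "be", "all", "Two-Fry"]
```

Note that with the classes as written, a comma with no space after it (as in "a,b") is not treated as a boundary, because `,` appears only in the first group. That is the kind of case you would handle by adding characters to the second group.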
Answer 2:
You could write something like this:
(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*\s(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*
Or the compact version:
(\s|[!-/:-@\[-`\{-~])*\s(\s|[!-/:-@\[-`\{-~])*
The regex requires at least one \s (whitespace character) and selects all whitespace and non-alphanumeric characters before and after it.
https://regex101.com/r/bR8sV1/4
\s matches any whitespace character
!-/ matches every character from ! to /
:-@ matches every character from : to @
\[-` matches every character from [ to `
\{-~ matches every character from { to ~
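A small sketch of the compact version running against the sentence from the question. The variable names and the non-capturing `splitter` variant are assumptions added for illustration; they are not part of the original pattern.

```javascript
// Sample sentence from the question.
const sentence = "She said - 'That'll be all, Two-Fry.'";

// Compact pattern; "/" is escaped only because this is a regex literal.
const boundary = /(\s|[!-\/:-@\[-`\{-~])*\s(\s|[!-\/:-@\[-`\{-~])*/g;

// With the g flag, match() returns whole matches and ignores the groups.
const delimiters = sentence.match(boundary);

// split() would splice captured groups into its result, so this variant
// uses non-capturing groups (?:...) instead.
const splitter = /(?:\s|[!-\/:-@\[-`\{-~])*\s(?:\s|[!-\/:-@\[-`\{-~])*/;
const words = sentence.split(splitter);

console.log(delimiters); // [" ", " - '", " ", " ", ", "]
console.log(words);      // ["She", "said", "That'll", "be", "all", "Two-Fry.'"]
```

Because every match must contain a whitespace character, the trailing .' at the end of the sentence is not detected and stays attached to the last word.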
Source: https://stackoverflow.com/questions/38935627/javascript-regular-expression-for-word-boundaries-tolerating-in-word-hyphens-an