Question
I have a regex which splits my string into an array.
Everything works fine except that I would like to keep a part of the delimiter.
Here is my regex:
(&#?[a-zA-Z0-9]+;)[\s]
In JavaScript, I am doing:
var test = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);
My paragraph is as follows:
Current addresses: † Biopharmaceutical Research and Development<br />
‡ Clovis Oncology<br />
§ Pisces Molecular <br />
|| School of Biological Sciences
¶ Department of Chemistry<br />
The problem is that I am getting 10 elements in my array and not 5 as I should. In fact, I am also getting my delimiter as a separate element, and my goal is to keep the delimiter with the split element rather than create a new one.
Thank you very much for your help.
EDIT:
I would like to get this as a result:
1. † Biopharmaceutical Research and Development<br />
2. ‡ Clovis Oncology<br />
3. § § Pisces Molecular <br />
|| School of Biological Sciences
4. ¶ Department of Chemistry<br />
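For readers following along, the extra elements come from a documented behaviour of split: when the separator regex contains a capturing group, the captured text is spliced into the result array. A minimal sketch on a short made-up string (the sample text and variable names are illustrative, not from the original paragraph):

```javascript
// Hypothetical sample; the real paragraph is longer but behaves the same way.
const sample = "a &amp; b &lt; c";

// With a capturing group, each captured entity becomes its own array element.
const withGroup = sample.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);
console.log(withGroup); // ["a ", "&amp;", "b ", "&lt;", "c"]

// Without the group, the entity is consumed and lost entirely.
const withoutGroup = sample.split(/&#?[a-zA-Z0-9]+;[\s]/g);
console.log(withoutGroup); // ["a ", "b ", "c"]
```

Neither form keeps the entity attached to the text that follows it, which is what the question is asking for.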
Answer 1:
Try using match instead:
var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);
Updated: added a required white-space \s match.
Explanation:
&#? matches & and an optional # (the question mark matches the previous item zero or one times).
[a-zA-Z0-9] is a character class covering all upper- and lower-case letters and digits. If you also accept an underscore, you could replace this with \w.
The + sign means that the preceding pattern should match one or more times, so it matches one or more of the characters a-z, A-Z and 0-9.
The ; matches the character ;.
The \s matches a single white-space character. That includes space, tab and other white-space characters.
[^&]* is once again a character class, but since ^ is the first character the class is negated, so instead of matching & it matches everything but &. The star matches the pattern zero or more times.
The g at the end, after the last /, means global; it makes match continue after the first match and return an array of all matches.
So, match & and an optional #, followed by one or more letters or digits, followed by ;, followed by a white-space character, followed by zero or more characters that aren't &.
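A runnable sketch of the pattern described above. It assumes the paragraph contains named entities such as &dagger; and &sect; in its source form; the exact entity names are a guess, since only the rendered symbols appear in the question. For comparison it also shows a different technique, splitting on a lookahead, which is not part of this answer but also keeps each entity attached to its text:

```javascript
// Assumed source form of the paragraph (entity names are hypothetical).
const paragraph = [
  "Current addresses: &dagger; Biopharmaceutical Research and Development<br />",
  "&Dagger; Clovis Oncology<br />",
  "&sect; Pisces Molecular <br />",
  "|| School of Biological Sciences",
  "&para; Department of Chemistry<br />",
].join("\n");

// match(): each result starts at an entity and runs up to the next "&".
function matchEntities(text) {
  return text.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g) || [];
}

// Alternative: split on a zero-width lookahead, so nothing is consumed.
// The text before the first entity is kept as the first element.
function splitBeforeEntities(text) {
  return text.split(/(?=&#?[a-zA-Z0-9]+;\s)/);
}

const matched = matchEntities(paragraph);
const pieces = splitBeforeEntities(paragraph);
console.log(matched.length); // 4
console.log(pieces.length);  // 5, including "Current addresses: "
```

Under these assumptions the two approaches differ only in the leading "Current addresses: " chunk, which match drops and the lookahead split keeps.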
Answer 2:
As I said in the comment, this solution (untested, by the way) will only work if you're just managing <br /> elements. Here:
var text = paragraph.split("<br />"); // now text contains just the text on each line
for (var i = 0; i < text.length - 1; i++) { // don't add a line break to the last piece
  text[i] += "<br />"; // put back the <br /> that split() removed
}
The variable text is now an array, where each element is a line of the original paragraph. The line breaks (<br />) have been added back on the end of each line. You just mentioned that you want to split on the special characters, but from what I see, each line ends in a line break, so this should hopefully have the same effect. Unfortunately I don't have the time to write up a more complete answer at the moment.
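The same idea can be sketched as a small function. The name splitOnLineBreaks and the sample string are made up for illustration, and the function assumes every line break is written exactly as <br />:

```javascript
// Split on "<br />" and re-attach the tag to every piece that originally had one.
function splitOnLineBreaks(paragraph) {
  const parts = paragraph.split("<br />");
  // Every piece except the last was followed by a <br /> in the input.
  const lines = parts.slice(0, -1).map((part) => part + "<br />");
  // Keep the trailing piece only if it holds something other than white-space.
  const tail = parts[parts.length - 1];
  if (tail.trim() !== "") lines.push(tail);
  return lines;
}

// Hypothetical input: the third "line" has no <br /> of its own.
const sampleText = "A<br />\nB <br />\nC\nD<br />";
const lines = splitOnLineBreaks(sampleText);
console.log(lines.length); // 3
```

Note that a line without its own <br /> (like C here) stays merged with the line that follows it.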
Answer 3:
Using regex it is pretty simple:
var result = input.match(/&#?[^\W_]+;\s[^&]*/g);
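The class [^\W_] in this pattern reads as "not a non-word character and not an underscore", which, without the i and u flags, should be the same set as [a-zA-Z0-9]. A quick sketch that checks this over every UTF-16 code unit (the function name classesAgree is made up for this example):

```javascript
// Compare the two character classes on every single UTF-16 code unit.
function classesAgree() {
  const viaNegation = /^[^\W_]$/;
  const explicit = /^[a-zA-Z0-9]$/;
  for (let code = 0; code <= 0xffff; code++) {
    const ch = String.fromCharCode(code);
    if (viaNegation.test(ch) !== explicit.test(ch)) return false;
  }
  return true;
}

console.log(classesAgree()); // true
```

So this answer matches the same strings as the pattern in Answer 1, just written more compactly.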
Source: https://stackoverflow.com/questions/12317499/javascript-and-regex-split-and-keep-delimiter