I want to find all starting indexes of a whole word in a given string. Let's say I have the string below.
"an ancient manuscripts, another means to divide sentences into paragraphs was a line break (newline) followed by an initial at the beginning of the next paragraph. An initial is an oversize capital letter, sometimes outdented beyond the margin of text. This style can be seen, for example, in the original Old English manuscript of Beowulf. Outdenting is still used in English typography, though not commonly.[4] Modern English typography usually indicates a new paragraph by indenting the first line"
I would like to find the starting indexes of "paragraph" only, which should not include "paragraphs" or "paragraph.".
Can anyone suggest how to do this in Java? Thanks in advance.
You can use a regular expression with word-boundary anchors (\b):
String text = "an ancient manuscripts, another means to divide sentences into paragraphs was a line break (newline) followed by an initial at the beginning of the next paragraph. An initial is an oversize capital letter, sometimes outdented beyond the margin of text. This style can be seen, for example, in the original Old English manuscript of Beowulf. Outdenting is still used in English typography, though not commonly.[4] Modern English typography usually indicates a new paragraph by indenting the first line";
Matcher m = Pattern.compile("\\bparagraph\\b").matcher(text);
while (m.find()) {
    System.out.println("Matching at: " + m.start());
}
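For reference, here is a minimal self-contained sketch of the same idea that collects the indexes into a list. The class name WholeWordIndexes, the method findWordStarts, and the short sample text are made up for illustration, and the sketch assumes the word contains only letters or digits. Note that \b also matches between "h" and ".", so "paragraph." is still reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordIndexes {
    // Returns the start index of every occurrence of `word` that is
    // delimited by word boundaries. Assumes `word` has no regex
    // metacharacters (it is concatenated into the pattern unescaped).
    static List<Integer> findWordStarts(String text, String word) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile("\\b" + word + "\\b").matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, shorter than the one in the question.
        String text = "a paragraph. two paragraphs and a paragraph here";
        // "paragraphs" is skipped, but "paragraph." still matches at 2.
        System.out.println(findWordStarts(text, "paragraph")); // prints [2, 34]
    }
}
```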
If you don't want to match "paragraph." ("paragraph" followed by a dot), you can try:
Matcher m = Pattern.compile("\\bparagraph($| )").matcher(text);
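which means "paragraph" followed by a space or the end of the input ($ only matches at the end of each line when Pattern.MULTILINE is set). The match then includes the trailing space, but m.start() is still the index where the word begins.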
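Another option, which is a swapped-in technique rather than the pattern shown here, is to use lookarounds so that the word must be surrounded by whitespace or the ends of the input on both sides. This is only a sketch: the class name WhitespaceDelimited, the method findStarts, and the sample text are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceDelimited {
    // (?<!\S) : the previous character is not a non-whitespace character
    //           (so it is whitespace or the start of the input).
    // (?!\S)  : the next character is not a non-whitespace character
    //           (so it is whitespace or the end of the input).
    private static final Pattern WORD =
            Pattern.compile("(?<!\\S)paragraph(?!\\S)");

    static List<Integer> findStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical sample text.
        String text = "a paragraph. two paragraphs and a paragraph here";
        // Neither "paragraph." nor "paragraphs" is reported.
        System.out.println(findStarts(text)); // prints [34]
    }
}
```

Because lookarounds consume no characters, the match covers only the word itself, and tabs or newlines count as delimiters as well as spaces.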
If the string you are looking for can contain regex special characters (such as "("), you can use Pattern.quote() to escape it:
String mySearchString = "paragraph";
Matcher m = Pattern.compile("\\b" + Pattern.quote(mySearchString) + "($| )").matcher(text);
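One caveat: \b only matches between a word character and a non-word character, so a leading "\\b" does not match in front of a search string that starts with "(" when that parenthesis follows a space. A possible workaround, sketched below under the assumption that "whole word" means "delimited by whitespace", combines Pattern.quote() with whitespace lookarounds. The class name QuotedWordSearch and the method findStarts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedWordSearch {
    // Finds every occurrence of `search`, taken literally, that has
    // whitespace or an end of the input on both sides.
    static List<Integer> findStarts(String text, String search) {
        Pattern p = Pattern.compile(
                "(?<!\\S)" + Pattern.quote(search) + "(?!\\S)");
        List<Integer> starts = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Fragment of the text from the question.
        String text = "a line break (newline) followed by an initial";
        System.out.println(findStarts(text, "(newline)")); // prints [13]
    }
}
```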
Source: https://stackoverflow.com/questions/42622944/how-to-find-index-of-whole-word-in-string-in-java