Question
String text = "Cámélan discovered ônte red aleŕt \n Como se extingue la deuda";
If I type Ca, it should highlight Cá in the given string, but nothing is highlighted.
Below is what I tried.
Pattern mPattern;
String filterTerm; //this is the input which I give from input filter. Say for eg: Ca
String regex = createFilterRegex(filterTerm);
mPattern = Pattern.compile(regex);

private String createFilterRegex(String filterTerm) {
    filterTerm = Normalizer.normalize(filterTerm, Normalizer.Form.NFD);
    filterTerm = filterTerm.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return filterTerm;
}

public Pattern getPattern() {
    return mPattern;
}
In another class,
private SpannableStringBuilder createHighlightedString(String nodeText, int highlightColor) { //nodeText is the entire list displaying.
    SpannableStringBuilder returnValue = new SpannableStringBuilder(nodeText);
    String lowercaseNodeText = nodeText;
    Matcher matcher = mFilter.getPattern().matcher((createFilterRegex(lowercaseNodeText)));
    while (matcher.find()) {
        returnValue.setSpan(new ForegroundColorSpan(highlightColor), matcher.start(0),
                matcher.end(0), Spannable.SPAN_EXCLUSIVE_INCLUSIVE);
    }
    return returnValue;
}

viewHolder.mTextView.setText(createHighlightedString((node.getText()), mHighlightColor));
But this is the behavior I get:
If I type a single letter such as o, it is highlighted, but if I type two or more letters, e.g. Ca, nothing is highlighted or displayed. I can't figure out what I am doing wrong.
WhatsApp manages this, though: when I type Co, it recognizes and highlights the accented characters in the sentence.
Answer 1:
As you said,
String text = "Cámélan discovered ônte red aleŕt \n Como se extingue la deuda";
When the first input arrives, take its first character and filter on that.
E.g., if you type Ca, then
if (StringUtils.isNotEmpty(substring)) { // this is the search text
    substring = substring.substring(0, 1); // now you get C alone
}
So whatever you type, the list is filtered by the first character. Now
SpannableString builder = highlightString((yourContent.getText()), mHighlightColor);
viewHolder.mTextView.setText(builder);

private SpannableString highlightString(String entireContent, int highlightColor) {
    SpannableString returnValue = new SpannableString(entireContent);
    String lowercaseNodeText = entireContent;
    try {
        // Match against a lowercased copy of the content with the diacritics removed.
        Matcher matcher = mFilter.getPattern().matcher(diacritical(lowercaseNodeText.toLowerCase()));
        while (matcher.find()) {
            returnValue.setSpan(new ForegroundColorSpan(highlightColor), matcher.start(0),
                    matcher.end(0), Spannable.SPAN_EXCLUSIVE_INCLUSIVE);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return returnValue;
}

// Decomposes the text (NFD) and drops the combining diacritical marks.
private String diacritical(String original) {
    String decomposed = Normalizer.normalize(original, Normalizer.Form.NFD);
    return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
Test case:
When you type Ca, the filter first collects all the entries matching C, then normalizes that content and matches the search term against it, so accented characters match too and are displayed highlighted.
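The matching step described above can be tried outside Android. The following is a minimal sketch, not the answerer's exact code: the class name AccentInsensitiveRanges and the method findRanges are hypothetical, the search term is quoted and matched case-insensitively instead of lowercasing the content, and it assumes the displayed text is in composed form so that the stripped copy has the same length. The returned offsets are what one would pass to setSpan.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class AccentInsensitiveRanges {

    // Decompose to NFD, then drop the combining marks (U+0300..U+036F).
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Returns {start, end} pairs for every match of filterTerm in content.
    // Assumption: content is in composed form (NFC), so the stripped copy
    // has the same length and the offsets carry over unchanged.
    static List<int[]> findRanges(String content, String filterTerm) {
        List<int[]> ranges = new ArrayList<>();
        if (filterTerm.isEmpty()) {
            return ranges;
        }
        Pattern pattern = Pattern.compile(
                Pattern.quote(stripDiacritics(filterTerm)),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(stripDiacritics(content));
        while (matcher.find()) {
            ranges.add(new int[] {matcher.start(), matcher.end()});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // "Cámélan discovered ônte red aleŕt \n Como se extingue la deuda"
        String text = "C\u00e1m\u00e9lan discovered \u00f4nte red ale\u0155t \n Como se extingue la deuda";
        for (int[] r : findRanges(text, "Ca")) {
            System.out.println(r[0] + "-" + r[1] + ": " + text.substring(r[0], r[1]));
        }
    }
}
```

Quoting the term with Pattern.quote keeps characters such as ( or . in the user's input from being read as regex syntax.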
Answer 2:
You already got:
private String convertToBasicLatin(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFD)
            .replaceAll("\\p{M}", "").replaceAll("\\R", "\n");
}
In order to have one unaccented basic Latin char match one Unicode code point of an accented letter, one should normalize the text to the composed form:
private String convertToComposedCodePoints(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFC).replaceAll("\\R", "\n");
}
In general one might also assume that each Unicode code point is 1 char long.
- The search key uses convertToBasicLatin(sought)
- The text view's content uses convertToComposedCodePoints(content)
- The text content for matching uses convertToBasicLatin(content)
Now the matcher's index positions of start and end are correct.
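To see why the indices line up, one can compare string lengths. This is a small sketch under the assumption that every accented letter in the text has a precomposed code point; the class name IndexAlignmentDemo is hypothetical, and the two helpers mirror the methods above without the line-ending step.

```java
import java.text.Normalizer;

// Hypothetical demo class, for illustration only.
class IndexAlignmentDemo {

    // NFD, then remove all marks: one char per base letter.
    static String toBasicLatin(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // NFC: base letter and mark are merged into one code point where possible.
    static String toComposed(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "Cámélan" written with separate combining acute accents (U+0301).
        String decomposed = "Ca\u0301me\u0301lan";
        System.out.println(decomposed.length());               // 9
        System.out.println(toComposed(decomposed).length());   // 7
        System.out.println(toBasicLatin(decomposed).length()); // 7
    }
}
```

Because the composed and the stripped strings have the same length, an offset found in one can be applied to the other. Letters without a precomposed form would break this assumption.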
I explicitly normalized line endings (regex \R) like \r\n or \u0085 to a single \n.
One cannot normalize to lowercase/uppercase, as the number of chars might vary: German lowercase ß corresponds with uppercase SS.
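The length change is easy to reproduce. A minimal sketch follows; the class name CaseLengthDemo and the sample word are only illustrative.

```java
import java.util.Locale;

// Hypothetical demo class, for illustration only.
class CaseLengthDemo {

    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String word = "stra\u00dfe"; // "straße"
        String upperWord = upper(word);
        // The uppercase form is one char longer, so offsets no longer line up.
        System.out.println(word.length() + " -> " + upperWord.length()); // 6 -> 7
    }
}
```

This is why the case-insensitive flags on the Pattern are the safer route: the matched text keeps its original length.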
String sought = ...;
String content = ...;
sought = convertToBasicLatin(sought);
String latinContent = convertToBasicLatin(content);
String composedContent = convertToComposedCodePoints(content);
Matcher m = Pattern.compile(sought, Pattern.CASE_INSENSITIVE
        | Pattern.UNICODE_CASE | Pattern.UNICODE_CHARACTER_CLASS
        | Pattern.UNIX_LINES)
        .matcher(latinContent);
while (m.find()) {
    ... // One can apply `m.start()` and `m.end()` to composedContent of the view too.
}
Answer 3:
I'm not a Java programmer, so here is just a basic raw regex solution.
If you can normalize the string to its decomposed form, say
String sSourceTargetDecom = Normalizer.normalize(sourcetarget, Normalizer.Form.NFD);
that should turn something like U+00C1 Á LATIN CAPITAL LETTER A WITH ACUTE into two characters: A and U+0301 COMBINING ACUTE ACCENT.
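The decomposition can be checked by printing the code points before and after normalizing. A short sketch, with a hypothetical class name DecompositionDemo and helper codePoints:

```java
import java.text.Normalizer;

// Hypothetical demo class, for illustration only.
class DecompositionDemo {

    // Formats each code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String composed = "\u00C1"; // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(codePoints(composed));   // U+00C1
        System.out.println(codePoints(decomposed)); // U+0041 U+0301
    }
}
```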
You can get most combining characters from blocks using
[\p{Block=Combining_Diacritical_Marks}\p{Block=Combining_Diacritical_Marks_Extended}\p{Block=Combining_Diacritical_Marks_For_Symbols}\p{Block=Combining_Diacritical_Marks_Supplement}\p{Block=Combining_Half_Marks}]
which has a hex range of
[\x{300}-\x{36f}\x{1ab0}-\x{1aff}\x{1dc0}-\x{1dff}\x{20d0}-\x{20ff}\x{fe20}-\x{fe2f}]
It turns out that most of the combining marks related to basic Latin that can be decomposed are in the [\x{300}-\x{36f}] range.
You can decompose both the source target and the input search string.
Then create a regex from the input search string.
Inject [\x{300}-\x{36f}]? after each basic Latin letter.
String regex = sSearch.replaceAll("([a-zA-Z])[\\x{300}-\\x{36f}]?", "$1[\\\\x{300}-\\\\x{36f}]?");
(I'm not sure what notation Java uses for code points in its regex; it possibly needs to be \u{DD}.)
Then use the regex on the sSourceTargetDecom string; it will match each basic Latin letter either standing alone or followed by an optional combining mark.
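Put together, the approach might look like the sketch below. It is a variation rather than the exact replaceAll call: the regex is built in a loop so that each character of the search term can be quoted, and the class name CombiningMarkRegex and method build are hypothetical. Note that the match offsets refer to the decomposed string, so that is the string one would display and highlight.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class CombiningMarkRegex {

    // Optional combining mark from the U+0300..U+036F block.
    static final String OPTIONAL_MARK = "[\\x{300}-\\x{36f}]?";

    // Builds a pattern that matches the search term with or without accents.
    static Pattern build(String search) {
        String decomposed = Normalizer.normalize(search, Normalizer.Form.NFD);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < decomposed.length(); ) {
            int cp = decomposed.codePointAt(i);
            i += Character.charCount(cp);
            if (cp >= 0x300 && cp <= 0x36f) {
                continue; // drop marks typed in the search term itself
            }
            regex.append(Pattern.quote(new String(Character.toChars(cp))));
            boolean basicLatinLetter = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
            if (basicLatinLetter) {
                regex.append(OPTIONAL_MARK);
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        // "Cámélan discovered", decomposed before matching.
        String target = Normalizer.normalize("C\u00e1m\u00e9lan discovered", Normalizer.Form.NFD);
        Matcher m = build("Ca").matcher(target);
        while (m.find()) {
            // Offsets are relative to the decomposed string.
            System.out.println(m.start() + "-" + m.end());
        }
    }
}
```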
Source: https://stackoverflow.com/questions/52835775/given-mixed-accented-and-normal-characters-in-string-not-working-in-java-when-se