Question
I am trying to figure out an equivalent of C#'s string.IndexOf(string) that can handle surrogate pairs in Unicode strings.
I can get the index when searching for a single character, as in the code below:
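using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class Extensions
{
    // Works only when `find` is a single text element.
    public static int UnicodeIndexOf(this string input, string find)
    {
        return input.ToTextElements().ToList().IndexOf(find);
    }

    public static IEnumerable<string> ToTextElements(this string input)
    {
        var e = StringInfo.GetTextElementEnumerator(input);
        while (e.MoveNext())
        {
            yield return e.GetTextElement();
        }
    }
}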
But if find holds more than one text element, this doesn't work: each entry in the list is a single text element, so a multi-element string never equals any of them.
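To make the failure concrete, here is a minimal sketch that builds the same list of text elements the question's ToTextElements().ToList() produces. The class and method names (TextElementList.Elements) are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class TextElementList
{
    // Hypothetical helper: one list entry per text element,
    // equivalent to input.ToTextElements().ToList() in the question.
    public static List<string> Elements(string input)
    {
        var result = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(input);
        while (e.MoveNext())
        {
            // A surrogate pair such as 𪘁 comes back as one two-char entry.
            result.Add(e.GetTextElement());
        }
        return result;
    }
}
```

For "HolyCow𪘁BUBB" the list has 12 entries, so IndexOf("B") finds 8, but IndexOf("BUBB") returns -1 because no single entry equals the four-character string.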
Are there any suggestions as to how to go about writing this?
Thanks for any and all help.
EDIT:
Below is an example of why this is necessary:
CODE
Console.WriteLine("HolyCow𪘁BUBBYY𪘁YY𪘁Y".IndexOf("BUBB"));
Console.WriteLine("HolyCow@BUBBYY@YY@Y".IndexOf("BUBB"));
OUTPUT
9
8
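Notice that when I replace the 𪘁 character with @, the result changes from 9 to 8. The 𪘁 character lies outside the Basic Multilingual Plane, so it is stored as two UTF-16 chars (a surrogate pair), while @ is a single char.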
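If the goal is a position counted in Unicode code points (each surrogate pair counts once), one option is to run an ordinal IndexOf and then convert the UTF-16 offset. The sketch below assumes a hypothetical CodePointSearch.CodePointIndexOf helper and uses StringComparison.Ordinal, unlike the culture-sensitive default of IndexOf(string). Code points and text elements coincide for surrogate pairs but differ for combining sequences (for example "e" followed by U+0301), so this is not identical to a text-element search.

```csharp
using System;

public static class CodePointSearch
{
    // Hypothetical helper: index of `find` in `input` counted in code points
    // rather than UTF-16 chars, or -1 if there is no match.
    public static int CodePointIndexOf(string input, string find)
    {
        // Ordinal search gives a plain UTF-16 offset.
        int utf16Index = input.IndexOf(find, StringComparison.Ordinal);
        if (utf16Index < 0) return -1;

        // Walk the prefix before the match; a surrogate pair counts once.
        int codePoints = 0;
        for (int i = 0; i < utf16Index; i++)
        {
            if (char.IsSurrogatePair(input, i)) i++; // skip the low surrogate
            codePoints++;
        }
        return codePoints;
    }
}
```

With this helper, searching "BUBB" in "HolyCow𪘁BUBBYY𪘁YY𪘁Y" returns 8, the same value as the @ version.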
Answer 1:
You basically want to find the index of one string array within another string array. We can adapt array-search code from another Stack Overflow question for that:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class Extensions {
    // Returns the index measured in text elements, not UTF-16 chars.
    public static int UnicodeIndexOf(this string input, string find, StringComparison comparison = StringComparison.CurrentCulture) {
        return IndexOf(
            // split input into text elements (a surrogate pair stays one element)
            input.ToTextElements().ToArray(),
            // split the searched value the same way
            find.ToTextElements().ToArray(),
            comparison);
    }

    // Naive subarray search, adapted from another answer:
    // returns the first i where needle matches haystack[i..i + len), or -1.
    private static int IndexOf(string[] haystack, string[] needle, StringComparison comparison) {
        var len = needle.Length;
        var limit = haystack.Length - len;
        for (var i = 0; i <= limit; i++) {
            var k = 0;
            for (; k < len; k++) {
                if (!String.Equals(needle[k], haystack[i + k], comparison)) break;
            }
            if (k == len) return i;
        }
        return -1;
    }

    public static IEnumerable<string> ToTextElements(this string input) {
        var e = StringInfo.GetTextElementEnumerator(input);
        while (e.MoveNext()) {
            yield return e.GetTextElement();
        }
    }
}
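A different route to a text-element index, sketched here under the assumption that an ordinal (exact code-unit) comparison is acceptable, is to search with the built-in ordinal IndexOf and then map the UTF-16 offset through StringInfo.ParseCombiningCharacters, which returns the starting offset of each text element. The names TextElementSearch and TextElementIndexOf are hypothetical. A match is accepted only if it starts and ends on text-element boundaries, which roughly mirrors how the element-by-element comparison rejects partial elements; it avoids allocating one string per element, at the cost of not supporting culture-sensitive comparisons.

```csharp
using System;
using System.Globalization;

public static class TextElementSearch
{
    // Hypothetical helper: text-element index of the first ordinal match of
    // `find` in `input` that covers whole text elements, or -1.
    public static int TextElementIndexOf(string input, string find)
    {
        if (find.Length == 0) return 0;

        // UTF-16 offset at which each text element begins, in ascending order.
        int[] starts = StringInfo.ParseCombiningCharacters(input);

        int from = 0;
        while (from < input.Length)
        {
            int utf16Index = input.IndexOf(find, from, StringComparison.Ordinal);
            if (utf16Index < 0) return -1;

            int element = Array.BinarySearch(starts, utf16Index);
            int end = utf16Index + find.Length;
            bool endsOnBoundary = end == input.Length || Array.BinarySearch(starts, end) >= 0;

            // Accept only matches that begin and end on element boundaries.
            if (element >= 0 && endsOnBoundary) return element;

            // Otherwise keep looking after this partial-element match.
            from = utf16Index + 1;
        }
        return -1;
    }
}
```

For "HolyCow𪘁BUBBYY𪘁YY𪘁Y" and "BUBB" this returns 8, which is also what UnicodeIndexOf above should return for that input.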
Source: https://stackoverflow.com/questions/50182335/what-is-a-unicode-safe-replica-of-string-indexofstring-input-that-can-handle-s