How to find a string within another, ignoring some characters?

Submitted by 久未见 on 2021-01-29 15:00:22

Question


Background

Suppose you want to find a partial match inside a formatted phone number, and you want to mark (highlight) where it was found.

For example, if you have the phone number "+972 50-123-4567" and you search for "2501", the text to mark within it is "2 50-1".

Here are more examples, as a hashmap from each query to its expected result (start index and exclusive end index, or null when there is no match). The text to search in is "+972 50-123-45678", and the allowed characters are "01234567890+*#":

    val tests = hashMapOf(
            "" to Pair(0, 0),
            "9" to Pair(1, 2),
            "97" to Pair(1, 3),
            "250" to Pair(3, 7),
            "250123" to Pair(3, 11),
            "250118" to null,
            "++" to null,
            "8" to Pair(16, 17),
            "+" to Pair(0, 1),
            "+8" to null,
            "78" to Pair(15, 17),
            "5678" to Pair(13, 17),
            "788" to null,
            "+ " to Pair(0, 1),
            "  " to Pair(0, 0),
            "+ 5" to null,
            "+ 9" to Pair(0, 2)
    )

The problem

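You might ask: why not just use "indexOf", or clean the string first and then find the occurrence?

That doesn't work here, because I want to mark the occurrence in the original text, skipping the ignored characters along the way. Indices found in a cleaned string don't point at the right places in the formatted one.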
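
To make the index problem concrete, here is a minimal sketch of the "clean, then indexOf" idea. The helper name `naiveFind` is hypothetical and only meant for illustration; it assumes the allowed characters are passed as a `Set<Char>`.

```kotlin
// Hypothetical helper, for illustration only: clean the text, then search in the cleaned copy.
fun naiveFind(text: String, query: String, allowed: Set<Char>): Pair<Int, Int>? {
    val cleanedText = text.filter { it in allowed }
    val cleanedQuery = query.filter { it in allowed }
    val start = cleanedText.indexOf(cleanedQuery)
    // These indices refer to cleanedText, not to the original formatted text.
    return if (start < 0) null else Pair(start, start + cleanedQuery.length)
}

fun main() {
    val text = "+972 50-123-4567"
    val allowed = "0123456789+*#".toSet()
    val range = naiveFind(text, "2501", allowed)!!
    // Applying cleaned-string indices to the original text marks the wrong span.
    println(text.substring(range.first, range.second)) // prints "2 50", while the wanted span is "2 50-1"
}
```

The match itself is found, but the returned range is only valid for the cleaned copy, so it cannot be used directly for highlighting.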

What I've tried

I actually found an answer after working on it for quite some time. I just want to share it, and see whether anyone can write nicer or shorter code that produces the same behavior.

I had an earlier solution that was considerably shorter, but it assumed that the query contains only allowed characters.

The question

Well, there is no real question this time, because I found an answer myself.

However, if you can think of a more elegant and/or shorter solution that is as efficient as what I wrote, please let me know.

I'm pretty sure regular expressions could solve this too, but they tend to be hard to read, and they can be slower than hand-written code. It would still be nice to know how they would handle this kind of problem. Maybe I could run a small benchmark on it too.
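
As a rough idea of what a regex-based variant might look like, here is a sketch under a few assumptions: the function name `findWithRegex` is hypothetical, the allowed characters are passed as a `Set<Char>`, and ignored characters in the query are simply dropped before the pattern is built. Between every two query characters the pattern permits any run of characters that are not in the allowed set. No benchmark has been run on it.

```kotlin
// Hypothetical regex-based variant; the name and signature are illustrative assumptions.
fun findWithRegex(text: String, query: String, allowed: Set<Char>): Pair<Int, Int>? {
    val significant = query.filter { it in allowed }
    // Same special case as in the question: nothing to search for yields Pair(0, 0).
    if (significant.isEmpty()) return Pair(0, 0)
    // Backslash-escape everything that is not a letter or digit so it is literal inside [...].
    val allowedClass = allowed.joinToString("") { if (it.isLetterOrDigit()) "$it" else "\\$it" }
    // A "gap" is any run of ignored characters, i.e. characters outside the allowed set.
    val gap = "[^$allowedClass]*"
    val pattern = significant.map { Regex.escape(it.toString()) }.joinToString(gap)
    val match = Regex(pattern).find(text) ?: return null
    // IntRange.last is inclusive, so add 1 to get an exclusive end index.
    return Pair(match.range.first, match.range.last + 1)
}

fun main() {
    val text = "+972 50-123-45678"
    val allowed = "0123456789+*#".toSet()
    println(findWithRegex(text, "2501", allowed)) // prints (3, 9)
    println(findWithRegex(text, "+ 9", allowed))  // prints (0, 2)
    println(findWithRegex(text, "+8", allowed))   // prints null
}
```

Building a new `Regex` on every call has a cost of its own, so if the same query is applied to many texts it may be worth compiling the pattern once and reusing it.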


Answer 1:


OK, so here's my solution, including a sample to test it:

TextSearchUtil.kt

object TextSearchUtil {
    /**
     * @return the range where the query was found: first is the start index, second is the end index (exclusive).
     * Special cases: Pair(0, 0) if the query is empty or contains only ignored characters, null if not found.
     * @param text the text to search within. Only allowed characters are compared; the rest are ignored
     * @param query what to search for. Only allowed characters are compared; the rest are ignored
     * @param allowedCharactersSet the only characters that take part in the comparison; the rest are ignored
     */
    fun findOccurrenceWhileIgnoringCharacters(text: String, query: String, allowedCharactersSet: HashSet<Char>): Pair<Int, Int>? {
        //get index of first char to search for
        var searchIndexStart = -1
        for ((index, c) in query.withIndex())
            if (allowedCharactersSet.contains(c)) {
                searchIndexStart = index
                break
            }
        if (searchIndexStart == -1) {
            //query contains only ignored characters, so it's like an empty one
            return Pair(0, 0)
        }
        //got index of first character to search for
        if (text.isEmpty())
        //need to search for a character, but the text is empty, so not found
            return null
        var mainIndex = 0
        while (mainIndex < text.length) {
            var searchIndex = searchIndexStart
            var isFirstCharToSearchFor = true
            var secondaryIndex = mainIndex
            var charToSearch = query[searchIndex]
            secondaryLoop@ while (secondaryIndex < text.length) {
                //skip ignored characters on query
                if (!isFirstCharToSearchFor)
                    while (!allowedCharactersSet.contains(charToSearch)) {
                        ++searchIndex
                        if (searchIndex >= query.length) {
                            //reached end of search while all characters were fine, so found the match
                            return Pair(mainIndex, secondaryIndex)
                        }
                        charToSearch = query[searchIndex]
                    }
                //skip ignored characters on text
                var c: Char? = null
                while (secondaryIndex < text.length) {
                    c = text[secondaryIndex]
                    if (allowedCharactersSet.contains(c))
                        break
                    else {
                        if (isFirstCharToSearchFor)
                            break@secondaryLoop
                        ++secondaryIndex
                    }
                }
                //reached end of text
                if (secondaryIndex == text.length) {
                    if (isFirstCharToSearchFor)
                    //couldn't find the first character anywhere, so failed to find the query
                        return null
                    break@secondaryLoop
                }
                //time to compare
                if (c != charToSearch)
                    break@secondaryLoop
                ++searchIndex
                isFirstCharToSearchFor = false
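                //the match is complete once no allowed characters remain in the query.
                //checking the remainder also covers a query that ends with ignored characters
                //while the match ends at the very end of the text (for example "8 " in the sample text)
                if ((searchIndex until query.length).none { allowedCharactersSet.contains(query[it]) }) {
                    return Pair(mainIndex, secondaryIndex + 1)
                }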
                charToSearch = query[searchIndex]
                ++secondaryIndex
            }
            ++mainIndex
        }
        return null
    }
}

Sample usage to test it:

MainActivity.kt

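//imports assume an AndroidX project; adjust the AppCompatActivity import for the older support library
import android.os.Bundle
import android.util.Log
import androidx.appcompat.app.AppCompatActivity

class MainActivity : AppCompatActivity() {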

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        //
        val text = "+972 50-123-45678"
        val allowedCharacters = "01234567890+*#"
        val allowedPhoneCharactersSet = allowedCharacters.toHashSet()
        //
        val tests = hashMapOf(
                "" to Pair(0, 0),
                "9" to Pair(1, 2),
                "97" to Pair(1, 3),
                "250" to Pair(3, 7),
                "250123" to Pair(3, 11),
                "250118" to null,
                "++" to null,
                "8" to Pair(16, 17),
                "+" to Pair(0, 1),
                "+8" to null,
                "78" to Pair(15, 17),
                "5678" to Pair(13, 17),
                "788" to null,
                "+ " to Pair(0, 1),
                "  " to Pair(0, 0),
                "+ 5" to null,
                "+ 9" to Pair(0, 2)
        )
        for (test in tests) {
            val result = TextSearchUtil.findOccurrenceWhileIgnoringCharacters(text, test.key, allowedPhoneCharactersSet)
            val isResultCorrect = result == test.value
            val foundStr = if (result == null) null else text.substring(result.first, result.second)
            when {
                !isResultCorrect -> Log.e("AppLog", "checking query of \"${test.key}\" inside \"$text\" . Succeeded?$isResultCorrect Result: $result found String: \"$foundStr\"")
                foundStr == null -> Log.d("AppLog", "checking query of \"${test.key}\" inside \"$text\" . Succeeded?$isResultCorrect Result: $result")
                else -> Log.d("AppLog", "checking query of \"${test.key}\" inside \"$text\" . Succeeded?$isResultCorrect Result: $result found String: \"$foundStr\"")
            }
        }
        //
        Log.d("AppLog", "special cases:")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("a", "c", allowedPhoneCharactersSet) == Pair(0, 0)}")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("ab", "c", allowedPhoneCharactersSet) == Pair(0, 0)}")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("ab", "cd", allowedPhoneCharactersSet) == Pair(0, 0)}")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("a", "cd", allowedPhoneCharactersSet) == Pair(0, 0)}")
    }

}

If I want to highlight the result, I can use something like this:

    val pair = TextSearchUtil.findOccurrenceWhileIgnoringCharacters(text, "2501", allowedPhoneCharactersSet)
    if (pair == null)
        textView.text = text
    else {
        val wordToSpan = SpannableString(text)
        wordToSpan.setSpan(BackgroundColorSpan(0xFFFFFF00.toInt()), pair.first, pair.second, Spannable.SPAN_EXCLUSIVE_EXCLUSIVE)
        textView.setText(wordToSpan, TextView.BufferType.SPANNABLE)
    }
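
For comparison, a shorter variant is possible if extra allocations are acceptable. The sketch below is not the code from this answer: the name `findByIndexMapping` is hypothetical, and it assumes the same conventions as above (exclusive end index, Pair(0, 0) for an empty or fully ignored query, null when not found). It searches in a cleaned copy of the text and then maps the cleaned indices back to positions in the original.

```kotlin
// Hypothetical alternative: clean both strings, search in the cleaned text,
// then translate the cleaned indices back to indices in the original text.
fun findByIndexMapping(text: String, query: String, allowed: Set<Char>): Pair<Int, Int>? {
    val cleanedQuery = query.filter { it in allowed }
    // An empty or fully ignored query is treated as found at Pair(0, 0).
    if (cleanedQuery.isEmpty()) return Pair(0, 0)
    // positions[i] is the index in the original text of the i-th allowed character.
    val positions = text.indices.filter { text[it] in allowed }
    val cleanedText = text.filter { it in allowed }
    val start = cleanedText.indexOf(cleanedQuery)
    if (start < 0) return null
    val lastMatched = positions[start + cleanedQuery.length - 1]
    return Pair(positions[start], lastMatched + 1)
}

fun main() {
    val text = "+972 50-123-45678"
    val allowed = "0123456789+*#".toSet()
    println(findByIndexMapping(text, "250123", allowed)) // prints (3, 11)
    println(findByIndexMapping(text, "+ 9", allowed))    // prints (0, 2)
    println(findByIndexMapping(text, "788", allowed))    // prints null
}
```

The trade-off is that each call builds a cleaned string and an index list, whereas the loop-based version works directly on the original text without extra allocations.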


Source: https://stackoverflow.com/questions/57534062/how-to-find-a-string-within-another-ignoring-some-characters
