How to split a Thai sentence, which does not use spaces, into words?

不思量自难忘° 2021-02-19 06:46

How can I split a Thai sentence into words? In English, we can split words on spaces.

Example: I go to school → split = ['I', 'go', 'to', 'school']
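A minimal Kotlin sketch of the whitespace approach shows why it does not carry over to Thai. The helper name `splitOnWhitespace` is hypothetical, and the Thai string is an illustrative translation of "I go to school".

```kotlin
// Naive word splitting (hypothetical helper): break the string on runs of whitespace.
fun splitOnWhitespace(s: String): List<String> = s.trim().split(Regex("\\s+"))

// Illustrative inputs; the Thai string is a translation of "I go to school".
val english = "I go to school"
val thai = "ฉันไปโรงเรียน"
```

`splitOnWhitespace(english)` gives `[I, go, to, school]`, while `splitOnWhitespace(thai)` gives a one-element list holding the whole sentence, because there are no spaces between Thai words.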

4 Answers
  •  無奈伤痛
    2021-02-19 07:14

    Here's how to split Thai text into words using Kotlin and ICU4J. ICU4J is a better choice than Lucene's version (last updated June 2011) because ICU4J is actively maintained and comes with additional related tools; search for icu4j on mvnrepository.com to see them all.

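    import com.ibm.icu.text.BreakIterator
    import java.util.Locale

    // Returns the segments between consecutive word boundaries found by ICU4J's
    // Thai word breaker. Spaces and punctuation come back as their own segments.
    fun splitIntoWords(s: String): List<String> {
        val wordBreaker = BreakIterator.getWordInstance(Locale("th"))
        wordBreaker.setText(s)
        var startPos = wordBreaker.first()
        var endPos = wordBreaker.next()

        val words = mutableListOf<String>()

        // Each [startPos, endPos) pair of adjacent boundaries is one segment.
        while (endPos != BreakIterator.DONE) {
            words.add(s.substring(startPos, endPos))
            startPos = endPos
            endPos = wordBreaker.next()
        }

        return words
    }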
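ICU4J's `BreakIterator` follows the same `first()` / `next()` / `DONE` protocol as the JDK's `java.text.BreakIterator`, so the loop above can also be written against the JDK class when adding ICU4J is not an option. The sketch below does that (the function name `splitIntoWordsJdk` is hypothetical) and also drops segments that contain no letter or digit, since the word breaker returns spaces and punctuation as segments of their own. It assumes the JDK's bundled Thai word-break data is available; its results may differ from ICU4J's, so treat it as a fallback rather than a replacement.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Hypothetical variant of splitIntoWords using the JDK's java.text.BreakIterator
// instead of ICU4J. Assumption: the JDK's own Thai word-break data is used.
fun splitIntoWordsJdk(s: String, locale: Locale = Locale.forLanguageTag("th")): List<String> {
    val breaker = BreakIterator.getWordInstance(locale)
    breaker.setText(s)
    val words = mutableListOf<String>()
    var start = breaker.first()
    var end = breaker.next()
    while (end != BreakIterator.DONE) {
        val segment = s.substring(start, end)
        // Keep only word-like segments: this drops spaces and punctuation.
        // Thai consonants are letters, so Thai words are kept.
        if (segment.any { it.isLetterOrDigit() }) words.add(segment)
        start = end
        end = breaker.next()
    }
    return words
}
```

For example, `splitIntoWordsJdk("I go to school.", Locale.ENGLISH)` returns `[I, go, to, school]`.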
    
