How can I split a Thai sentence into words? In English, we can split words on spaces.
Example: I go to school
split = ['I', 'go', 'to', 'school']
Here's how to split Thai text into words using Kotlin and ICU4J. ICU4J is a better choice than Lucene's version (last updated 6/2011), because ICU4J is constantly updated and has additional related tools. Search for icu4j at mvnrepository.com to see them all.
import com.ibm.icu.text.BreakIterator
import java.util.Locale

fun splitIntoWords(s: String): List<String> {
    // Word-boundary iterator for the Thai locale.
    val wordBreaker = BreakIterator.getWordInstance(Locale("th"))
    wordBreaker.setText(s)
    var startPos = wordBreaker.first()
    var endPos = wordBreaker.next()
    val words = mutableListOf<String>()
    // Each pair of consecutive boundaries delimits one segment.
    while (endPos != BreakIterator.DONE) {
        words.add(s.substring(startPos, endPos))
        startPos = endPos
        endPos = wordBreaker.next()
    }
    return words
}
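Note that a word iterator reports every boundary, so the segments between boundaries also include runs of spaces and punctuation. If you only want the words themselves, one option is to drop segments that contain no letter or digit. The following is a minimal sketch of that idea, not part of ICU4J: the name splitIntoWordsOnly and the letter-or-digit filter are my own assumptions, and it assumes the icu4j artifact is on the classpath.

```kotlin
import com.ibm.icu.text.BreakIterator
import java.util.Locale

// Hypothetical helper (not an ICU4J API): same boundary walk as splitIntoWords,
// but keeps only segments that contain at least one letter or digit.
fun splitIntoWordsOnly(s: String, locale: Locale = Locale("th")): List<String> {
    val wordBreaker = BreakIterator.getWordInstance(locale)
    wordBreaker.setText(s)
    val words = mutableListOf<String>()
    var startPos = wordBreaker.first()
    var endPos = wordBreaker.next()
    while (endPos != BreakIterator.DONE) {
        val segment = s.substring(startPos, endPos)
        // Assumption: whitespace-only and punctuation-only segments are not wanted.
        if (segment.any { it.isLetterOrDigit() }) words.add(segment)
        startPos = endPos
        endPos = wordBreaker.next()
    }
    return words
}

fun main() {
    // The English example from the question, then a Thai sentence with no spaces.
    println(splitIntoWordsOnly("I go to school"))
    println(splitIntoWordsOnly("ฉันไปโรงเรียน"))
}
```

The exact Thai segmentation depends on the dictionary shipped with your ICU4J version, so check the output against your own text before relying on it.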