Like the title says: can we use ...USING fts3(tokenizer icu th_TH, ...)? If we can, does anyone know which locales are supported, and whether that varies by platform version?
No, only tokenizer=porter
When I specify tokenizer=icu, I get "android.database.sqlite.SQLiteException: unknown tokenizer: icu"
Also, this link suggests that if Android's SQLite build did not compile it in by default, it will not be available: http://sqlite.phxsoftware.com/forums/t/2349.aspx
For API level 21 and up, I tested and found that the ICU tokenizer is already available.
However, to support 90%+ of devices, a workaround is needed. I have a workaround idea, which I also mention in another question of mine: Work around of Android SQLite full-text search for Asian text
You can port the ICU tokenizer function to Java, or to a native Android module, as a separate component that is not directly part of SQLite. Then use an "external content table" linked to the virtual table (supported from FTS4).
When adding a tuple, add the normal content to the external content table, but invoke the standalone tokenizer to add artificial spaces at word boundaries before adding the text to the virtual index table.
When deleting a tuple, invoke the tokenizer again to update the content table with the artificial spaces, then delete the virtual table tuple, then delete the content table tuple.
This is a little tricky, but compared with the other option of recompiling a full SQLite, it is much less effort.
For the external content table and how it works, please refer to https://www.sqlite.org/fts3.html#section_6_2_2
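The insert and delete order described above can be sketched as the statement sequences below. This is only an illustration: the table names docs and docs_fts and the class ExternalContentFtsSketch are hypothetical, and the statements are assumed to be run with bound parameters (for example through SQLiteDatabase.execSQL) inside a transaction.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: SQL for an FTS4 index backed by an external content table.
final class ExternalContentFtsSketch {
    // Plain table that keeps the original, unsegmented text.
    static final String CREATE_CONTENT =
            "CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT)";

    // FTS4 index that stores no copy of the text and reads it from docs.
    static final String CREATE_INDEX =
            "CREATE VIRTUAL TABLE docs_fts USING fts4(content=\"docs\", body)";

    // Adding a tuple: original text goes to docs, segmented text to the index.
    static List<String> insertSteps() {
        return Arrays.asList(
                "INSERT INTO docs(id, body) VALUES (?, ?)",        // original text
                "INSERT INTO docs_fts(docid, body) VALUES (?, ?)"  // same id, text with artificial spaces
        );
    }

    // Deleting a tuple: FTS4 re-reads the content row to find the terms to
    // remove, so the row must first hold the same segmented text that was indexed.
    static List<String> deleteSteps() {
        return Arrays.asList(
                "UPDATE docs SET body = ? WHERE id = ?",  // body = segmented text
                "DELETE FROM docs_fts WHERE docid = ?",   // remove the index entry
                "DELETE FROM docs WHERE id = ?"           // finally remove the content row
        );
    }
}
```

The order in deleteSteps matters: deleting the content row first would leave FTS4 with nothing to read when it removes the index entry.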
An ICU tokenizer is actually available in the Android SDK: use BreakIterator.getWordInstance. It looks like it even supports dictionary-based tokenization for languages such as Chinese. http://developer.android.com/reference/java/text/BreakIterator.html
I have some Android code that uses tokenization in the link below; maybe it will be of some help:
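As a rough illustration of the BreakIterator approach, here is a minimal sketch that splits text at word boundaries and rejoins the words with single spaces, so a space-splitting FTS tokenizer can index them. The class name WordSegmenter and its methods are made up for this example, and how well a given language is segmented depends on the platform's break-iterator data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: word segmentation with java.text.BreakIterator.
final class WordSegmenter {

    // Returns the segments between word boundaries that contain at least
    // one letter or digit (whitespace and punctuation runs are dropped).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (hasLetterOrDigit(piece)) {
                out.add(piece);
            }
        }
        return out;
    }

    // Joins the words with single spaces: the "artificial spaces" form.
    static String withSpaces(String text, Locale locale) {
        return String.join(" ", words(text, locale));
    }

    private static boolean hasLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }
}
```

For Thai text, one would pass new Locale("th", "TH") instead of an English locale; the same segmentation should also be applied to the search query before it is used in a MATCH expression.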
Source: https://stackoverflow.com/questions/7070193/is-sqlite-on-android-built-with-the-icu-tokenizer-enabled-for-fts