Is SQLite on Android built with the ICU tokenizer enabled for FTS?

Submitted by 吃可爱长大的小学妹 on 2019-12-01 04:01:08

No, only tokenizer=porter works.

When I specify tokenizer=icu, I get "android.database.sqlite.SQLiteException: unknown tokenizer: icu"

Also, this link hints that if Android did not compile the tokenizer in by default, it will not be available: http://sqlite.phxsoftware.com/forums/t/2349.aspx

Gordon Liang

On API level 21 and up, I tested and found that the ICU tokenizer is already available.

However, to support 90%+ of devices, a workaround is needed. I have a workaround idea, which I also mention in another question of mine: Work around of Android SQLite full-text search for Asian text

You can port the ICU tokenizer function to Java, or to a native Android module, as a standalone module that is not wired into SQLite itself. Then use an "external content table" (supported since FTS4) and link it to the FTS virtual table.

When adding a row, insert the normal content into the external content table, but run the standalone tokenizer to add artificial spaces at word boundaries before inserting the text into the virtual index table.

When deleting a row, run the tokenizer again and update the content table row with the artificially spaced text, then delete the row from the virtual table, then delete the row from the content table.
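The add and delete steps above can be sketched as ordered SQL statements. This is only an illustration under assumptions: the table names `notes` and `notes_fts`, the column `body`, and the class `ExternalContentFtsSql` are hypothetical, and the statements are plain strings that you would bind and execute through whatever SQLite API you use. The spaced text is assumed to come from your standalone tokenizer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical statement set for an FTS4 external content table. */
public final class ExternalContentFtsSql {
    private ExternalContentFtsSql() {}

    // Ordinary table that stores the original, unmodified text.
    public static final String CREATE_CONTENT =
            "CREATE TABLE notes(id INTEGER PRIMARY KEY, body TEXT)";

    // FTS4 index whose content lives in the "notes" table.
    public static final String CREATE_INDEX =
            "CREATE VIRTUAL TABLE notes_fts USING fts4(content=\"notes\", body)";

    /** Statements for adding a row, in execution order. */
    public static List<String> insertStatements() {
        return Arrays.asList(
                // Bind: id, original text.
                "INSERT INTO notes(id, body) VALUES (?, ?)",
                // Bind: same id, text with artificial spaces between words.
                "INSERT INTO notes_fts(docid, body) VALUES (?, ?)");
    }

    /** Statements for deleting a row, in execution order. */
    public static List<String> deleteStatements() {
        return Arrays.asList(
                // Bind: spaced text, id. The content row now matches what was indexed.
                "UPDATE notes SET body = ? WHERE id = ?",
                // Bind: id. Remove the index entries first.
                "DELETE FROM notes_fts WHERE docid = ?",
                // Bind: id. Remove the content row last.
                "DELETE FROM notes WHERE id = ?");
    }
}
```

Running the three delete statements inside one transaction would keep the content table and the index from drifting apart if one step fails.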

This is a little tricky, but compared with the other option of recompiling a full SQLite build, it is much less effort.

For the external content table and how it works, please refer to https://www.sqlite.org/fts3.html#section_6_2_2

An ICU tokenizer is actually already available in the Android SDK: use BreakIterator.getWordInstance. It looks like it even supports dictionary-based tokenization for languages such as Chinese. http://developer.android.com/reference/java/text/BreakIterator.html
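As a minimal sketch of how BreakIterator.getWordInstance could produce space-separated text for a space-based FTS tokenizer, consider the helper below. The class name `WordSpacer` and method `addWordSpaces` are hypothetical, and dropping segments that contain no letter or digit (whitespace, punctuation) is an assumption of this sketch, not something the thread prescribes. Segmentation of Chinese and similar scripts depends on the runtime's BreakIterator implementation, so results on a desktop JVM may differ from those on Android.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical helper that separates the words of a string with single spaces. */
public final class WordSpacer {
    private WordSpacer() {}

    /** Returns the word segments of {@code text}, joined by single spaces. */
    public static String addWordSpaces(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);

        StringBuilder out = new StringBuilder();
        int start = words.first();
        // Walk consecutive boundaries; each [start, end) pair is one segment.
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String segment = text.substring(start, end);
            // Assumption: segments without any letter or digit are not indexed.
            if (!segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(segment);
        }
        return out.toString();
    }
}
```

The returned string is what you would insert into the FTS virtual table, while the original text goes into the content table. The same helper should be applied to the user's query string so that query terms are split the same way as the indexed text.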

I have some Android code that uses tokenization at the link below; maybe it will be of some help:

https://github.com/gast-lib/gast-lib/blob/master/app/src/root/gast/playground/speech/food/db/FtsIndexedFoodDatabase.java
