iOS app compiles SQLite FTS with ICU, but searching for a single letter like "z" does not return the expected results

离开以前 2021-01-01 05:15

In SQLite I:

  1. Create a virtual table MyTable (tokenize=icu, id text, subject text, abstract text)
  2. Then successfully insert info MyT
2 Answers
  • 2021-01-01 05:51

    You may also try FMDB's FMSimpleTokenizer. FMSimpleTokenizer uses the built-in CFStringTokenizer, and according to Apple's documentation, "CFStringTokenizer allows you to tokenize strings into words, sentences or paragraphs in a language-neutral way. It supports languages such as Japanese and Chinese that do not delimit words by spaces."

    If you check the FMSimpleTokenizer code, you will find that this is done by calling CFStringTokenizerAdvanceToNextToken and CFStringTokenizerGetCurrentTokenRange.

    One interesting "fact" is how CFStringTokenizer tokenizes Chinese words. For example, "欢迎使用" is tokenized into "欢迎" and "使用", which makes perfect sense, but if you search for "迎", you will be surprised to get no results at all!

    In that case you probably need to write a tokenizer like Hai Feng Kao's sqlite tokenizer.
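
    To see why "迎" finds nothing, here is a minimal C sketch of exact-token matching. It does not call CFStringTokenizer or SQLite: the word tokens are hard-coded from the example above, and has_token, split_chars and utf8_char_len are hypothetical helper names, not part of FMDB or any tokenizer library. It also assumes the input is valid UTF-8.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Byte length of the UTF-8 sequence that starts with lead byte c. */
    size_t utf8_char_len(unsigned char c) {
        if (c < 0x80) return 1;
        if ((c & 0xE0) == 0xC0) return 2;
        if ((c & 0xF0) == 0xE0) return 3;
        if ((c & 0xF8) == 0xF0) return 4;
        return 1; /* invalid lead byte: step over it as a single byte */
    }

    /* Returns 1 if term is exactly equal to one of the n tokens.
       This mimics a plain term lookup in a full-text index. */
    int has_token(const char *const *tokens, size_t n, const char *term) {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(tokens[i], term) == 0) return 1;
        }
        return 0;
    }

    /* Splits text into one token per UTF-8 character.
       Stores at most max tokens in out and returns how many were stored. */
    size_t split_chars(const char *text, char out[][5], size_t max) {
        size_t n = 0;
        while (*text != '\0' && n < max) {
            size_t len = utf8_char_len((unsigned char)*text);
            size_t i = 0;
            /* Copy one character, stopping early if the string is truncated. */
            while (i < len && text[i] != '\0') {
                out[n][i] = text[i];
                i++;
            }
            out[n][i] = '\0';
            text += i;
            n++;
        }
        return n;
    }
    ```

    With the word tokens {"欢迎", "使用"}, has_token(words, 2, "迎") returns 0 because "迎" is not a whole token. split_chars("欢迎使用", chars, 8) stores four one-character tokens, and "迎" is one of them, so a per-character index can match it.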

  • 2021-01-01 05:56

    Try Hai Feng Kao's character tokenizer. It can match prefixes, suffixes, and anything in between, and it supports Chinese as well. I don't think you can find any other tokenizer that supports arbitrary substring search.

    BTW, this is shameless self-promotion.

    If you want to open a database encoded by character tokenizer in Objective-C, do the following:

    #import <FMDB/FMDatabase.h>
    #import "character_tokenizer.h"
    
    FMDatabase* database = [[FMDatabase alloc] initWithPath:@"my_database.db"];
    if ([database open]) {
        // add FTS support
        const sqlite3_tokenizer_module *ptr;
        get_character_tokenizer_module(&ptr);
        registerTokenizer(database.sqliteHandle, "character", ptr);
    }
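
    As a rough illustration of why one-character tokens allow substring search: an FTS phrase query matches tokens in consecutive positions, so a substring behaves like a phrase of adjacent characters. The C sketch below models only that idea and is not the tokenizer's actual code. phrase_of_chars_match and utf8_char_len are hypothetical names, and valid UTF-8 input is assumed.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Byte length of the UTF-8 sequence that starts with lead byte c. */
    size_t utf8_char_len(unsigned char c) {
        if (c < 0x80) return 1;
        if ((c & 0xE0) == 0xC0) return 2;
        if ((c & 0xF0) == 0xE0) return 3;
        if ((c & 0xF8) == 0xF0) return 4;
        return 1; /* invalid lead byte: step over it as a single byte */
    }

    /* Returns 1 if query occurs in text starting at a character boundary.
       Because query is made of whole characters, this is the same as finding
       its characters as a run of adjacent one-character tokens. */
    int phrase_of_chars_match(const char *text, const char *query) {
        size_t qlen = strlen(query);
        size_t tlen = strlen(text);
        size_t pos = 0;
        if (qlen == 0) return 0; /* an empty query matches nothing here */
        while (pos + qlen <= tlen) {
            if (memcmp(text + pos, query, qlen) == 0) return 1;
            pos += utf8_char_len((unsigned char)text[pos]); /* next character */
        }
        return 0;
    }
    ```

    For the text "欢迎使用", the queries "欢" (prefix), "用" (suffix) and "迎使" (middle) all match, while "使迎" does not because its characters are not adjacent in that order.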
    