Question
Using the GAE search API is it possible to search for a partial match?
I'm trying to create autocomplete functionality where the term would be a partial word, e.g.:
> b
> bui
> build
would all return "building".
How is this possible with GAE?
Answer 1:
Although a LIKE statement (partial match) is not supported in Full Text Search, you can hack around it.
First, tokenize the data string into all possible substrings (hello = h, he, hel, lo, etc.):
def tokenize_autocomplete(phrase):
    a = []
    for word in phrase.split():
        j = 1
        while True:
            for i in range(len(word) - j + 1):
                a.append(word[i:i + j])
            if j == len(word):
                break
            j += 1
    return a
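For illustration (this call is not in the original answer), a single word expands to every contiguous substring:

# Hypothetical example call showing the tokenizer's output for one word.
print(tokenize_autocomplete('hello'))
# ['h', 'e', 'l', 'l', 'o', 'he', 'el', 'll', 'lo', 'hel', 'ell', 'llo', 'hell', 'ello', 'hello']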
Build an index + document (Search API) using the tokenized strings
from google.appengine.api import search

index = search.Index(name='item_autocomplete')
for item in items:  # item is an ndb.Model instance
    name = ','.join(tokenize_autocomplete(item.name))
    document = search.Document(
        doc_id=item.key.urlsafe(),
        fields=[search.TextField(name='name', value=name)])
    index.put(document)
Perform the search, and voilà!
results = search.Index(name="item_autocomplete").search("name:elo")
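Since each doc_id above was set to the item's URL-safe key, the search results can be mapped back to datastore entities; a minimal sketch, assuming the same ndb models used in the indexing step:

from google.appengine.ext import ndb

results = search.Index(name='item_autocomplete').search('name:elo')
# Each doc_id holds the item's URL-safe key, so it converts back to an ndb.Key.
items = ndb.get_multi([ndb.Key(urlsafe=doc.doc_id) for doc in results])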
https://code.luasoftware.com/tutorials/google-app-engine/partial-search-on-gae-with-search-api/
Answer 2:
Just like @Desmond Lua's answer, but with a different tokenize function:
def tokenize(word):
    token = []
    words = word.split(' ')
    for word in words:
        for i in range(len(word)):
            if i == 0:
                continue
            w = word[i]
            if i == 1:
                token += [word[0] + w]
                continue
            token += [token[-1:][0] + w]
    return ",".join(token)
It will parse "hello world" as he,hel,hell,hello,wo,wor,worl,world.
It's good for light autocomplete purposes.
Answer 3:
As described in Full Text Search and LIKE statement, no, it's not possible, since the Search API implements full-text indexing.
Hope this helps!
Answer 4:
I had the same problem with a typeahead control, and my solution was to parse the string into small parts:
name = 'hello world'
name_search = ' '.join([name[:i] for i in xrange(2, len(name) + 1)])
print name_search
# -> he hel hell hello hello hello w hello wo hello wor hello worl hello world
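The indexing and query steps are the same as in Answer 1; a minimal sketch, where the field name name_search and the index name typeahead are hypothetical choices for this example:

from google.appengine.api import search

# Store the prefix string in a searchable text field.
document = search.Document(
    doc_id='some-item-id',
    fields=[search.TextField(name='name_search', value=name_search)])
search.Index(name='typeahead').put(document)

# A typeahead query for a partial word then matches the stored prefixes.
results = search.Index(name='typeahead').search('name_search:hel')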
Hope this helps.
Answer 5:
My optimized version: it does not repeat tokens.
def tokenization(text):
    a = []
    min = 3
    words = text.split()
    for word in words:
        if len(word) > min:
            for i in range(min, len(word)):
                token = word[0:i]
                if token not in a:
                    a.append(token)
    return a
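For illustration (this call is not in the original answer), note that prefixes shorter than min, and the full word itself, are not emitted:

print(tokenization('hello world'))
# ['hel', 'hell', 'wor', 'worl']
# (including the full words would require range(min, len(word) + 1))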
Answer 6:
Jumping in very late here.
But here is my well-documented function that does the tokenizing. The docstring should help you understand and use it. Good luck!!!
def tokenize(string_to_tokenize, token_min_length=2):
    """Tokenizes a given string.

    Note: If a word in the string to tokenize is not longer than
    the minimum token length, the word is added to the set of
    tokens as-is and skipped from further processing.

    Avoids duplicate tokens by using a set to save the tokens.

    Example usage:
      tokens = tokenize('pack my box', 3)

    Args:
      string_to_tokenize: str, the string we need to tokenize.
        Example: 'pack my box'.
      token_min_length: int, the minimum length we want for a token.
        Example: 3.

    Returns:
      set, containing the tokenized strings. Example: set(['box', 'pac', 'my',
      'pack'])
    """
    tokens = set()
    token_min_length = token_min_length or 1
    for word in string_to_tokenize.split(' '):
        if len(word) <= token_min_length:
            tokens.add(word)
        else:
            for i in range(token_min_length, len(word) + 1):
                tokens.add(word[:i])
    return tokens
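As a quick check (this example is not in the original answer), tying it back to the question's "building" example; passing token_min_length=1 also yields the single-letter prefix:

print(sorted(tokenize('building', token_min_length=1)))
# ['b', 'bu', 'bui', 'buil', 'build', 'buildi', 'buildin', 'building']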
Source: https://stackoverflow.com/questions/12899083/partial-matching-gae-search-api