This question has been asked before:
Postgresql full text search in postgresql - japanese, chinese, arabic
but there are no answers for Chinese as far as I can see. I took a look at the OpenOffice wiki, and it doesn't have a dictionary for Chinese.
Edit: As we are already successfully using PG's internal FTS engine for English documents, we don't want to move to an external indexing engine. Basically, what I'm looking for is a Chinese FTS configuration, including parser and dictionaries for Simplified Chinese (Mandarin).
I know it's an old question but there's a Postgres extension for Chinese: https://github.com/amutu/zhparser/
I've just implemented a Chinese FTS solution in PostgreSQL. I did it by creating NGRAM tokens from Chinese input, and creating the necessary tsvectors using an embedded function (in my case I used plpythonu). It works very well (massively preferable to moving to SQL Server!!!).
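A minimal sketch of the n-gram idea, not the answerer's actual code: the function names `cjk_ngrams` and `to_tsvector_input` are hypothetical, bigrams (n=2) are an assumed default, and the character range covers only the basic CJK Unified Ideographs block, so anything outside it (Latin text, punctuation, rarer ideographs) is simply dropped here.

```python
import re

# Assumption: runs of characters in the basic CJK Unified Ideographs block
# are the only text we want to index; everything else is ignored.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def cjk_ngrams(text, n=2):
    """Return overlapping n-character tokens for each run of CJK characters."""
    tokens = []
    for run in _CJK_RUN.findall(text):
        if len(run) < n:
            # A run shorter than n is kept whole so it stays searchable.
            tokens.append(run)
        else:
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens


def to_tsvector_input(text, n=2):
    """Join the n-grams with spaces so a whitespace-aware parser sees separate tokens."""
    return " ".join(cjk_ngrams(text, n))


print(cjk_ngrams("全文搜索"))
print(to_tsvector_input("全文搜索 PostgreSQL 中文"))
```

Inside a PL/Python function, the space-separated string could then be handed to something like `to_tsvector('simple', ...)`. The search query has to be split into n-grams the same way for matches to line up, and how the default parser treats these tokens may depend on the database's locale settings.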
Index your data with Solr, an open source enterprise search server built on top of Lucene.
You can find more info on Solr here:
http://lucene.apache.org/solr/
A good how-to book (with an immediate PDF download) is here:
https://www.packtpub.com/solr-1-4-enterprise-search-server/book
And be sure to use a Chinese tokenizer, such as solr.ChineseTokenizerFactory, because Chinese is not whitespace-delimited.
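To see why a dedicated tokenizer matters, here is a small illustration (the sample strings are made up for this example): splitting on whitespace yields usable tokens for English but leaves a Chinese phrase as one unsearchable blob.

```python
english = "full text search"
chinese = "中文全文搜索"  # "Chinese full-text search", written without spaces

# Whitespace splitting finds word boundaries in English...
print(english.split())  # ['full', 'text', 'search']

# ...but returns the whole Chinese phrase as a single token.
print(chinese.split())  # ['中文全文搜索']
```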
Source: https://stackoverflow.com/questions/3994504/how-do-i-implement-full-text-search-in-chinese-on-postgresql