How do I implement full text search in Chinese on PostgreSQL?

耗尽温柔 submitted on 2019-12-06 19:37:41

Question


This question has been asked before:

Postgresql full text search in postgresql - japanese, chinese, arabic

but there are no answers for Chinese as far as I can see. I took a look at the OpenOffice wiki, and it doesn't have a dictionary for Chinese.

Edit: As we are already successfully using PG's internal FTS engine for English documents, we don't want to move to an external indexing engine. Basically, what I'm looking for is a Chinese FTS configuration, including a parser and dictionaries for Simplified Chinese (Mandarin).


Answer 1:


I know it's an old question but there's a Postgres extension for Chinese: https://github.com/amutu/zhparser/




Answer 2:


I've just implemented a Chinese FTS solution in PostgreSQL. I did it by generating n-gram tokens from the Chinese input and building the necessary tsvectors with an embedded function (in my case, written in plpythonu). It works very well (massively preferable to moving to SQL Server!!!).




Answer 3:


Index your data with Solr, an open-source enterprise search server built on top of Lucene.

You can find more info on Solr here:

http://lucene.apache.org/solr/

A good how-to book (with an immediate PDF download) is available here:

https://www.packtpub.com/solr-1-4-enterprise-search-server/book

Be sure to use a Chinese tokenizer, such as solr.ChineseTokenizerFactory, because Chinese text is not delimited by whitespace.
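As a small illustration of why whitespace tokenization fails for Chinese, here is a Python comparison of `str.split()` with a naive one-token-per-character split. The helper names are hypothetical. This is not how Solr's `ChineseTokenizerFactory` is implemented; it only shows the problem such a tokenizer has to solve.

```python
def whitespace_tokens(text):
    """What a whitespace-only tokenizer would produce."""
    return text.split()


def cjk_char_tokens(text):
    """Naive alternative: one token per character in the basic
    CJK Unified Ideographs block; everything else is dropped."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


sentence = "我们使用全文搜索"  # "we use full-text search", written without spaces
print(whitespace_tokens(sentence))  # ['我们使用全文搜索']
print(cjk_char_tokens(sentence))    # ['我', '们', '使', '用', '全', '文', '搜', '索']
```

The whitespace tokenizer yields the whole sentence as a single token, so a search for 搜索 ("search") could never match it as a separate term.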



Source: https://stackoverflow.com/questions/3994504/how-do-i-implement-full-text-search-in-chinese-on-postgresql
