Indexing Euro (€) and Lb (£) in Sphinx

Submitted by 早过忘川 on 2019-12-12 06:00:11

Question


These don't seem to index, even when I explicitly add them to my charset_table:

charset_table=...  U+20AC->U+20AC, U+00A3->U+00A3

I even tried mapping them to the dollar sign

U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024

Yet in each case they are not recognized. In other words, MATCH('£1000') will not find 'cost is £1000', and if I map both symbols to $ as in the second example, MATCH('$1000') will not find it either.
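To make the expected behaviour concrete, here is a minimal Python sketch of how a charset_table is commonly understood to work: listed characters are kept (and optionally folded to another character), and anything not listed acts as a word separator. This is a simplified model for illustration only, not Sphinx's actual tokenizer; the function names parse_charset_table and tokenize are hypothetical, and it ignores settings such as min_word_len, stemming and HTML stripping.

```python
def parse_code(token):
    """Turn a literal character ('a') or a 'U+20AC' escape into a code point."""
    token = token.strip()
    if token.upper().startswith("U+"):
        return int(token[2:], 16)
    if len(token) != 1:
        raise ValueError(f"unsupported charset_table token: {token!r}")
    return ord(token)


def parse_range(spec):
    """'A..Z' -> (65, 90); a single character 'a' -> (97, 97)."""
    if ".." in spec:
        low, high = spec.split("..")
        return parse_code(low), parse_code(high)
    code = parse_code(spec)
    return code, code


def parse_charset_table(table):
    """Build a {source code point: folded code point} map (simplified model)."""
    mapping = {}
    for entry in table.split(","):
        entry = entry.strip()
        if not entry:
            continue
        source, _, target = entry.partition("->")
        src_low, src_high = parse_range(source)
        # Without '->' the character maps to itself.
        dst_low, _ = parse_range(target) if target else (src_low, src_high)
        for offset in range(src_high - src_low + 1):
            mapping[src_low + offset] = dst_low + offset
    return mapping


def tokenize(text, mapping):
    """Split text into tokens; characters missing from the map are separators."""
    tokens, current = [], []
    for ch in text:
        folded = mapping.get(ord(ch))
        if folded is None:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(chr(folded))
    if current:
        tokens.append("".join(current))
    return tokens


# Shortened version of the table from the question (currency symbols fold to '$').
TABLE = "0..9, A..Z->a..z, _, a..z, U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024"

if __name__ == "__main__":
    mapping = parse_charset_table(TABLE)
    print(tokenize("cost is £1000", mapping))  # ['cost', 'is', '$1000']
```

Under this model, a query for '$1000' should match the document, which is the outcome I expect from the real index. With a table that omits the currency symbols, the same text would tokenize to 'cost', 'is', '1000'.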

If I do a MySQL search with field LIKE '%£%', however, I do get records, which leads me to believe MySQL is storing the pound sign and euro characters correctly as UTF-8. The Sphinx index still does not recognize them, even after I explicitly add their Unicode code points to my charset_table.
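As a sanity check on the code points used in charset_table, the short Python sketch below prints the Unicode code point and UTF-8 byte sequence for each symbol. The helper name describe is hypothetical; the values themselves are fixed by Unicode and UTF-8.

```python
def describe(ch):
    """Return ('U+XXXX', 'hex bytes') for a single character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return code_point, utf8_bytes


if __name__ == "__main__":
    for symbol in ("£", "€", "$"):
        print(symbol, *describe(symbol))
```

This confirms that U+00A3 and U+20AC are the right code points for £ and €, so the mappings in charset_table at least name the intended characters.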

Relevant portion of config:

min_stemming_len = 1
stopword_step = 0
html_strip  = 1
min_word_len = 1
min_infix_len = 0
index_zones = title,description
charset_type = utf8mb4_unicode_ci
charset_table = 0..9, A..Z->a..z, _, a..z, U+0026->U+0026, U+0027->U+0027, U+002E->U+002E, U+002D->U+002D, U+2014->U+002D#, U+2019->U+0027, U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024

Confirmed that the table/column is using utf8mb4_unicode_ci

Confirmed I can do a mysql search on Euro: Where Title like '%€%'

Confirmed I cannot find same record with SphinxQL: where MATCH('€')


Answer 1:


There are three things you should check:

First, look at this question to check your MySQL character encoding.

Second, look in your Sphinx config to check that charset_type matches it.

Lastly, remember that after any change to charset_type or charset_table you need to rebuild your indexes.
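For the rebuild step, one common approach is to run Sphinx's indexer tool with --all and --rotate so that a running searchd picks up the new index files. The Python sketch below only assembles that command; the config path is a placeholder and the helper name build_reindex_command is hypothetical, so adjust both to your setup.

```python
import subprocess


def build_reindex_command(config_path, index_name=None):
    """Assemble an indexer command line; rebuilds all indexes if none is named."""
    command = ["indexer", "--config", config_path, "--rotate"]
    command.append(index_name if index_name else "--all")
    return command


def reindex(config_path, index_name=None):
    """Run indexer and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_reindex_command(config_path, index_name), check=True)


if __name__ == "__main__":
    # Placeholder path: replace with the location of your own sphinx.conf.
    print(" ".join(build_reindex_command("/etc/sphinx/sphinx.conf")))
```

Calling reindex() requires the indexer binary on PATH and permission to write the index files, so it is left out of the demo at the bottom.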

If none of the above helps, you could post your Sphinx Config here, which might give further clues as to the problem.



Source: https://stackoverflow.com/questions/43381060/indexing-euro-%e2%82%ac-and-lb-%c2%a3-in-sphinx
