Question
The Euro (€) and pound (£) signs don't seem to index, even when I explicitly add them to my charset_table:
charset_table=... U+20AC->U+20AC, U+00A3->U+00A3
I even tried mapping them to the dollar sign:
U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024
Yet in each case they are unrecognized. In other words, MATCH('£1000') will not find 'cost is £1000', and if I map them to $ as per the second example, MATCH('$1000') will not find it either.
If I do a MySQL search, however, where field like '%£%', I do get records, leading me to believe that MySQL is encoding UTF-8 correctly. That is, the pound sign and Euro characters are being stored correctly in MySQL, but the Sphinx index is not recognizing them, even after I explicitly add their Unicode code points to my charset_table.
Relevant portion of config:
min_stemming_len = 1
stopword_step = 0
html_strip = 1
min_word_len = 1
min_infix_len = 0
index_zones = title,description
charset_type = utf8mb4_unicode_ci
charset_table = 0..9, A..Z->a..z, _, a..z, U+0026->U+0026, U+0027->U+0027, U+002E->U+002E, U+002D->U+002D, U+2014->U+002D#, U+2019->U+0027, U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024
Confirmed that the table/column is using utf8mb4_unicode_ci.
Confirmed I can do a MySQL search on Euro: WHERE Title LIKE '%€%'
Confirmed I cannot find the same record with SphinxQL: WHERE MATCH('€')
Answer 1:
There are three things you should check:
First, look at this question to check your MySQL character encoding.
Secondly, look in your Sphinx config to check that charset_type matches it.
Lastly, remember that after any change to charset_type or charset_table you need to rebuild the indexes.
If none of the above helps, you could post your Sphinx Config here, which might give further clues as to the problem.
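While checking the encoding, it may also be worth confirming that the code points written in charset_table really are the ones for the characters in question. The following is only a minimal sketch in Python, not part of Sphinx or MySQL; the helper names charset_entry and utf8_hex are made up for illustration. It prints each symbol in the U+XXXX form that charset_table uses, along with the bytes the symbol should occupy if the data is stored as UTF-8.

```python
def charset_entry(ch: str) -> str:
    """Format a single character as a U+XXXX code point (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a single character as a hex string (hypothetical helper)."""
    return ch.encode("utf-8").hex()


# Dollar, pound and Euro signs, as discussed in the question.
for symbol in "$£€":
    print(symbol, charset_entry(symbol), utf8_hex(symbol))
```

If the values printed here agree with the entries in charset_table, the code points themselves are probably not the issue, and the encoding settings and the index rebuild are the more likely places to look.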
Source: https://stackoverflow.com/questions/43381060/indexing-euro-%e2%82%ac-and-lb-%c2%a3-in-sphinx