Question
This question was asked on superuser, but got only 8 views in 7 days. People who know Hunspell tend to be on stackoverflow, so I am asking it again here.
I am testing hunspell on the command line with a Swedish dictionary. In interactive mode, all special characters in the input (for example å ä ö) are replaced with blanks before spell checking.
Hunspell 1.3.2
sjögräs
& sj 15 0: SJ, aj, dj, sk, s, j, sej, sju, sjö, sjå, sa, se, ej, st, si
& gr 15 3: ge, g, r, ger, gir, gro, gör, grå, går, gry, er, nr, dr, go, kr
*
sj gr s
& sj 15 0: SJ, aj, dj, sk, s, j, sej, sju, sjö, sjå, sa, se, ej, st, si
& gr 15 3: ge, g, r, ger, gir, gro, gör, grå, går, gry, er, nr, dr, go, kr
*
As you see, the prompt's encoding is working, showing å ä and ö both in the input and the output.
Piping gives the same result:
echo sjögräs | hunspell -d sv_SE
I have tried to give different options to hunspell, including -i UTF-8, -i UTF-16, and keeping the aff file's SET ISO8859-1. Nothing worked.
The same thing happens with French:
C:\Users\gauthier>echo résultat | hunspell -d fr-moderne
Hunspell 1.3.2
*
& sultat 2 2: sultan, rAcsultat
In addition, the output here is garbled.
I compiled hunspell in MinGW and moved the resulting needed files to somewhere in my path, but I don't think that this information is very relevant.
How do I make hunspell recognize special characters on its input?
Answer 1:
By echoing the variables $LC_ALL or $LANG you can see which language and locale configuration your terminal has.
Then you can try changing it to the charset hunspell expects by redefining those variables. For example, you can set
LC_ALL=en_US.ISO8859-15
or
LANG=ca_ES.cp1252
As I recall, the default character set is latin1, but I'm not sure (I'm not on Linux right now).
Try this approach instead of modifying the hunspell software.
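As a rough sketch of the steps above, assuming a POSIX-style shell (for example the MSYS shell that ships with MinGW): the snippet prints the current locale variables and defines a small helper that runs one command under a different locale. The helper name with_locale and the locale name sv_SE.ISO8859-1 are made up for illustration; which locale names are actually available depends on the system, and this has not been verified against the Windows setup from the question.

```shell
# Show the locale settings the terminal currently exports (either may be unset).
echo "LC_ALL=${LC_ALL:-<unset>}"
echo "LANG=${LANG:-<unset>}"

# Hypothetical helper (not part of hunspell): run a single external command
# with LC_ALL set to the given locale, leaving the current shell unchanged.
with_locale() {
    locale_name=$1
    shift
    LC_ALL=$locale_name "$@"
}

# Intended use, assuming hunspell, an sv_SE dictionary and an
# sv_SE.ISO8859-1 locale are all installed (an assumption, not checked here):
#   echo sjögräs | with_locale sv_SE.ISO8859-1 hunspell -d sv_SE
```

Setting the variable for a single command keeps the experiment contained: if the chosen locale does not help, nothing else in the session has changed.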
Source: https://stackoverflow.com/questions/9787648/special-characters-in-the-input-of-hunspell-are-treated-as-space