Special characters in the input of hunspell are treated as space

Question

This question was asked on Super User, but got only 8 views in 7 days. People who know hunspell tend to be on Stack Overflow, hence my re-asking the question here.

I am testing hunspell on the command line with a Swedish dictionary. In interactive mode, all special characters in the input (for example å ä ö) are replaced with blanks before spell checking.

Hunspell 1.3.2
sjögräs
& sj 15 0: SJ, aj, dj, sk, s, j, sej, sju, sjö, sjå, sa, se, ej, st, si
& gr 15 3: ge, g, r, ger, gir, gro, gör, grå, går, gry, er, nr, dr, go, kr
*

sj gr s
& sj 15 0: SJ, aj, dj, sk, s, j, sej, sju, sjö, sjå, sa, se, ej, st, si
& gr 15 3: ge, g, r, ger, gir, gro, gör, grå, går, gry, er, nr, dr, go, kr
*

As you can see, the console's encoding is working: å, ä and ö are displayed correctly in both the input and the output.

Piping gives the same result:

echo sjögräs | hunspell -d sv_SE

I have tried to give different options to hunspell, including -i UTF-8, -i UTF-16, and keeping the aff file's SET ISO8859-1. Nothing worked.

The same thing happens with French:

C:\Users\gauthier>echo résultat | hunspell -d fr-moderne
Hunspell 1.3.2
*
& sultat 2 2: sultan, rAcsultat

Here, in addition, the output itself is garbled.

I compiled hunspell in MinGW and moved the resulting needed files to somewhere in my path, but I don't think that this information is very relevant.

How do I make hunspell recognize special characters on its input?

Answer 1:

By echoing the variables $LC_ALL or $LANG you can see which language and locale configuration your terminal has.

Then you can try to change it to the charset hunspell expects by redefining those variables. For example, you can set

LC_ALL=en_US.ISO8859-15

LANG=ca_ES.cp1252

As I recall, the default character set is latin1, but I'm not sure (I'm not on Linux right now).

Try this approach instead of modifying the hunspell software.
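
To illustrate why a charset mismatch could produce exactly the splitting shown in the question, here is a minimal Python sketch. It rests on assumptions that are not confirmed anywhere in the question: that the Windows console sends bytes in code page 850 (a common default on Western European systems), that hunspell interprets them as ISO8859-1 (the aff file's SET value), and that any non-letter is treated as a word separator. The helper name `tokens_seen` is made up for this illustration and is not part of hunspell.

```python
def tokens_seen(word, console_encoding="cp850", dict_encoding="iso8859-1"):
    """Simulate a charset mismatch between the console and the dictionary.

    Hypothetical helper: it encodes the word the way the console is assumed
    to send it, decodes the bytes the way the dictionary is assumed to read
    them, and splits on every character that is not a letter.
    """
    decoded = word.encode(console_encoding).decode(dict_encoding)
    # Replace every non-letter with a blank, then split on whitespace.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in decoded)
    return cleaned.split()


# Mismatched encodings: ö, ä and é land on control characters and vanish.
print(tokens_seen("sjögräs"))   # ['sj', 'gr', 's']
print(tokens_seen("résultat"))  # ['r', 'sultat']

# Matching encodings: the word arrives intact.
print(tokens_seen("sjögräs", console_encoding="iso8859-1"))  # ['sjögräs']
```

Under these assumptions the fragments match what hunspell reported in the question (sj, gr, s and sultat), which is consistent with the input encoding, rather than the dictionary, being the thing to adjust.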

Source: https://stackoverflow.com/questions/9787648/special-characters-in-the-input-of-hunspell-are-treated-as-space

Tags

windows

spell-checking

command-prompt

hunspell