Question
Using MinGW 7.3.0 on Windows, Hunspell can't load the dictionary files from locations that have non-ASCII characters because of Windows limitations. I've tried everything[1] and I'm now resorting to copying the file to a path without non-ASCII characters before giving it to Hunspell. What is a good location to copy it to?
[1]
- Windows requires wchar_t support for std::iostream.open() to work right, which MinGW does not implement
- std::filesystem can solve this, but only available in GCC 8
- Hunspell insists on loading files on its own, it is not possible to pass the read files as strings to it
Answer 1:
The "natural" fit would be to use the user's chosen temporary directory (or a subdirectory thereof); see %temp% or GetTempPath(). However, that defaults to something that contains the user name (which can contain non-ASCII characters, e.g. c:\users\Ø¥Ć¼\AppData\LocalLow\Temp) or to something arbitrary (regarding character set) altogether.
So you're most likely best off with one of the following:
a) Choose a directory that does not contain off-limits characters from the get-go. For example, a directory underneath C:\ProgramData that you name yourself (e.g. after the application), so you know it does not contain non-ASCII characters.
b) Let the user decide where to put these files, and make sure that only a path consisting entirely of allowed characters can be entered.
c) Pass the "short path name" to Hunspell, which should not contain non-ASCII characters for compatibility with FAT file system traits. For example, the short path name for c:\temp\Ø¥Ć¼ is c:\temp\571D~1.
You can see the short names for directories using cmd.exe /c dir /x:
C:\temp>dir /x
...
19.07.2019 15:30 <DIR> .
19.07.2019 15:30 <DIR> ..
19.07.2019 15:30 <DIR> 571D~1 Ø¥Ć¼
How you can invoke the GetShortPathName Win32 API from MinGW I don't know, but I would assume that it is possible.
Also make sure to review the MSDN page for the above function for trade-offs, e.g. short names are not supported everywhere (e.g. SMB; see comments below).
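For what it's worth, here is a minimal sketch of option c) combined with the ASCII check that options a) and b) need. It assumes windows.h is usable from MinGW; the helper names is_ascii_only and short_path_name are made up for this example. Since a volume can have short names disabled (the call may then just hand back the long name), the result should still be checked before it goes to Hunspell.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: true if every character is in the 7-bit ASCII range.
bool is_ascii_only(const std::wstring& path) {
    for (wchar_t ch : path) {
        // Cast first so a signed wchar_t with a negative value is rejected too.
        if (static_cast<unsigned long>(ch) > 0x7F) {
            return false;
        }
    }
    return true;
}

#ifdef _WIN32
// Hypothetical helper: asks Windows for the 8.3 short form of an existing path.
// Returns an empty string if the call fails (e.g. the path does not exist).
std::wstring short_path_name(const std::wstring& long_path) {
    // With a null buffer the call returns the required size,
    // including the terminating null character.
    DWORD needed = GetShortPathNameW(long_path.c_str(), nullptr, 0);
    if (needed == 0) {
        return std::wstring();
    }
    std::wstring buffer(needed, L'\0');
    // On success the return value is the length without the terminating null.
    DWORD written = GetShortPathNameW(long_path.c_str(), &buffer[0], needed);
    if (written == 0 || written >= needed) {
        return std::wstring();
    }
    buffer.resize(written);
    return buffer;
}
#endif
```

A caller would then do something like: get short_path_name(dictPath), and only use it if it is non-empty and is_ascii_only() returns true for it; otherwise fall back to copying the file to a known ASCII-only directory.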
Answer 2:
From this bug tracker:
In WIN32 environment, use UTF-8 encoded paths started with the long path prefix \\?\ to handle system-independent character encoding and very long path names (without the long path prefix Hunspell will use fopen() with system-dependent character encoding instead of _wfopen()).
So the actual solution seems to be:
- Call GetFullPathNameW to normalize the path. Required because paths with the long path prefix \\?\ are passed to the NT API unchanged.
- Prepend L"\\\\?\\" to the normalized path (backslashes doubled because of C string literal requirements).
- For a UNC path, you have to use the "UNC" device directly, i.e. L"\\\\server\\share" → L"\\\\?\\UNC\\server\\share" (thanks eryksun).
- Encode the path in UTF-8, e.g. using WideCharToMultiByte() with CP_UTF8.
- Pass the final UTF-8 encoded path to Hunspell.
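The steps above could be sketched roughly like this. Treat it as an untested outline under the assumption that windows.h is available in MinGW; add_long_path_prefix and hunspell_path_utf8 are names invented for this example, not part of Hunspell or Win32. The prefixing step is kept as a plain string function so it can be checked without touching the file system.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: adds the long path prefix to an already normalized,
// absolute path. UNC paths are routed through the "UNC" device.
std::wstring add_long_path_prefix(const std::wstring& full_path) {
    const std::wstring prefix = L"\\\\?\\";
    const std::wstring device_prefix = L"\\\\.\\";
    const std::wstring unc_prefix = L"\\\\?\\UNC\\";

    // Already prefixed, or a \\.\ device path: leave it alone.
    if (full_path.compare(0, prefix.size(), prefix) == 0 ||
        full_path.compare(0, device_prefix.size(), device_prefix) == 0) {
        return full_path;
    }
    // \\server\share\... becomes \\?\UNC\server\share\...
    if (full_path.compare(0, 2, L"\\\\") == 0) {
        return unc_prefix + full_path.substr(2);
    }
    // Ordinary drive path such as C:\dir\file.
    return prefix + full_path;
}

#ifdef _WIN32
// Hypothetical helper: normalize, prefix and UTF-8 encode a path.
// Returns an empty string if any Win32 call fails.
std::string hunspell_path_utf8(const std::wstring& path) {
    // Normalize first; the required size includes the terminating null.
    DWORD needed = GetFullPathNameW(path.c_str(), 0, nullptr, nullptr);
    if (needed == 0) {
        return std::string();
    }
    std::wstring full(needed, L'\0');
    DWORD written = GetFullPathNameW(path.c_str(), needed, &full[0], nullptr);
    if (written == 0 || written >= needed) {
        return std::string();
    }
    full.resize(written);

    const std::wstring prefixed = add_long_path_prefix(full);
    if (prefixed.empty()) {
        return std::string();
    }

    // Convert UTF-16 to UTF-8: query the size, then convert.
    const int wide_len = static_cast<int>(prefixed.size());
    int bytes = WideCharToMultiByte(CP_UTF8, 0, prefixed.c_str(), wide_len,
                                    nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) {
        return std::string();
    }
    std::string utf8(static_cast<std::string::size_type>(bytes), '\0');
    int converted = WideCharToMultiByte(CP_UTF8, 0, prefixed.c_str(), wide_len,
                                        &utf8[0], bytes, nullptr, nullptr);
    if (converted != bytes) {
        return std::string();
    }
    return utf8;
}
#endif
```

The resulting std::string (via c_str()) is what would be handed to Hunspell as the .aff and .dic paths.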
Answer 3:
It looks like C:\Windows\Temp is still a valid path you can write to yourself.
Source: https://stackoverflow.com/questions/57112274/windows-directory-that-will-never-contain-non-ascii-characters-for-temp-file