Stumped with Unicode, Boost, C++, codecvts

感情败类 2020-12-28 09:42

In C++, I want to use Unicode to do things. So after falling down the rabbit hole of Unicode, I've managed to end up in a train wreck of confusion, headaches and locales.

3 Answers
  • 2020-12-28 09:45
      std::cout.imbue(convLoc);
      std::cout << data << std::endl;
    

    This performs no conversion, because std::cout uses codecvt<char, char, mbstate_t>, which is a no-op. The only standard streams that use a codecvt facet are file streams; std::cout is not required to perform any conversion at all.
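
    The no-op claim can be checked with the standard library alone. The sketch below asks the narrow codecvt facet of a locale whether it ever converts anything; the helper name narrow_codecvt_is_noop is made up for this illustration.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::locale, std::codecvt, std::use_facet

// Hypothetical helper: returns true when the char -> char codecvt facet of
// `loc` reports that it never converts, i.e. bytes pass through unchanged.
bool narrow_codecvt_is_noop(const std::locale& loc)
{
    using narrow_facet = std::codecvt<char, char, std::mbstate_t>;
    return std::use_facet<narrow_facet>(loc).always_noconv();
}
```

    Calling it with std::cout.getloc() after the imbue call should show whether the imbued locale changed anything for narrow output.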

    To force Boost.Filesystem to interpret narrow strings as UTF-8 on Windows, call boost::filesystem::path::imbue with a locale that has a UTF-8 ↔ UTF-16 codecvt facet. Boost.Locale has an implementation of the latter.
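
    To make the UTF-8 ↔ UTF-16 step concrete, here is a hand-written sketch of the UTF-8 to UTF-16 direction that such a facet has to perform. It is not Boost's implementation: the function name utf8_to_utf16 is made up for this illustration, and the decoder is deliberately minimal (it rejects truncated input and bad continuation bytes, but does not check for overlong encodings or encoded surrogates).

```cpp
#include <cstddef>    // std::size_t
#include <cstdint>    // std::uint32_t
#include <stdexcept>  // std::invalid_argument
#include <string>     // std::string, std::u16string

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

    In real code the facet from Boost.Locale is the better choice, since it plugs into the locale machinery and handles the reverse direction as well.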

  • 2020-12-28 10:08

    Okay, after a long few months I've figured it out, and I'd like to help people in the future.

    First of all, the codecvt approach was the wrong way of doing it. Boost.Locale provides a simple way of converting between character sets in its boost::locale::conv namespace. Here's one example (there are others that are not based on locales).

    #include <boost/locale.hpp>
    #include <iostream>   // std::cout, std::endl
    #include <string>     // std::string
    namespace loc = boost::locale;
    
    int main(void)
    {
      loc::generator gen;
      std::locale blah = gen.generate("en_US.utf-32");
    
      std::string UTF8String = "Tésting!";
      // from_utf will also work with wide strings as it uses the character size
      // to detect the encoding.
      std::string converted = loc::conv::from_utf(UTF8String, blah);
    
      // Outputs a UTF-32 string.
      std::cout << converted << std::endl;
    
      return 0;
    }
    

    If you replace "en_US.utf-32" with "", the string is converted to the encoding of the user's locale instead.

    I still don't know how to make std::cout do this all the time, but the translate() function of Boost.Locale outputs in the user's locale.

    As for the filesystem using UTF-8 strings cross platform, it seems that that's possible, here's a link to how to do it.

  • 2020-12-28 10:11

    The Boost filesystem iostream replacement classes work fine with UTF-16 when used with Visual C++.

    However, they do not work (in the sense of supporting arbitrary filenames) when used with g++ on Windows, at least as of Boost version 1.47. A code comment in Boost explains why: the Visual C++ standard library provides non-standard wchar_t-based constructors that the Boost filesystem classes make use of, but g++ does not support these extensions.

    A workaround is to use 8.3 short filenames, but this solution is a bit brittle since with old Windows versions the user can turn off automatic generation of short filenames.


    Example code for using Boost filesystem in Windows:

    #include "CmdLineArgs.h"        // CmdLineArgs
    #include "throwx.h"             // throwX, hopefully
    #include "string_conversions.h" // ansiWithFillersFrom( wstring )
    
    #include <boost/filesystem/fstream.hpp>     // boost::filesystem::ifstream
    #include <iostream>             // std::cout, std::cerr, std::endl
    #include <stdexcept>            // std::runtime_error, std::exception
    #include <string>               // std::string
    #include <stdlib.h>             // EXIT_SUCCESS, EXIT_FAILURE
    using namespace std;
    namespace bfs = boost::filesystem;
    
    inline string ansi( wstring const& ws ) { return ansiWithFillersFrom( ws ); }
    
    int main()
    {
        try
        {
            CmdLineArgs const   args;
            wstring const       programPath     = args.at( 0 );
    
            hopefully( args.nArgs() == 2 )
                || throwX( "Usage: " + ansi( programPath ) + " FILENAME" );
    
            wstring const       filePath        = args.at( 1 );
            bfs::ifstream       stream( filePath );     // Nice Boost ifstream subclass.
            hopefully( !stream.fail() )
                || throwX( "Failed to open file '" + ansi( filePath ) + "'" );
    
            string line;
            while( getline( stream, line ) )
            {
                cout << line << endl;
            }
            hopefully( stream.eof() )
                || throwX( "Failed to list contents of file '" + ansi( filePath ) + "'" );
    
            return EXIT_SUCCESS;
        }
        catch( exception const& x )
        {
            cerr << "!" << x.what() << endl;
        }
        return EXIT_FAILURE;
    }
    