The following may not qualify as a SO question; if it is out of bounds, please feel free to tell me to go away. The question here is basically, "Do I understand the C stand
Given that iconv is not "pure standard C/C++", I don't think you are satisfying your own specifications.
There are new codecvt facets coming with char32_t and char16_t, so I don't see how you can be wrong as long as you are consistent and pick one char type + encoding if the facets are here. The facets are described in 22.5 [locale.stdcvt] (from n3242).
I don't understand how this doesn't satisfy at least some of your requirements:
    #include <codecvt>
    #include <locale>
    #include <string>

    namespace ns {

    typedef char32_t char_t;
    typedef std::u32string string;

    // or use a user-defined literal
    #define LIT(x) U##x

    // Communicate with interface0, which wants UTF-8.
    // This type doesn't need to be public at all; I just refactored it.
    typedef std::wstring_convert<std::codecvt_utf8<char_t>, char_t> converter0;

    inline std::string
    to_interface0(string const& s)
    {
        return converter0().to_bytes(s);
    }

    inline string
    from_interface0(std::string const& s)
    {
        return converter0().from_bytes(s);
    }

    // Communicate with interface1, which wants UTF-16.
    // Doesn't have to be public either.
    // to_bytes returns the UTF-16 data serialized as bytes in a std::string.
    typedef std::wstring_convert<std::codecvt_utf16<char_t>, char_t> converter1;

    inline std::string
    to_interface1(string const& s)
    {
        return converter1().to_bytes(s);
    }

    inline string
    from_interface1(std::string const& s)
    {
        return converter1().from_bytes(s);
    }

    } // ns

Then your code can use ns::string, ns::char_t, LIT('A') & LIT("Hello, World!") with reckless abandon, without knowing what the underlying representation is. Then use from_interfaceX(some_string) and to_interfaceX(some_string) whenever they're needed. It doesn't affect the global locale or streams either. The helpers can be as clever as needed, e.g. codecvt_utf8 can deal with 'headers', which I assume is Standardese for tricky stuff like the BOM (ditto codecvt_utf16).
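To make the boundary idea concrete, here is a minimal sketch of a UTF-32-inside, UTF-8-outside round trip. The namespace sketch and the names to_utf8, from_utf8 and self_check are made up for illustration and are not part of the answer's ns namespace; only std::wstring_convert and std::codecvt_utf8 come from the standard library. Note that <codecvt> and std::wstring_convert were deprecated in C++17, so a recent compiler may emit warnings.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical stand-in for the ns helpers: UTF-32 internally,
// UTF-8 whenever a string crosses the interface boundary.
namespace sketch {

typedef std::u32string string;
typedef std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> utf8_converter;

// Encode an internal UTF-32 string as UTF-8 bytes.
inline std::string to_utf8(string const& s)
{
    return utf8_converter().to_bytes(s);
}

// Decode UTF-8 bytes into the internal UTF-32 representation.
inline string from_utf8(std::string const& s)
{
    return utf8_converter().from_bytes(s);
}

// Checks that a string survives the trip out to UTF-8 and back.
inline void self_check()
{
    string const euro = U"\u20AC";               // one code point
    assert(to_utf8(euro).size() == 3);           // three UTF-8 bytes
    assert(from_utf8(to_utf8(euro)) == euro);    // lossless round trip
}

} // namespace sketch
```

Calling sketch::self_check() from main exercises both directions; the rest of the program only ever sees sketch::string.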
In fact I wrote the above to be as short as possible but you'd really want helpers like this:
    // Needs <utility>; the template must first be declared inside namespace ns.
    template<typename... T>
    inline ns::string
    ns::from_interface0(T&&... t)
    {
        return converter0().from_bytes(std::forward<T>(t)...);
    }
which gives you access to the 3 additional overloads of each of the [from|to]_bytes members, accepting things like a const char* or a pointer range.
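A rough sketch of how such forwarding helpers behave, under the assumption that UTF-8 is the external encoding. The namespace sketch_fwd and the helper first_bytes are hypothetical names added for illustration; the from_bytes and to_bytes overloads being reached are the standard ones on std::wstring_convert.

```cpp
#include <cassert>
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>
#include <utility>

namespace sketch_fwd {

typedef std::u32string string;
typedef std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter0;

// Forwards any argument list to from_bytes, so the char, const char*,
// std::string and [first, last) overloads are all reachable.
template<typename... T>
inline string from_interface0(T&&... t)
{
    return converter0().from_bytes(std::forward<T>(t)...);
}

// Same idea in the other direction, forwarding to to_bytes.
template<typename... T>
inline std::string to_interface0(T&&... t)
{
    return converter0().to_bytes(std::forward<T>(t)...);
}

// Hypothetical convenience wrapper: decode only the first n bytes of p,
// which goes through the (first, last) pointer-range overload.
inline string first_bytes(const char* p, std::size_t n)
{
    return from_interface0(p, p + n);
}

} // namespace sketch_fwd
```

A string literal binds to the const char* overload, a std::string to the byte_string overload, and a pair of pointers to the range overload, all through the same function name.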