User-defined literal suffixes must start with an underscore.
This is a more or less universally well-known rule that you can find on every layman-worded site talkin
Is every “normal” use of user-defined literals undefined behavior?
Clearly not.
The following is the idiomatic (and thus definitely “normal”) use of UDLs, and it’s well-defined according to the rule you’ve just listed:
namespace si {
struct metre { … };
constexpr metre operator ""_m(long double value) { return metre{value}; }
}
You’ve listed problematic cases, and I agree with your assessment of their validity, but they’re easily avoided in idiomatic C++ code, so I don’t entirely see the problem with the current wording, even if it was potentially accidental.
According to the example in [over.literal]/8, we can even use capital letters after the underscore:
float operator ""E(const char*);    // error: reserved literal suffix (20.5.4.3.5, 5.13.8)
double operator""_Bq(long double);  // OK: does not use the reserved identifier _Bq (5.10)
double operator"" _Bq(long double); // uses the reserved identifier _Bq (5.10)
The only problematic thing thus seems to be the fact that the standard makes the whitespace between "" and the UDL name significant.
Given the literal with suffix _X, the grammar calls _X an "identifier".
So, yes: the standard has, presumably inadvertently, made it impossible to create a UDL at global scope, or UDLs whose suffix begins with a capital letter after the underscore, in a well-defined program. (Note that the former is not something you generally want to do anyway!)
This cannot be resolved editorially: the names of user-defined literals would have to have their own lexical "namespace" that prevented clashes with (for example) names of implementation-provided functions. In my opinion, though, it would have been nice for there to be a non-normative note somewhere, pointing out the consequences of these rules and pointing out that they are deliberate.
Yes, defining your own user defined literal in the global namespace results in an ill-formed program.
I haven't run into this myself, because I try to follow the rule:
Don't put anything (besides main, namespaces, and extern "C" stuff for ABI stability) in the global namespace.
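#include <iostream>

namespace Mine {
    struct meter { double value; };
    inline namespace literals {
        // A literal operator for floating-point literals must take long double.
        meter operator""_m(long double v) { return {static_cast<double>(v)}; }
    }
}
int main() {
    using namespace Mine::literals;
    // The parentheses are required: 15.0_m.value would be lexed as one token.
    std::cout << (15.0_m).value << "\n";
}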
This also means you cannot use _CAPS as your literal name, even in a namespace.
An inline namespace called literals is a great way to package up your user-defined literal operators. They can be imported where you want to use them without having to name exactly which literals you want, and if you import the entire enclosing namespace you get the literals as well.
This follows how the std library handles literals, so it should be familiar to users of your code.
Yes: the combination of forbidding the use of _ as the start of a global identifier coupled with requiring non-standard UDLs to start with _ means that you can't put them in the global namespace. But you shouldn't be dirtying up the global namespace with stuff, especially UDLs, so that shouldn't be much of a problem.
The traditional idiom, as used by the standard, is to put UDLs in a literals namespace (and if you have different sets of UDLs, then you put them in different inline namespaces below that namespace). That literals namespace is typically underneath your main one. When you want to use a particular set of UDLs, you invoke using namespace my_namespace::literals or whichever sub-namespace contains your literal set of choice.
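A minimal sketch of that layout, assuming a hypothetical library my_namespace with two literal sets. The types metres and seconds, the sub-namespaces length_literals and time_literals, and the suffixes _m and _s are all invented for illustration:

```cpp
#include <cassert>

namespace my_namespace {
struct metres  { long double value; };
struct seconds { long double value; };

inline namespace literals {
inline namespace length_literals {
constexpr metres operator""_m(long double v) { return metres{v}; }
}  // namespace length_literals
inline namespace time_literals {
constexpr seconds operator""_s(long double v) { return seconds{v}; }
}  // namespace time_literals
}  // namespace literals
}  // namespace my_namespace

// Opts in to the length suffixes only.
long double only_lengths() {
    using namespace my_namespace::literals::length_literals;
    return (2.5_m).value;
}

// Opts in to every literal set the library offers.
long double all_literals() {
    using namespace my_namespace::literals;
    return (2.5_m).value + (1.5_s).value;
}

// Documents the values the two functions above are expected to produce.
void check_literal_sets() {
    assert(only_lengths() == 2.5L);
    assert(all_literals() == 4.0L);
}
```

Because every level is an inline namespace, using namespace my_namespace would also bring the suffixes in, which mirrors how std::literals behaves.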
This is important because UDLs tend to be heavily abbreviated. The standard, for example, uses s for std::string, but also for std::chrono::duration of seconds. While they do apply to different kinds of literals (s applied to a string is a string, while s applied to a number is a duration), it can sometimes be confusing to read code that uses abbreviated literals. So you shouldn't throw literals at all users of your library; they should opt in to using them.
By using different namespaces for these (std::literals::string_literals and std::literals::chrono_literals), the user can be up-front about which sets of literals they want in which parts of code.
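As a small illustration (C++14 or later), each function below opts in to exactly one of the two standard literal sets. The function names are made up for this example:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Only the string suffix is visible here: "hello"s is a std::string.
std::size_t greeting_length() {
    using namespace std::literals::string_literals;
    auto greeting = "hello"s;
    return greeting.size();
}

// Only the chrono suffixes are visible here: 90s is std::chrono::seconds.
long long timeout_in_ms() {
    using namespace std::literals::chrono_literals;
    auto timeout = 90s;
    return std::chrono::duration_cast<std::chrono::milliseconds>(timeout).count();
}

// Documents the values the two functions above are expected to produce.
void check_std_literals() {
    assert(greeting_length() == 5);
    assert(timeout_in_ms() == 90000);
}
```

Keeping each using-directive inside the function that needs it makes it obvious to the reader which meaning of s is in play.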
This is a good question, and I'm not sure about the answer, but I think the answer is "no, it's not UB" based on a particular reading of the standard.
[lex.name]/3.2 reads:
Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
Now, clearly, the restriction "as a name in the global namespace" should be read as applying to the entire rule, not just to how the implementation may use the name. That is, its meaning is not
"each identifier that begins with an underscore is reserved to the implementation, AND the implementation may use such identifiers as names in the global namespace"
but rather,
"the use of any identifier that begins with an underscore as a name in the global namespace is reserved to the implementation".
(If we believed the first interpretation, then it would mean that no one could declare a function called my_namespace::_foo, for example.)
Under the second interpretation, something like a declaration of operator""_foo in the global scope is legal, because such a declaration does not use _foo as a name. Rather, the identifier is just a part of the actual name, which is operator""_foo (which does not start with an underscore).
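To make the distinction concrete, here is a sketch that stays inside a namespace, so it does not depend on which reading is correct. The function bodies and the helper use_suffix are invented for illustration:

```cpp
#include <cassert>

namespace my_namespace {
// Here the identifier _foo is itself the declared name, but it is not a
// name in the global namespace, so [lex.name]/3.2 does not apply to it.
constexpr int _foo() { return 1; }

// Here the declared name is the literal operator operator""_foo;
// the identifier _foo appears only as the suffix.
constexpr int operator""_foo(unsigned long long v) {
    return static_cast<int>(v) + 1;
}
}  // namespace my_namespace

// The suffix is found through the using-directive, not through the name _foo.
int use_suffix() {
    using namespace my_namespace;
    return 41_foo;
}

// Documents the values the declarations above are expected to produce.
void check_names() {
    assert(my_namespace::_foo() == 1);
    assert(use_suffix() == 42);
}
```

Both declarations coexist because they introduce different names, which is the point the second interpretation relies on.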