You've already indicated how to do this in a standard, portable, and efficient way:
int64_t mul(int32_t x, int32_t y) {
    return (int64_t)x * y;
    // or static_cast<int64_t>(x) * y if you prefer not to use C-style casts
    // or static_cast<int64_t>(x) * static_cast<int64_t>(y) if you don't want
    // the conversion of y to remain implicit
}
Your question seems to be about a hypothetical architecture that has assembly instructions corresponding to the function signatures
int64_t intrinsic_mul(int32_t x, int32_t y);
int64_t intrinsic_mul(int64_t x, int64_t y);
int64_t intrinsic_mul(int64_t x, int32_t y); // and maybe this too
where the first of these has some relevant advantage, yet your compiler neither uses that instruction when compiling the function above nor exposes it as an intrinsic.
I expect such a scenario to be really rare. If you truly find yourself in it, most compilers also let you write inline assembly: you can write a function that invokes the special instruction directly while still giving the optimizer enough metadata to use it reasonably efficiently (e.g. symbolic input and output operands, so the optimizer can use whichever registers it wants rather than having the register choice hardcoded).
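As a rough sketch of what such a wrapper could look like, here is a version using GCC/Clang extended asm. The name `mul_wide` is made up for illustration, and the example assumes a GCC-compatible compiler with the default AT&T syntax on x86. On AArch64, `smull` lets the compiler choose all three registers. On x86, the one-operand `imul` itself requires EAX and EDX, so only the second operand is left to the compiler. Mainstream compilers already emit these instructions for the plain cast version, so treat this purely as an illustration of the technique.

```cpp
#include <cstdint>

// Hypothetical wrapper around a widening 32x32 -> 64 signed multiply.
inline int64_t mul_wide(int32_t x, int32_t y) {
#if defined(__GNUC__) && defined(__aarch64__)
    int64_t result;
    // smull Xd, Wn, Wm: all operands use the generic "r" constraint,
    // so the compiler is free to pick any registers.
    asm("smull %x[out], %w[a], %w[b]"
        : [out] "=r"(result)
        : [a] "r"(x), [b] "r"(y));
    return result;
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t lo, hi;
    // One-operand imul: EDX:EAX = EAX * operand (signed).
    // EAX/EDX are dictated by the instruction; "rm" leaves the
    // other operand's location up to the compiler.
    asm("imull %[b]"
        : [lo] "=a"(lo), [hi] "=d"(hi)
        : "a"(x), [b] "rm"(y)
        : "cc");
    return static_cast<int64_t>((static_cast<uint64_t>(hi) << 32) | lo);
#else
    // Portable fallback: let the compiler choose the instruction.
    return static_cast<int64_t>(x) * y;
#endif
}
```

Because the asm statement is not `volatile` and declares all of its inputs and outputs, the optimizer may still reorder it, merge identical calls, or drop it entirely when the result is unused.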