Convert char to int in C and C++

梦毁少年i 2020-11-22 02:41

How do I convert a char to an int in C and C++?

12 answers
  •  伪装坚强ぢ
    2020-11-22 03:14

    (This answer addresses the C++ side of things, but the sign extension problem exists in C too.)

    Handling all three char types (signed char, unsigned char, and char) is more delicate than it first appears. Values in the range 0 to SCHAR_MAX (which is 127 for an 8-bit char) are easy:

    char c = somevalue;
    signed char sc = c;
    unsigned char uc = c;
    int n = c;
    

    But when somevalue is outside that range, only going through unsigned char gives consistent results for the "same" char values in all three types:

    char c = somevalue;
    signed char sc = c;
    unsigned char uc = c;
    // Might not be true: int(c) == int(sc) and int(c) == int(uc).
    int nc = (unsigned char)c;
    int nsc = (unsigned char)sc;
    int nuc = (unsigned char)uc;
    // Always true: nc == nsc and nc == nuc.
    
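The comparison above can be packaged into a small check. This is only a sketch: the names via_uchar and widens_consistently are hypothetical helpers introduced for illustration, and the code assumes the usual modular behavior when converting between the character types (guaranteed since C++20, implementation-defined but near-universal before that).

```cpp
// Hypothetical helper names, introduced only for this illustration.
// Each overload widens to int by way of unsigned char.
int via_uchar(char c)          { return static_cast<unsigned char>(c); }
int via_uchar(signed char c)   { return static_cast<unsigned char>(c); }
int via_uchar(unsigned char c) { return c; }

// Returns true when the same stored value, viewed through all three
// character types, widens to the same non-negative int.
bool widens_consistently(char c) {
    signed char sc = static_cast<signed char>(c);
    unsigned char uc = static_cast<unsigned char>(c);
    int n = via_uchar(c);
    return n >= 0 && n == via_uchar(sc) && n == via_uchar(uc);
}
```

Calling widens_consistently with a char holding a negative value (on a platform where char is signed) should still report true, whereas a plain int(c) == int(uc) comparison would not.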

    This is important when using functions from ctype.h, such as isupper or toupper, because of sign extension:

    char c = negative_char;  // Assuming CHAR_MIN < 0.
    int n = c;
    bool b = isupper(n);  // Undefined behavior.
    

    Note the conversion through int is implicit; this has the same UB:

    char c = negative_char;
    bool b = isupper(c);
    
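The reason for the undefined behavior is that the ctype.h functions are only defined for arguments that are either EOF or representable as unsigned char. A minimal sketch of that precondition follows; is_valid_ctype_arg is a hypothetical name, not a library function.

```cpp
#include <climits>
#include <cstdio>

// Hypothetical helper, not part of any library: reports whether n is an
// argument the ctype.h functions are defined for, i.e. EOF or a value
// in the range of unsigned char.
bool is_valid_ctype_arg(int n) {
    return n == EOF || (0 <= n && n <= UCHAR_MAX);
}
```

A negative char widened directly to int typically lands outside this set, while the same char cast through unsigned char always lands inside it.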

    To fix this, go through unsigned char, which is easily done by wrapping the ctype.h functions in a safe_ctype template:

    // Assumes <ctype.h>, <limits.h>, <string>, and <algorithm> are included.
    template<int (&F)(int)>
    int safe_ctype(unsigned char c) { return F(c); }
    
    //...
    char c = CHAR_MIN;
    bool b = safe_ctype<isupper>(c);  // No UB.
    
    std::string s = "value that may contain negative chars; e.g. user input";
    std::transform(s.begin(), s.end(), s.begin(), &safe_ctype<toupper>);
    // Must wrap toupper to eliminate UB in this case; you can't cast
    // to unsigned char because the function is called inside transform.
    
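If lambdas are available (C++11 or later, which is an assumption beyond the original answer), the same conversion can be done at the call site by having the lambda take unsigned char. The function name to_upper_copy is hypothetical and chosen only for this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an upper-cased copy of s. Each char is
// converted to unsigned char by the lambda's parameter before it reaches
// std::toupper, so negative chars never reach the ctype function.
std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return s;
}
```

This does the same job as the template wrapper; which one reads better is largely a matter of taste and of the language version in use.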

    This works because a function taking any one of the three char types also accepts the other two through implicit conversion. That leads to two functions which can handle any of the types:

    int ord(char c) { return (unsigned char)c; }
    char chr(int n) {
      assert(0 <= n);  // Or other error-/sanity-checking.
      assert(n <= UCHAR_MAX);
      return (unsigned char)n;
    }
    
    // Ord and chr are named to match similar functions in other languages
    // and libraries.
    

    ord(c) always gives you a non-negative value – even when passed a negative char or negative signed char – and chr takes any value ord produces and gives back the exact same char.

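The round-trip claim can be checked exhaustively over every char value. The sketch below repeats ord and chr with their headers so that it compiles on its own; ord_chr_round_trip is a hypothetical name, and the check assumes that converting an unsigned char above CHAR_MAX back to char wraps modularly (guaranteed since C++20, implementation-defined before).

```cpp
#include <cassert>
#include <climits>

int ord(char c) { return static_cast<unsigned char>(c); }

char chr(int n) {
    assert(0 <= n && n <= UCHAR_MAX);  // Reject values ord cannot produce.
    return static_cast<char>(static_cast<unsigned char>(n));
}

// Hypothetical checker: walks every char value and verifies that ord
// yields a value in [0, UCHAR_MAX] and that chr maps it back unchanged.
bool ord_chr_round_trip() {
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        char c = static_cast<char>(i);
        int n = ord(c);
        if (n < 0 || n > UCHAR_MAX || chr(n) != c) return false;
    }
    return true;
}
```

On a typical implementation ord_chr_round_trip() should return true whether plain char is signed or unsigned.
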
    In practice, I would probably just cast through unsigned char instead of using these, but they do succinctly wrap the cast, provide a convenient place to add error checking for int-to-char, and are shorter and clearer when you need to use them several times in close proximity.
