Testing if an integer is an uppercase ASCII letter using bit manipulation

南方客 2021-01-28 11:35

For an assignment, I'm trying to make some code in C that uses only bit manipulation to test if an integer is an ASCII uppercase letter. The letter will be given by its ASCII c…

4 answers
  • 2021-01-28 12:09

    Since OP is stuck on case 0x7fffffff, exclude it by extending the otherwise working solution.

    !((~(((x & 32)>>5))<<31)>>31) & !!(x ^ 0x7fffffff)
    

    Pedantically, just code as below and let the compiler simplify.

    is_upper = (!(x ^ 'A')) | (!(x ^ 'B')) | (!(x ^ 'C')) | ... | (!(x ^ 'Z'));
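    Written out in full, that expression can be tried as a small sketch. The names `is_upper_or` and `check_upper_or` are hypothetical, and the comparison inside the checker assumes an ASCII execution character set:

```c
#include <assert.h>

/* Hypothetical name. One equality test per letter, OR-ed together:
   !(x ^ 'A') is 1 exactly when x == 'A', and 0 otherwise. */
int is_upper_or(int x) {
    return !(x ^ 'A') | !(x ^ 'B') | !(x ^ 'C') | !(x ^ 'D') | !(x ^ 'E') |
           !(x ^ 'F') | !(x ^ 'G') | !(x ^ 'H') | !(x ^ 'I') | !(x ^ 'J') |
           !(x ^ 'K') | !(x ^ 'L') | !(x ^ 'M') | !(x ^ 'N') | !(x ^ 'O') |
           !(x ^ 'P') | !(x ^ 'Q') | !(x ^ 'R') | !(x ^ 'S') | !(x ^ 'T') |
           !(x ^ 'U') | !(x ^ 'V') | !(x ^ 'W') | !(x ^ 'X') | !(x ^ 'Y') |
           !(x ^ 'Z');
}

/* Hypothetical helper: asserts agreement with the ordinary comparison,
   which relies on 'A'..'Z' being contiguous, as in ASCII. */
void check_upper_or(int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        assert(is_upper_or(x) == (x >= 'A' && x <= 'Z'));
}
```

    Because each letter is tested individually, this form does not rely on 'A' to 'Z' being contiguous, so it would also hold in EBCDIC. With optimization enabled a compiler may fold it into a range check, though that is not guaranteed.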
    
  • 2021-01-28 12:28

    You could use unsigned integer division, if that's allowed:

    !((x - 0x41u) / 26)
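    One way to make the division unsigned without a cast, assuming x is an int, is to write the constant as 0x41u: x is then converted to unsigned, so anything below 'A' wraps to a large value instead of truncating toward zero. The function and checker names below are hypothetical:

```c
#include <assert.h>

/* Hypothetical name. The 0x41u operand converts x to unsigned, so
   x - 0x41u wraps modulo UINT_MAX + 1 instead of going negative.
   Only 'A'..'Z' (0x41..0x5A) map to 0..25, whose quotient by 26 is 0. */
int is_upper_div(int x) {
    return !((x - 0x41u) / 26);
}

/* Hypothetical helper. The signed version !((x - 0x41) / 26) would
   wrongly return 1 for 0x28..0x40, since e.g. -1 / 26 == 0 in C. */
void check_upper_div(int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        assert(is_upper_div(x) == (x >= 'A' && x <= 'Z'));
}
```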
    

    But that's probably not in the spirit of the original question. Consider what happens when you subtract 0x3B from any uppercase letter:

    A: 0x41 - 0x3B = 0x06
    Z: 0x5A - 0x3B = 0x1F
    

    The interesting feature here is that any value initially larger than 0x5A will have at least one of the high bits (~0x1F) set after the subtraction. Subtracting 0x41 instead moves 'A' down to zero, so anything initially less than 'A' becomes negative and likewise has the high bits set. In the end, a solution requires only subtractions, an OR, and some bitwise ANDs:

    !(((x-0x3B) & ~(0x1F)) || ((x-0x41) & ~(0x1F)))
    

    I believe that does what you want. Because || uses short-circuit evaluation in C, though, this version has an implicit conditional embedded in it. If you want to remove that, minimize the computation, and maximize the obscurity, you could do this:

    !(((x-0x3B) | (x-0x41)) & ~(0x1F))
    

    or my new personal favorite:

    !((('Z'-x) | (x-'A')) & ~(0x1F))
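    A sketch that checks the mask-based forms against the plain range test (ASCII assumed). `is_upper_mask_u` is an added variant, not from the answer: its u-suffixed constants make the arithmetic unsigned without a cast, because the signed forms overflow (undefined behavior) for x within a few dozen of INT_MIN. All function names are hypothetical:

```c
#include <assert.h>

/* The answer's first branch-free form: x - 0x3B exceeds 0x1F above 'Z',
   x - 0x41 goes negative below 'A'. Signed overflow near INT_MIN. */
int is_upper_3b(int x) {
    return !(((x - 0x3B) | (x - 0x41)) & ~0x1F);
}

/* The answer's favorite form: 'Z' - x goes negative above 'Z' and
   x - 'A' goes negative below 'A'; a negative int has bits set
   outside 0x1F. Also overflows for x near INT_MIN. */
int is_upper_mask(int x) {
    return !((('Z' - x) | (x - 'A')) & ~0x1F);
}

/* Added variant (hypothetical): the u-suffixed constants convert x to
   unsigned, so both subtractions wrap instead of overflowing. */
int is_upper_mask_u(int x) {
    return !(((0x5Au - x) | (x - 0x41u)) & ~0x1Fu);
}

/* Hypothetical helper: all three must agree with the range check. */
void check_upper_mask(int lo, int hi) {
    for (int x = lo; x <= hi; x++) {
        int expected = (x >= 'A' && x <= 'Z');
        assert(is_upper_3b(x) == expected);
        assert(is_upper_mask(x) == expected);
        assert(is_upper_mask_u(x) == expected);
    }
}
```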
    
  • 2021-01-28 12:32

    You can test whether an ASCII letter c is uppercase by checking its 0x20 bit, which is 0 for uppercase and 1 for lowercase:

    if (!(c & 0x20))
        printf("ASCII letter %c is uppercase\n", c);
    

    But be aware that this test does not work unless you already know that c is a letter. It would erroneously match '@', '[', '\\', ']', '^' and '_', the control characters 0 to 31, and, among values with the high bit set, 128 to 159 and 192 to 223, which are not part of ASCII but are valid unsigned char values.
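    To see how many values slip through the bit test, a sketch (hypothetical names, ASCII assumed) can count them over the full unsigned char range. 128 of the 256 values have bit 0x20 clear, so 102 of them pass without being uppercase letters:

```c
#include <assert.h>

/* The bare 0x20-bit test: 1 when bit 5 is clear. */
int bit5_clear(int c) {
    return !(c & 0x20);
}

/* Hypothetical helper: counts the values 0..255 that pass the bit
   test without being 'A'..'Z'. */
int count_false_positives(void) {
    int n = 0;
    for (int c = 0; c <= 255; c++)
        if (bit5_clear(c) && !(c >= 'A' && c <= 'Z'))
            n++;
    return n;
}
```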

    If you want a single test to verify if c is an uppercase ASCII letter, try:

    if ((unsigned)(c - 'A') <= (unsigned)('Z' - 'A'))
         printf("%c is an uppercase ASCII letter\n", c);
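    Wrapped as a function, the single comparison can be checked the same way. The names are hypothetical; note that c - 'A' is still evaluated in int, so it would overflow for c below INT_MIN + 65 before the conversion to unsigned happens:

```c
#include <assert.h>

/* Hypothetical name. Converting c - 'A' to unsigned turns every value
   below 'A' into a large number, so one <= checks both ends. */
int is_upper_cmp(int c) {
    return (unsigned)(c - 'A') <= (unsigned)('Z' - 'A');
}

/* Hypothetical helper: compare with the two-sided range check. */
void check_upper_cmp(int lo, int hi) {
    for (int c = lo; c <= hi; c++)
        assert(is_upper_cmp(c) == (c >= 'A' && c <= 'Z'));
}
```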
    

    EDIT: it is unclear what you mean by "I am not allowed to use if statements, or any kind of type casting operations. I must test to see if the number is between the two numbers, including numbers far outside the range of the ASCII code, and return 1 if it is or else 0."

    • If you know c is a letter, both !(c & 0x20) and (((c >> 5) & 1) ^ 1) will have value 1 if c is uppercase and 0 if not.
    • If c can be any integer value, just write the regular comparison (c >= 'A' && c <= 'Z') and the compiler will produce better code than you would by attempting hazardous bit-twiddling tricks.
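    A sketch of the two cases in the list above, with hypothetical function names: the shift-and-flip form is only meaningful for letters, while the plain comparison works for any int (assuming 'A' to 'Z' are contiguous, as in ASCII):

```c
#include <assert.h>

/* Hypothetical name. Only meaningful when c is already known to be an
   ASCII letter: bit 5 (0x20) is 0 for 'A'..'Z' and 1 for 'a'..'z', so
   extracting it and flipping it with ^ 1 gives 1 for uppercase. */
int letter_is_upper_shift(int c) {
    return ((c >> 5) & 1) ^ 1;
}

/* Hypothetical name. Valid for any int: the ordinary comparison,
   which assumes 'A'..'Z' are contiguous, as in ASCII. */
int is_upper_plain(int c) {
    return c >= 'A' && c <= 'Z';
}
```

    For example, `letter_is_upper_shift('@')` also returns 1, which is why that form needs the letter precondition.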

    EDIT again:

    Since c can be any integer value and you are only allowed bit manipulations, here is another solution: !((c >> 5) ^ 2) & (0x07fffffeU >> (c & 31)). Below is a program to test this:

    #include <stdio.h>
    #include <stdlib.h>
    
    static int uppertest(int c) {
        return !((c >> 5) ^ 2) & (0x07fffffeU >> (c & 31));
    }
    
    int main(int argc, char *argv[]) {
        for (int i = 1; i < argc; i++) {
            int c = strtol(argv[i], NULL, 0);
            printf("uppertest(%d) -> %d\n", c, uppertest(c));
        }
        return 0;
    }
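    The program above checks values one at a time from the command line. As a sketch, the same expression can be compared against the plain range check over a whole interval. `check_uppertest` is a hypothetical helper, and the result for negative c relies on c >> 5 never being 2: right-shifting a negative int is implementation-defined in C, but common compilers use an arithmetic shift, which keeps the result negative:

```c
#include <assert.h>

/* Same expression as uppertest() above. c >> 5 == 2 selects 0x40..0x5F;
   bits 1..26 of 0x07fffffe mark 'A'..'Z' within that block of 32, and
   & with the 0/1 result of ! keeps only bit 0 of the shifted mask. */
int uppertest(int c) {
    return !((c >> 5) ^ 2) & (0x07fffffeU >> (c & 31));
}

/* Hypothetical helper: compare with the two-sided range check. */
void check_uppertest(int lo, int hi) {
    for (int c = lo; c <= hi; c++)
        assert(uppertest(c) == (c >= 'A' && c <= 'Z'));
}
```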
    
  • 2021-01-28 12:33

    ... to see if a letter is uppercase

    Simplification: let us assume the character ranges [A-Z] and [a-z] differ by the same value throughout, and that this value is a power of 2, so 'B' - 'b' equals 'X' - 'x', etc.

    #define CASE_MASK ('A' ^ 'a')
    
    // Is letter uppercase?
    int is_letter_upper(int ch) {
       return (ch & CASE_MASK) == ('A' & CASE_MASK);
    }
    
    // Is letter lowercase?
    int is_letter_lower(int ch) {
       return (ch & CASE_MASK) == ('a' & CASE_MASK);
    }
    

    This works for both ASCII and EBCDIC, provided ch is already known to be a letter.

    A more "bit manipulation" answer:

    int is_letter_upperBM(int ch) {
       return !((ch & CASE_MASK) ^ ('A' & CASE_MASK));
    }
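    A sketch that exercises both functions on the letters and checks the simplifying assumption for the current character set. The helper `case_mask_assumption_holds` is hypothetical and, like the two functions, only speaks to inputs that are letters:

```c
#include <assert.h>

#define CASE_MASK ('A' ^ 'a')

/* The answer's two functions; both assume ch is already a letter. */
int is_letter_upper(int ch) {
    return (ch & CASE_MASK) == ('A' & CASE_MASK);
}

int is_letter_upperBM(int ch) {
    return !((ch & CASE_MASK) ^ ('A' & CASE_MASK));
}

/* Hypothetical helper. CASE_MASK must be a single bit, every upper/lower
   pair must differ in exactly that bit, and every uppercase letter must
   have that bit equal to its value in 'A'. */
int case_mask_assumption_holds(void) {
    const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *lower = "abcdefghijklmnopqrstuvwxyz";
    if (CASE_MASK == 0 || (CASE_MASK & (CASE_MASK - 1)) != 0)
        return 0;
    for (int i = 0; upper[i] != '\0'; i++) {
        if ((upper[i] ^ lower[i]) != CASE_MASK)
            return 0;
        if ((upper[i] & CASE_MASK) != ('A' & CASE_MASK))
            return 0;
    }
    return 1;
}
```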
    