Explain the use of a bit vector for determining if all characters are unique

野性不改 2020-12-04 04:23

I am confused about how a bit vector would work to do this (not too familiar with bit vectors). Here is the code given. Could someone please walk me through this?

         


        
  • 2020-12-04 04:59
    public static void main (String[] args)
    {
        //In order to understand this algorithm, it is necessary to understand the following:
    
        //int checker = 0;
        //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0
        //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with
    
        //int val = str.charAt(i) - 'a';
        //In order to understand what is going on here, we must realize that all characters have a numeric value
        for (int i = 0; i < 256; i++)
        {
            char val = (char)i;
            System.out.print(val);
        }
    
        //The output is something like:
        //             !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
        //Values 0-31 and 127-159 are non-printable control characters (e.g. 10 is a newline), so the start of the output does not copy-paste well; I had to use real spaces instead
    
        //To only print the characters from 'a' on forward:
        System.out.println();
        System.out.println();
    
        for (int i=0; i < 256; i++)
        {
            char val = (char)i;
            //char val2 = val + 'a'; //incompatible types. required: char found: int
            int val2 = val + 'a';  //shift to the 'a', we must use an int here otherwise the compiler will complain
            char val3 = (char)val2;  //convert back to char; (char)(val + 'a') would do both steps in one expression
            System.out.print(val3);
        }
    
        //Notice how the following does not work:
        System.out.println();
        System.out.println();
    
        for (int i=0; i < 256; i++)
        {
            char val = (char)i;
            int val2 = val - 'a';
            char val3 = (char)val2;
            System.out.print(val3);
        }
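        //Why this spills out onto more than one line: for i < 97, val - 'a' is negative, and casting a negative int
        //to char wraps it around to a large code point (e.g. (char)-1 is code point 65535). For i == 107 ('k'),
        //val - 'a' == 10, which is the newline character, so the output breaks onto a new line.
        //EDIT: I can't seem to copy this output into Stack Overflow, probably because of these control characters.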
    
        System.out.println();
        System.out.println();
    
        //So back to our original algorithm:
        //int val = str.charAt(i) - 'a';
        //We take the i'th character of the String and subtract the numeric value of 'a' (97) from it, so 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
        //This is ordinary subtraction, not a bit shift: it turns each lowercase letter into a bit index from 0 to 25
    
        //if ((checker & (1 << val)) > 0) return false;
        //This line is quite a mouthful, lets break it down:
        System.out.println(0<<0);
        //00000000000000000000000000000000
        System.out.println(0<<1);
        //00000000000000000000000000000000
        System.out.println(0<<2);
        //00000000000000000000000000000000
        System.out.println(0<<3);
        //00000000000000000000000000000000
        System.out.println(1<<0);
        //00000000000000000000000000000001
        System.out.println(1<<1);
        //00000000000000000000000000000010 == 2
        System.out.println(1<<2);
        //00000000000000000000000000000100 == 4
        System.out.println(1<<3);
        //00000000000000000000000000001000 == 8
        System.out.println(2<<0);
        //00000000000000000000000000000010 == 2
        System.out.println(2<<1);
        //00000000000000000000000000000100 == 4
        System.out.println(2<<2);
        // == 8
        System.out.println(2<<3);
        // == 16
        System.out.println("3<<0 == "+(3<<0));
        // == 3, not 4: shifting by 0 positions leaves the value unchanged (0011 stays 0011)
        System.out.println(3<<1);
        //00000000000000000000000000000011 == 3
        //shift left by 1
        //00000000000000000000000000000110 == 6
        System.out.println(3<<2);
        //00000000000000000000000000000011 == 3
        //shift left by 2
        //00000000000000000000000000001100 == 12
        System.out.println(3<<3);
        // 24
    
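        //It may seem that the - 'a' is not necessary, but that only works by accident: for an int, Java uses only the
        //lowest 5 bits of the shift distance (val & 31), so 1 << 97 == 1 << 1, and 'a' (97) and 'A' (65) would share a bit
        //The - 'a' maps 'a'..'z' directly to bit indices 0..25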
        //Back to if ((checker & (1 << val)) > 0) return false;
        //(1 << val) means we shift 1 left by val positions, giving a number with only bit number val set
        //the bitwise & works as such:
        System.out.println();
        System.out.println();
        System.out.println(0&0);    //0
        System.out.println(0&1);       //0
        System.out.println(0&2);          //0
        System.out.println();
        System.out.println();
        System.out.println(1&0);    //0
        System.out.println(1&1);       //1
        System.out.println(1&2);          //0
        System.out.println(1&3);             //1
        System.out.println();
        System.out.println();
        System.out.println(2&0);    //0
        System.out.println(2&1);       //0   0010 & 0001 == 0000 = 0
        System.out.println(2&2);          //2  0010 & 0010 == 2
        System.out.println(2&3);             //2  0010 & 0011 = 0010 == 2
        System.out.println();
        System.out.println();
        System.out.println(3&0);    //0    0011 & 0000 == 0
        System.out.println(3&1);       //1  0011 & 0001 == 0001 == 1
        System.out.println(3&2);          //2  0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1
        System.out.println(3&3);             //3 why?? 3 == 0011 & 0011 == 3???
        System.out.println(9&11);   // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay!
    
        //so when we do (1 << val), we take 0001 and shift it left by val: 0 for 'a', 1 for 'b', and so on, so every 'a' maps to the same bit
    
        //why does a result of bitwise & that is > 0 mean it's a dupe?
        //lets see..
    
        //0011 & 0011 is 0011, which means it's a dupe
        //0000 & 0011 is 0000, which means no dupe
        //0010 & 0001 is 0000, which means no dupe
        //so the result is non-zero only when the current character's bit was already set in checker;
        //only a result of 0000 means no dupe
    
        //so moving on:
        //checker |= (1 << val)
        //the |= needs exploring:
    
        int x = 0;
        int y = 1;
        int z = 2;
        int a = 3;
        int b = 4;
        System.out.println("x|=1 "+(x|=1));  //1
        System.out.println(x|=1);     //1
        System.out.println(x|=1);      //1
        System.out.println(x|=1);       //1
        System.out.println(x|=1);       //1
        System.out.println(y|=1); // 0001 |= 0001 == ?? 1????
        System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm
        System.out.println(y);  //should be 3?? 
        System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3?
        System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup!
        System.out.println(y|=3); //0011 |= 0011, still 3
        System.out.println(y|=4);  //0011 |= 0100 should be 0111, which is 4+2+1 == 7 (not 11)
        System.out.println(y|=5);  //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7
        System.out.println(b|=9); //so 0100 |= 1001 is 1101, which is 13. Not xor, just or (xor would clear bits set in both)
    
        //so the |= is just a bitwise OR!
    }
    
    public static boolean isUniqueChars(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); ++i) {
            int val = str.charAt(i) - 'a';  //maps 'a'..'z' to 0..25; without it, Java masks the shift distance to 5 bits, so 'a' and 'A' would share a bit
            if ((checker & (1 << val)) > 0) return false;
            checker |= (1 << val);
        }
        return true;
    }
    
    public static boolean is_unique(String input)
    {
        int using_int_as_32_flags = 0;
        for (int i=0; i < input.length(); i++)
        {
            int numeric_representation_of_char_at_i = input.charAt(i);
            int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //shift 1 left by the character's numeric value; Java only uses its lowest 5 bits (value & 31), so 'a' (97) and 'A' (65) set the same bit
            int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation;
            boolean already_bit_flagged = result_of_bitwise_and > 0;              //non-zero means this character's bit was already set by an earlier occurrence, i.e. a dupe
            if (already_bit_flagged)
                return false;
            using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation;
        }
        return true;
    }
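    The claim that the - 'a' is unnecessary is worth testing. The sketch below (class and method names are hypothetical) puts the question's version next to is_unique from above. It relies on Java using only the five lowest bits of an int shift distance (JLS 15.19), assumes lowercase-only input as the question does, and uses != 0 instead of > 0, which is equivalent here because no bit above 26 is ever set.

```java
// Sketch: why dropping "- 'a'" only appears to work in Java.
// For an int, Java uses only the low 5 bits of the shift distance,
// so 1 << 97 is evaluated as 1 << (97 & 31), i.e. 1 << 1.
public class ShiftMaskDemo {

    // Question's version: 'a'..'z' become bit indices 0..25.
    static boolean isUniqueWithOffset(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int val = str.charAt(i) - 'a';
            if ((checker & (1 << val)) != 0) return false; // bit already set
            checker |= (1 << val);                         // mark as seen
        }
        return true;
    }

    // Like is_unique above: raw char codes. 'a'..'z' (97..122) land on bits
    // 1..26 and happen to stay distinct, but codes that differ by a multiple
    // of 32 share a bit, e.g. 'a' (97) and 'A' (65).
    static boolean isUniqueWithoutOffset(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int val = str.charAt(i);
            if ((checker & (1 << val)) != 0) return false;
            checker |= (1 << val);
        }
        return true;
    }
}
```

    So for a-z alone both versions agree, which is why the subtraction looks like "smoke and mirrors"; the difference shows up as soon as other characters appear, e.g. "aA" is wrongly reported as containing a duplicate.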
    
  • 2020-12-04 05:00

    Let's break down the code line by line.

    int checker = 0; We initialize a checker, which will help us find duplicate values.

    int val = str.charAt(i) - 'a'; We get the ASCII value of the character at the i-th position of the string and subtract the ASCII value of 'a' from it. Since the assumption is that the string contains lowercase characters only, there are only 26 possible characters. Hence, the value of 'val' will always be between 0 and 25.

    if ((checker & (1 << val)) > 0) return false;

    checker |= (1 << val);

    Now this is the tricky part. Let us consider an example with the string "abcda". This should return false.

    For loop iteration 1:

    Checker: 00000000000000000000000000000000

    val: 97-97 = 0

    1 << 0: 00000000000000000000000000000001

    checker & (1 << val): 00000000000000000000000000000000 is not > 0

    Hence checker: 00000000000000000000000000000001

    For loop iteration 2:

    Checker: 00000000000000000000000000000001

    val: 98-97 = 1

    1 << 1: 00000000000000000000000000000010

    checker & (1 << val): 00000000000000000000000000000000 is not > 0

    Hence checker: 00000000000000000000000000000011

    For loop iteration 3:

    Checker: 00000000000000000000000000000011

    val: 99-97 = 2

    1 << 2: 00000000000000000000000000000100

    checker & (1 << val): 00000000000000000000000000000000 is not > 0

    Hence checker: 00000000000000000000000000000111

    For loop iteration 4:

    Checker: 00000000000000000000000000000111

    val: 100-97 = 3

    1 << 3: 00000000000000000000000000001000

    checker & (1 << val): 00000000000000000000000000000000 is not > 0

    Hence checker: 00000000000000000000000000001111

    For loop iteration 5:

    Checker: 00000000000000000000000000001111

    val: 97-97 = 0

    1 << 0: 00000000000000000000000000000001

    checker & (1 << val): 00000000000000000000000000000001 is > 0

    Hence return false.
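    The trace above can be reproduced with a small program. This is a sketch with hypothetical names (CheckerTrace, trace, toBinary32); it records checker after each character that passes the duplicate test and stops at the first duplicate, as isUniqueChars does. String.repeat assumes Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: record checker (as 32 binary digits) after each accepted character.
public class CheckerTrace {

    // Pads Integer.toBinaryString output to a fixed 32-character width.
    static String toBinary32(int value) {
        String bits = Integer.toBinaryString(value);
        return "0".repeat(32 - bits.length()) + bits;
    }

    // Returns the checker states; stops at the first duplicate.
    static List<String> trace(String str) {
        List<String> states = new ArrayList<>();
        int checker = 0;
        for (int i = 0; i < str.length(); ++i) {
            int val = str.charAt(i) - 'a';
            if ((checker & (1 << val)) > 0) {
                break; // duplicate: isUniqueChars would return false here
            }
            checker |= (1 << val);
            states.add(toBinary32(checker));
        }
        return states;
    }
}
```

    For "abcda" this yields the four states from iterations 1 to 4 (ending in 0001, 0011, 0111 and 1111); the fifth character hits the already set bit 0 and stops the loop.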

  • 2020-12-04 05:01

    Simple Explanation (with JS code below)

    • At the machine level, an integer variable is just a fixed-width array of bits (32 bits for a Java int)
    • The bitwise operations used here work on 32 bits
    • This is independent of the OS / CPU architecture and of how the language otherwise represents numbers, e.g. JavaScript stores every Number as a 64-bit floating-point value, yet its bitwise operators still work on 32-bit integers.
    • This duplicate-finding approach is similar to storing characters in a boolean array of size 32, where we set one index if we find a in the string, the next index for b, and so on.
    • A duplicate character in the string will find its corresponding bit already occupied, i.e. already set to 1.
    • Ivan has already explained how this index calculation works in a previous answer.

    Summary of operations:

    • Perform an AND operation between checker and the bit mask for the character's index
    • Internally, both are 32-bit integers
    • It is a bit-wise operation between these 2.
    • Check whether the output of the operation is non-zero
    • if output != 0
      • checker already has that particular index-th bit set
      • Thus it's a duplicate.
    • if output == 0
      • This character hasn't been found so far
      • Perform an OR operation between checker and the bit mask for the character's index
      • Thereby, setting the index-th bit to 1
      • Assign the output to checker

    Assumptions:

    • We've assumed we'll get only lowercase characters
    • And that 32 bits are enough
    • Hence, we use 96 as the reference point: the ASCII code for a is 97, so subtracting 96 maps a to index 1 and z to index 26, which fits in 32 bits

    Given below is the JavaScript source code.

    function checkIfUniqueChars (str) {
    
        var checker = 0; // a JS Number, but bitwise operators treat it as a 32-bit integer
    
        for (var i = 0; i < str.length; i++) {
            var index = str[i].charCodeAt(0) - 96; // 'a' -> 1, ..., 'z' -> 26
            var bitRepresentationOfIndex = 1 << index;
    
            // any non-zero result means this bit was already set: a duplicate
            if ((checker & bitRepresentationOfIndex) !== 0) {
                console.log(str, false);
                return false;
            } else {
                checker = (checker | bitRepresentationOfIndex);
            }
        }
        console.log(str, true);
        return true;
    }
    
    checkIfUniqueChars("abcdefghi");  // true
    checkIfUniqueChars("aabcdefghi"); // false
    checkIfUniqueChars("abbcdefghi"); // false
    checkIfUniqueChars("abcdefghii"); // false
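    For comparison, here is a sketch of the same summary steps in Java, with the "lowercase only" assumption checked explicitly rather than trusted. The name isUniqueLowercase and the range check are additions for illustration, not part of the answer; unlike the JavaScript version it subtracts 'a' (97), so a maps to bit 0.

```java
// Sketch: AND to test, OR to record, with an explicit input-range guard.
public class UniqueLowercase {

    static boolean isUniqueLowercase(String str) {
        int checker = 0; // bit k set means letter ('a' + k) was already seen
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c < 'a' || c > 'z') {
                // Outside this range the index is no longer a letter index,
                // and Java would silently wrap shift distances outside 0..31.
                throw new IllegalArgumentException("not a lowercase ASCII letter: " + c);
            }
            int mask = 1 << (c - 'a');    // only this character's bit is set
            if ((checker & mask) != 0) { // AND: bit already set => duplicate
                return false;
            }
            checker |= mask;             // OR: record this character
        }
        return true;
    }
}
```

    Failing loudly on unexpected input is one design choice; the original code simply assumes the caller respects the lowercase restriction.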
    

    Note that in JS, numbers are 64-bit floating-point values, but bitwise operators always convert their operands to 32-bit integers first.

    Example: If the string is aa then:

    // checker is initialized to 32-bit-Int(0)
    // therefore, checker is
    checker = 00000000000000000000000000000000
    

    i = 0

    str[0] is 'a'
    str[i].charCodeAt(0) - 96 = 1
    1 << 1 = 00000000000000000000000000000010
    
    checker 'AND' (1 << 1) = 00000000000000000000000000000000
    Boolean(0) == false
    
    // So, we go for the 'OR' operation.
    
    checker = checker OR (1 << 1)
    checker = 00000000000000000000000000000010
    

    i = 1

    str[1] is 'a'
    str[i].charCodeAt(0) - 96 = 1
    
    checker = 00000000000000000000000000000010
    1 << 1  = 00000000000000000000000000000010
    
    checker 'AND' (1 << 1) = 00000000000000000000000000000010
    Boolean(2) == true
    // We've got our duplicate now
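    Because the bitwise operations work on 32-bit signed integers, bit 31 is the sign bit, and that makes comparisons like > 0 or > 1 fragile duplicate tests. The sketch below (hypothetical class and method names) shows this for a Java int; JavaScript behaves the same way, since 1 << 31 is also negative there.

```java
// Sketch: a mask with only bit 31 set is negative, so "> 0" misses it,
// and "> 1" additionally misses bit 0. "!= 0" works for every bit.
public class SignBitDemo {

    static boolean seenGreaterThanZero(int checker, int bit) {
        return (checker & (1 << bit)) > 0;   // fails when bit == 31
    }

    static boolean seenGreaterThanOne(int checker, int bit) {
        return (checker & (1 << bit)) > 1;   // fails when bit == 0 or bit == 31
    }

    static boolean seenNotZero(int checker, int bit) {
        return (checker & (1 << bit)) != 0;  // correct for bits 0..31
    }
}
```

    With 26 letters mapped to low bit positions neither edge case is reached, but != 0 keeps the check correct if the mapping ever changes.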
    
  • Previous posts explain well what the code block does; I want to add my simple solution using Java's BitSet data structure:

    import java.util.BitSet;

    private static String isUniqueCharsUsingBitSet(String string) {
        BitSet bitSet = new BitSet();  // one bit per char code, grows as needed
        for (int i = 0; i < string.length(); ++i) {
            int val = string.charAt(i);
            if (bitSet.get(val)) return "NO";  // bit already set: duplicate
            bitSet.set(val);
        }
        return "YES";
    }
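    The practical difference from the int version is that BitSet has one bit per char code, so no offset or 32-bit limit is involved. The sketch below (hypothetical class and method names, returning boolean instead of "YES"/"NO") compares both on mixed-case input: with the int version, 'A' - 'a' is -32, which Java masks to a shift distance of 0, the same bit as 'a'.

```java
import java.util.BitSet;

// Sketch comparing the int bit vector with java.util.BitSet.
public class BitSetVsInt {

    // int version from the question (assumes 'a'..'z' only).
    static boolean isUniqueInt(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); ++i) {
            int val = str.charAt(i) - 'a';
            if ((checker & (1 << val)) != 0) return false;
            checker |= (1 << val);
        }
        return true;
    }

    // BitSet version: the char code itself is the bit index (0..65535).
    static boolean isUniqueBitSet(String str) {
        BitSet seen = new BitSet();
        for (int i = 0; i < str.length(); ++i) {
            int val = str.charAt(i);
            if (seen.get(val)) return false;
            seen.set(val);
        }
        return true;
    }
}
```

    For lowercase input both agree; for "aA" the int version reports a false duplicate while the BitSet version correctly returns true.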
    
  • 2020-12-04 05:09
    Line 1:   public static boolean isUniqueChars(String str) {
    Line 2:      int checker = 0;
    Line 3:      for (int i = 0; i < str.length(); ++i) {
    Line 4:          int val = str.charAt(i) - 'a';
    Line 5:          if ((checker & (1 << val)) > 0) return false;
    Line 6:         checker |= (1 << val);
    Line 7:      }
    Line 8:      return true;
    Line 9:   }
    

    Here is how I understood it, using JavaScript. Assume the input is var inputChar = "abca"; and we want to find whether inputChar has all unique characters.

    Let's start.

    Line 4: int val = str.charAt(i) - 'a';

    The line above takes the first character in inputChar, which is a. In ASCII, a = 97, and 97 in binary is 1100001.

    In JavaScript, e.g. "a".charCodeAt().toString(2) returns "1100001".

    checker = 0 // binary 32-bit representation = 00000000000000000000000000000000

    checker = 1100001 | checker; // checker becomes 1100001 (in 32-bit representation: 00000000000000000000000001100001)

    But I want my bitmask (int checker) to set only one bit per character, whereas checker is now 1100001, which has three bits set.

    Line 4:          int val = str.charAt(i) - 'a';
    

    This is where the line above comes in handy: I always subtract 97 (the ASCII value of a).

    val = 0; // 97 - 97, which is a - a
    val = 1; // 98 - 97, which is b - a
    val = 2; // 99 - 97, which is c - a
    

    Now we use val, which has been rebased to start at 0, as the bit position.

    Lines 5 and 6 are explained well in Ivan's answer.
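    The numbers in this walkthrough can be checked in Java as well. This sketch uses hypothetical helper names; it shows that the raw code of a has three 1 bits (so OR-ing it into checker would set three bits), and that subtracting 'a' produces a single-bit mask instead.

```java
// Sketch of the numbers discussed above (hypothetical helper names).
public class CharIndexDemo {

    // Number of 1 bits in the raw character code: 'a' == 97 == 1100001 has three.
    static int bitsInRawCode(char c) {
        return Integer.bitCount(c);
    }

    // Zero-based letter index: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25.
    static int letterIndex(char c) {
        return c - 'a';
    }

    // Binary string of the single-bit mask for a letter.
    static String maskBinary(char c) {
        return Integer.toBinaryString(1 << letterIndex(c));
    }
}
```

    For example, maskBinary('c') is "100": index 2, so only bit 2 is set.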

  • 2020-12-04 05:12

    I think all these answers do explain how this works; however, I felt like giving my input on how I understood it better, by renaming some variables, adding some others and adding comments:

    public static boolean isUniqueChars(String str) {
    
        /*
        checker is the bit array, it will have a 1 on the character index that
        has appeared before and a 0 if the character has not appeared, you
        can see this number initialized as 32 0 bits:
        00000000 00000000 00000000 00000000
         */
        int checker = 0;
    
        //loop through each String character
        for (int i = 0; i < str.length(); ++i) {
            /*
            a through z in ASCII are characters numbered 97 through 122, 26 characters total
            with this, you get a number between 0 and 25 to represent each character index
            0 for 'a' and 25 for 'z'
    
            renamed 'val' as 'characterIndex' to be more descriptive
             */
            int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 25
    
            /*
            created a new variable to make things clearer 'singleBitOnPosition'
    
            It is used to calculate a number that represents the bit value of having that 
            character index as a 1 and the rest as a 0, this is achieved
            by getting the single digit 1 and shifting it to the left as many
            times as the character index requires
            e.g. character 'd'
            00000000 00000000 00000000 00000001
            Shift 3 spaces to the left (<<) because 'd' index is number 3
            1 shift: 00000000 00000000 00000000 00000010
            2 shift: 00000000 00000000 00000000 00000100
            3 shift: 00000000 00000000 00000000 00001000
    
            Therefore the number representing 'd' is
            00000000 00000000 00000000 00001000
    
             */
            int singleBitOnPosition = 1 << characterIndex;
    
            /*
            This performs an AND between the checker, which is the bit array
            containing everything that has been found before and the number
            representing the bit that will be turned on for this particular
            character. e.g.
            if we have already seen 'a', 'b' and 'd', checker will have:
            checker = 00000000 00000000 00000000 00001011
            And if we see 'b' again:
            'b' = 00000000 00000000 00000000 00000010
    
            it will do the following:
            00000000 00000000 00000000 00001011
            & (AND)
            00000000 00000000 00000000 00000010
            -----------------------------------
            00000000 00000000 00000000 00000010
    
            Since this number is different from 0, it means that the character
            was seen before, because on that character index we already have a 
            1 bit value
             */
            if ((checker & singleBitOnPosition) > 0) {
                return false;
            }
    
            /* 
            Remember that 
            checker |= singleBitOnPosition is the same as  
            checker = checker | singleBitOnPosition
            Sometimes it is easier to see it expanded like that.
    
            What this achieves is that it builds the checker to have the new 
            value it hasn't seen, by doing an OR between checker and the value
            representing this character index as a 1. e.g.
            If the character is 'f' and the checker has seen 'g' and 'a', the 
            following will happen
    
            'f' = 00000000 00000000 00000000 00100000
            checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001
    
            00000000 00000000 00000000 00100000
            | (OR)
            00000000 00000000 00000000 01000001
            -----------------------------------
            00000000 00000000 00000000 01100001
    
            Therefore getting a new checker as 00000000 00000000 00000000 01100001
    
             */
            checker |= singleBitOnPosition;
        }
        return true;
    }
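    Since checker is described above as a bit array, it can also be read back. The sketch below uses a hypothetical helper, seenCharacters, and assumes bit k stands for the letter ('a' + k), as in the method above; it lists the letters whose bits are set.

```java
// Sketch: decode a checker value back into the letters it has recorded.
public class CheckerDecode {

    static String seenCharacters(int checker) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < 26; k++) {
            if ((checker & (1 << k)) != 0) { // bit k set => ('a' + k) was seen
                sb.append((char) ('a' + k));
            }
        }
        return sb.toString();
    }
}
```

    Decoding the final checker from the comments, 00000000 00000000 00000000 01100001, gives "afg", matching the 'a', 'g' and 'f' seen in that example.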
    