The descriptions of bitCount() and bitLength() are rather cryptic:
public int bitCount()
Returns the number of bits in th
A quick demonstration:
// Requires: import java.math.BigInteger;
public void test() {
    BigInteger b = BigInteger.valueOf(0x12345L);
    // Print the value in binary, then its two bit statistics.
    System.out.println("b = " + b.toString(2));
    System.out.println("bitCount(b) = " + b.bitCount());
    System.out.println("bitLength(b) = " + b.bitLength());
}
prints
b = 10010001101000101
bitCount(b) = 7
bitLength(b) = 17
So, for positive integers:
bitCount()
returns the number of set bits in the number.
bitLength()
returns the 1-based position of the highest set bit, i.e. the length of the binary representation of the number. For a positive n this is floor(log2(n)) + 1.
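The relationship between bitLength() and log2 can be checked with a small sketch. The class name BitCountDemo and the helper floorLog2 are made up for this illustration; only the BigInteger calls are real API. Negative values are included to show that both methods work on the two's complement form there, so the "set bits" reading above only holds for positive numbers.

```java
import java.math.BigInteger;

class BitCountDemo {
    // Hypothetical helper: for n > 0, floor(log2(n)) is bitLength() - 1.
    static int floorLog2(BigInteger n) {
        if (n.signum() <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return n.bitLength() - 1;
    }

    public static void main(String[] args) {
        long[] samples = {1L, 8L, 0x12345L, -1L, -8L};
        for (long v : samples) {
            BigInteger b = BigInteger.valueOf(v);
            // For negative values, bitCount() counts the bits that differ
            // from the sign bit (the zero bits of the two's complement form).
            System.out.println(v + ": bitCount=" + b.bitCount()
                    + ", bitLength=" + b.bitLength());
        }
    }
}
```

For example, 8 has bitLength 4 and bitCount 1, while -8 has bitLength 3 and bitCount 3.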
Another basic function is missing:
There are efficient ways to:
In my opinion the 3rd problem requires a more efficient storage for bitsets than a flat array of words: we need a representation using a binary tree instead:
Suppose you want to store 64 bits in a bitset
Another common use of bitsets is to let them represent their own complement, but this is not easy when the number of integers that the set could contain is very large (e.g. a set of 64-bit integers): the bitset should then reserve at least one bit to indicate that it does NOT directly store the integers that are members of the set, but instead stores only the integers that are NOT members.
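A minimal sketch of that "complement" flag, layered on top of java.util.BitSet. The class ComplementableBitSet and its method names are hypothetical, and the sketch assumes non-negative int values only, since BitSet rejects negative indexes.

```java
import java.util.BitSet;

class ComplementableBitSet {
    private final BitSet bits = new BitSet();
    // When true, 'bits' holds the NON-members instead of the members.
    private boolean complemented;

    void add(int value) {
        if (complemented) bits.clear(value); else bits.set(value);
    }

    void remove(int value) {
        if (complemented) bits.set(value); else bits.clear(value);
    }

    boolean contains(int value) {
        // A stored bit means "member" normally, "non-member" when complemented.
        return bits.get(value) != complemented;
    }

    // Complementing only flips the flag; no stored word is touched.
    void complement() {
        complemented = !complemented;
    }
}
```

With this layout, the complement of a small set stays small in memory, because only the excluded values are stored.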
An efficient tree-like representation of the bitset should also let each node of the binary tree choose whether it stores members or non-members, depending on the number of members in its subrange. Each subrange represents a subset of the integers between k and (k+2^n-1), where k is the node number in the binary tree; each node stores a single word of n bits, one of which records whether the word contains members or non-members.
There's an efficient way to store binary trees in a flat indexed array, if the tree is dense enough to have few words whose bits are all 0 or all 1. If this is not the case (for very "sparse" sets), you need something else using pointers, like a B-tree, where each page of the B-tree can be either a flat "dense" range or an ordered index of subtrees. You store the flat dense ranges in leaf nodes, which can be allocated in a flat array, and you store the other nodes separately in another store that can also be an array. Instead of a pointer from one node to another for a subbranch of the B-tree, you use an index into that array; the index itself can carry one bit indicating whether it points to another page of branches or to a leaf node.
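The "index with a tag bit" idea can be sketched as follows. This is a simplified binary tree, not a real B-tree, and every name in it (TaggedTreeSet, EMPTY, leafRef, branchRef) is an assumption made for the illustration. Leaves are 64-bit words kept in one array, branches are pairs of references kept in another, and the low bit of each reference says which array it indexes. The sketch assumes a well-formed tree in which every leaf sits exactly `height` levels below the root.

```java
class TaggedTreeSet {
    // Hypothetical marker for a subrange that contains no members at all.
    static final int EMPTY = -1;

    // Reference encoding: low bit 1 = leaf, low bit 0 = branch;
    // the remaining bits are the index into the matching array.
    static int leafRef(int index)   { return (index << 1) | 1; }
    static int branchRef(int index) { return index << 1; }
    static boolean isLeaf(int ref)  { return (ref & 1) != 0; }
    static int indexOf(int ref)     { return ref >>> 1; }

    private final long[] leaves;   // each leaf covers 64 consecutive integers
    private final int[] branches;  // branch i has children at 2*i and 2*i+1
    private final int root;        // tagged reference to the root node
    private final int height;      // number of branch levels above the leaves

    TaggedTreeSet(long[] leaves, int[] branches, int root, int height) {
        this.leaves = leaves;
        this.branches = branches;
        this.root = root;
        this.height = height;
    }

    boolean contains(long key) {
        // Keys outside 0 .. 64 * 2^height - 1 cannot be members.
        if (key < 0 || key >= (64L << height)) return false;
        int ref = root;
        int level = height;
        // Follow one key bit per branch level until a leaf or a hole is hit.
        while (ref != EMPTY && !isLeaf(ref)) {
            level--;
            int bit = (int) ((key >>> (6 + level)) & 1L);
            ref = branches[2 * indexOf(ref) + bit];
        }
        if (ref == EMPTY) return false;
        // The low 6 bits of the key select the bit inside the leaf word.
        return ((leaves[indexOf(ref)] >>> (int) (key & 63L)) & 1L) != 0;
    }
}
```

The lookup visits one node per level, so its cost grows with the height of the tree rather than with the size of the integer range, and empty subranges cost no leaf storage.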
But the standard java.util.BitSet implementation does not use these techniques, so BitSets are still not efficient enough to store very sparse sets of large integers. You need your own library to reduce the storage requirement while still allowing fast lookup, in O(log N) time, to determine whether an integer is a member of the set represented by this optimized bitset.
In any case, the default Java implementation is sufficient if you just need bitCount() and bitLength() and your bitsets are used for dense sets or for sets of small integers (for a set of 16-bit integers, a naive approach storing 64K bits, i.e. using at most 8 KB of memory, is generally enough).
For very sparse sets of large integers, it will always be more efficient to just store a sorted array of integer values (e.g. when no more than one bit in every 128 is set), or a hash table if the bitset would set no more than 1 bit in every range of 32 bits: you can still add an extra bit to these structures to store the "complement" bit.
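As a rough sketch of the sorted-array variant with its extra "complement" bit: the class SortedSparseSet below is hypothetical, and it assumes signed 64-bit values and an immutable set built once from an array.

```java
import java.util.Arrays;

class SortedSparseSet {
    private final long[] sorted;        // distinct values, ascending order
    private final boolean complemented; // true: 'sorted' lists the non-members

    SortedSparseSet(long[] values, boolean complemented) {
        // Copy, de-duplicate and sort so that binary search is valid.
        this.sorted = Arrays.stream(values).distinct().sorted().toArray();
        this.complemented = complemented;
    }

    // Membership test by binary search over the listed values.
    boolean contains(long value) {
        boolean listed = Arrays.binarySearch(sorted, value) >= 0;
        return listed != complemented;
    }

    // The complement shares the same value list and only flips the flag.
    SortedSparseSet complement() {
        return new SortedSparseSet(sorted, !complemented);
    }
}
```

Storage here is proportional to the number of listed values, regardless of how large the values themselves are.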
However, I have not found getLowestSetBit() to be efficient enough: the BigInteger class still cannot support very sparse bitsets without huge memory costs, even though BigInteger can easily be used to represent the "complement" bit as a "sign bit" with its signum() and subtract() methods, which are efficient.
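To illustrate the sign-as-complement idea, here is a small sketch that treats a BigInteger as a set of non-negative integers. The wrapper class BigIntegerSets and its method names are invented for the example; the BigInteger methods it calls (setBit, testBit, not, or, andNot, signum) are standard. It relies on BigInteger behaving as an infinite two's complement value, so a negative number has infinitely many set bits above its magnitude.

```java
import java.math.BigInteger;

class BigIntegerSets {
    // Build a set from its members; each member n sets bit n.
    static BigInteger of(int... members) {
        BigInteger s = BigInteger.ZERO;
        for (int m : members) {
            s = s.setBit(m);
        }
        return s;
    }

    static boolean contains(BigInteger s, int n) {
        return s.testBit(n);
    }

    // not() equals -s - 1, so complementing a finite set gives a negative value.
    static BigInteger complement(BigInteger s) {
        return s.not();
    }

    // A negative value stands for "everything except finitely many integers".
    static boolean isComplemented(BigInteger s) {
        return s.signum() < 0;
    }

    static BigInteger union(BigInteger a, BigInteger b) {
        return a.or(b);
    }

    static BigInteger difference(BigInteger a, BigInteger b) {
        return a.andNot(b);
    }
}
```

The memory problem described above remains: a set containing only one very large member n still needs about n bits of magnitude, since BigInteger stores every word below the highest significant bit.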
Very large and very sparse bitsets are needed, for example, for some well-known operations, such as searches in very large databases of RDF tuples in a knowledge base, where each tuple is indexed by a very large GUID (represented as a 128-bit integer): you need to be able to perform set operations such as unions, differences, and complements.