C hack for storing a bit that takes 1 bit space?

误落风尘 2021-02-05 18:14

I have a long list of numbers between 0 and 67600. Now I want to store them using an array that is 67600 elements long. An element is set to 1 if a number was in the set and it

5 answers
  • 2021-02-05 18:34

    In C++ you can use std::vector<bool> if the size is dynamic (it's a special case of std::vector, see this) otherwise there is std::bitset (prefer std::bitset if possible.) There is also boost::dynamic_bitset if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!

    In C (and C++) you can implement this manually with bitwise operators. A good summary of common operations is here. It's a good idea to use unsigned integers for bit operations: in C, left-shifting a negative value is undefined behavior and right-shifting one is implementation-defined. You will need to allocate an array of some integral type such as uint32_t. Storing N bits takes N/32 of these uint32_ts, rounded up, i.e. (N + 31) / 32. Bit i is stored in bit i % 32 of element i / 32. You may want a differently sized integral type depending on your architecture and other constraints. Note: prefer an existing implementation (e.g. as described in the first paragraph for C++; search Google for C solutions) over rolling your own, unless you specifically want to, in which case I suggest learning more about binary/bit manipulation elsewhere before tackling this. This kind of thing has been done to death and there are "good" solutions.
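
    A minimal sketch of the word-and-bit indexing described above, assuming 32-bit words. The names bits_alloc, bits_set, bits_clear and bits_test are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define WORD_BITS 32u   /* bits per uint32_t word */

/* Allocate a zeroed bit array able to hold nbits bits; NULL on failure. */
static uint32_t *bits_alloc(size_t nbits)
{
    /* Round up so a partial final word is still allocated. */
    size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;
    return calloc(nwords, sizeof(uint32_t));
}

/* Set bit i: word i / 32, bit position i % 32. */
static void bits_set(uint32_t *words, size_t i)
{
    assert(words != NULL);
    words[i / WORD_BITS] |= (uint32_t)1 << (i % WORD_BITS);
}

/* Clear bit i. */
static void bits_clear(uint32_t *words, size_t i)
{
    assert(words != NULL);
    words[i / WORD_BITS] &= ~((uint32_t)1 << (i % WORD_BITS));
}

/* Return 1 if bit i is set, 0 otherwise. */
static int bits_test(const uint32_t *words, size_t i)
{
    assert(words != NULL);
    return (int)((words[i / WORD_BITS] >> (i % WORD_BITS)) & 1u);
}
```

    For the question's range, bits_alloc(67601) would reserve 2113 words (8452 bytes); bits_set(words, 42) then marks a value and bits_test(words, 42) checks it. None of these helpers checks that i is within range, so that stays the caller's job.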

    There are a number of tricks that may consume only one bit per value, e.g. arrays of structs with bit-fields (applicable in C as well), but whether less space actually gets used is up to the compiler. See this link.
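
    As a rough illustration of the bit-field approach, here is a sketch with a made-up struct flags8 holding eight 1-bit members. The struct name and the helper flags8_array_bytes are hypothetical, and how tightly the members are packed is decided by the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: eight 1-bit flags declared as bit-fields. */
struct flags8 {
    unsigned int b0 : 1, b1 : 1, b2 : 1, b3 : 1,
                 b4 : 1, b5 : 1, b6 : 1, b7 : 1;
};

/* Bytes used by an array of n such structs. The per-struct size is
   implementation-defined and may well exceed one byte. */
static size_t flags8_array_bytes(size_t n)
{
    return n * sizeof(struct flags8);
}
```

    On many compilers sizeof(struct flags8) comes out as the size of an unsigned int rather than 1, so this can use more memory than a hand-rolled bit array. Bit-fields also cannot be indexed by a runtime value, which makes them awkward for a large set of numbers.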

    Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc/new it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)

  • 2021-02-05 18:41

    There is in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool

    See the doc: the specialization may pack each element into a single bit, although exactly how the bits are stored is implementation-defined.

    Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset

  • 2021-02-05 18:47

    Others have given the right idea. Here's my own implementation of a bitsarr, or 'array' of bits. An unsigned char is one byte, so it's essentially an array of unsigned chars that stores information in individual bits. I added the option of storing TWO or FOUR bit values in addition to ONE bit values, because those both divide 8 (the size of a byte), and would be useful if you want to store a huge number of integers that will range from 0-3 or 0-15.

    When setting and getting, the math is done in the functions, so you can just give it an index as if it were a normal array--it knows where to look.

    Also, it's the user's responsibility to not pass a value to set that's too large, or it will screw up other values. It could be modified so that overflow loops back around to 0, but that would just make it more convoluted, so I decided to trust myself.

    #include <stdio.h>
    #include <stdlib.h>
    #define BYTE 8
    
    typedef enum {ONE=1, TWO=2, FOUR=4} numbits;
    
    typedef struct bitsarr{
        unsigned char* buckets;
        numbits n;
    } bitsarr;
    
    
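    bitsarr new_bitsarr(int size, numbits n)
    {
        int b = sizeof(unsigned char)*BYTE;      // bits per bucket
        int numbuckets = (size*n + b - 1)/b;     // round up to whole buckets
        bitsarr ret;
        // Size by the element type, not by the pointer, and use calloc
        // so that entries that were never set read back as 0.
        ret.buckets = calloc(numbuckets, sizeof(*ret.buckets));
        ret.n = n;
        return ret;
    }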
    void bitsarr_delete(bitsarr xp)
    {
        free(xp.buckets);
    }
    
    void bitsarr_set(bitsarr *xp, int index, int value)
    {
        int buckdex, innerdex;
        buckdex = index/(BYTE/xp->n);
        innerdex = index%(BYTE/xp->n);
        xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);
    
        //longer version
    
        /*unsigned int width, width_in_place, zeros, old, newbits, new;
        width = (1 << xp->n) - 1; 
        width_in_place = width << innerdex*xp->n;
        zeros = ~width_in_place;
        old = xp->buckets[buckdex];
        old = old & zeros;
        newbits = value << innerdex*xp->n;
        new = newbits | old;
        xp->buckets[buckdex] = new; */
    
    }
    
    int bitsarr_get(bitsarr *xp, int index)
    {
        int buckdex, innerdex;
        buckdex = index/(BYTE/xp->n);
        innerdex = index%(BYTE/xp->n);
        return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;
    
        //longer version
    
        /*unsigned int width = (1 << xp->n) - 1; 
        unsigned int width_in_place = width << innerdex*xp->n;
        unsigned int val = xp->buckets[buckdex];
        unsigned int retshifted = width_in_place & val;
        unsigned int ret = retshifted >> innerdex*xp->n;
        return ret; */
    }
    
    int main()
    {
        bitsarr x = new_bitsarr(100, FOUR);
        for(int i = 0; i<16; i++)
            bitsarr_set(&x, i, i);
        for(int i = 0; i<16; i++)
            printf("%d\n", bitsarr_get(&x, i));
        for(int i = 0; i<16; i++)
            bitsarr_set(&x, i, 15-i);
        for(int i = 0; i<16; i++)
            printf("%d\n", bitsarr_get(&x, i));
        bitsarr_delete(x);
    }
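
    The wrap-around variant mentioned in this answer could look roughly like the sketch below, which masks the value before storing it so that an oversized value cannot spill into neighbouring slots. It is written against a plain unsigned char buffer rather than the bitsarr struct, assumes 8-bit bytes, and the names field_set_masked and field_get are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Store the low `width` bits of value in slot `index`.
   width must be 1, 2 or 4 so that slots never straddle a byte. */
static void field_set_masked(unsigned char *buckets, size_t index,
                             unsigned width, unsigned value)
{
    assert(width == 1 || width == 2 || width == 4);
    size_t per_byte = 8 / width;                       /* slots per byte */
    unsigned shift = (unsigned)(index % per_byte) * width;
    unsigned mask = (1u << width) - 1u;                /* e.g. 0xF for width 4 */
    unsigned char *b = &buckets[index / per_byte];
    /* Clear the old slot contents, then OR in the masked new value. */
    *b = (unsigned char)((*b & ~(mask << shift)) | ((value & mask) << shift));
}

/* Read back the value stored in slot `index`. */
static unsigned field_get(const unsigned char *buckets, size_t index,
                          unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    size_t per_byte = 8 / width;
    unsigned shift = (unsigned)(index % per_byte) * width;
    return (buckets[index / per_byte] >> shift) & ((1u << width) - 1u);
}
```

    With width 4, storing 0x1A keeps only the low four bits, so the slot reads back as 0xA and the adjacent slot is left untouched.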
    
  • 2021-02-05 18:55

    You should use std::bitset.

    std::bitset functions like an array of bool (actually like std::array, since it copies by value), but only uses 1 bit of storage for each element.

    Another option is vector<bool>, which I don't recommend because:

    • It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
    • That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.

    *For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your use case, but still worth knowing about.

  • 2021-02-05 18:58

    If you want to write it in C, have an array of unsigned char that holds 67601 bits (67601/8, rounded up, is 8451 bytes) and then turn on/off the appropriate bit for each value.
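
    A sketch of that idea for the question's range 0 to 67600, using a fixed-size static array. The macro and function names (MAX_VALUE, mark, unmark, is_marked) are chosen here for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_VALUE 67600UL                  /* largest value from the question */
#define NBITS     (MAX_VALUE + 1UL)        /* values 0..67600 need 67601 bits */
#define NBYTES    ((NBITS + CHAR_BIT - 1UL) / CHAR_BIT)   /* rounded up */

/* Static storage is zero-initialized, so the set starts out empty. */
static unsigned char seen[NBYTES];

/* Turn on the bit for value v. */
static void mark(unsigned long v)
{
    assert(v <= MAX_VALUE);
    seen[v / CHAR_BIT] |= (unsigned char)(1u << (v % CHAR_BIT));
}

/* Turn off the bit for value v. */
static void unmark(unsigned long v)
{
    assert(v <= MAX_VALUE);
    seen[v / CHAR_BIT] &= (unsigned char)~(1u << (v % CHAR_BIT));
}

/* Return 1 if v is in the set, 0 otherwise. */
static int is_marked(unsigned long v)
{
    assert(v <= MAX_VALUE);
    return (seen[v / CHAR_BIT] >> (v % CHAR_BIT)) & 1;
}
```

    Where CHAR_BIT is 8, sizeof seen is 8451 bytes, compared with 67601 bytes for one char per value.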
