I noticed this phenomenon while solving the LeetCode problem N-Queens.
I have two versions of accepted code, the only difference between which is the way I stored
Access to single bits is usually slower than to complete addressable units (bytes in the lingo of C++). For example, to write a byte, you just issue a write instruction (mov on x86). To write a bit, you need to load the byte containing it, use bitwise operators to set the right bit within the byte, and then store the resulting byte.
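As a rough illustration of that difference, here is a minimal sketch of both kinds of write. The helper names (`set_byte`, `set_bit`, `get_bit`) and the use of `std::vector<std::uint8_t>` as backing storage are assumptions made for this example; this is not how any particular standard library implements its bit vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One flag per byte: the write is a single store to an addressable location.
inline void set_byte(std::vector<std::uint8_t>& flags, std::size_t i, bool value) {
    flags[i] = value ? 1 : 0;
}

// Eight flags per byte (hypothetical packed layout): load, modify, store.
inline void set_bit(std::vector<std::uint8_t>& packed, std::size_t i, bool value) {
    std::uint8_t byte = packed[i / 8];                                   // load the containing byte
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));  // select the bit
    if (value) {
        byte |= mask;                                                    // set the bit
    } else {
        byte &= static_cast<std::uint8_t>(~mask);                        // clear the bit
    }
    packed[i / 8] = byte;                                                // store the byte back
}

// Reading a packed flag also needs a shift and a mask.
inline bool get_bit(const std::vector<std::uint8_t>& packed, std::size_t i) {
    return ((packed[i / 8] >> (i % 8)) & 1u) != 0;
}
```

The packed version does more work per access, but it touches one eighth as many bytes as the one-flag-per-byte version.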
The compact size of a bit vector is nice for storage requirements, but it will result in a slowdown except when your data becomes large enough that caching issues play a role.
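To get a feel for when the size difference starts to matter, the payload sizes can be estimated with a small sketch like the one below. The function names are made up for this example, and the numbers ignore container overhead and whatever layout a real implementation uses.

```cpp
#include <cstddef>

// Bytes needed to store n flags at one bit each, rounded up to whole bytes.
constexpr std::size_t packed_bytes(std::size_t n) {
    return (n + 7) / 8;
}

// Bytes needed to store n flags at one int each.
constexpr std::size_t int_bytes(std::size_t n) {
    return n * sizeof(int);
}
```

For one million flags, `packed_bytes` gives 125,000 bytes, while `int_bytes` gives 4,000,000 bytes on a platform where `int` is 4 bytes. For something as small as an N-Queens board, both variants are tiny, so the extra bit manipulation is the cost you will see.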
If you want to have speed and still be more efficient than 4 bytes per value, try a vector