Why do logicals (booleans) in R require 4 bytes?

For a vector of logical values, why does R allocate 4 bytes per entry, when a bit vector would consume only 1 bit per entry? (See this question for examples.)

Now, I realize that

3 Answers

南旧

2020-12-08 10:20

Other answers have gotten at the (likely) architectural reasons that logical vectors are implemented taking the same space as integers. I wanted to point out the bit package, which implements a one-bit (no NA) logical.

臣服心动

2020-12-08 10:28

Knowing a little something about R and S-Plus, I'd say that R most likely did it to be compatible with S-Plus, and S-Plus most likely did it because it was the easiest thing to do...

Basically, a logical vector is identical to an integer vector, so sum and other algorithms for integers work pretty much unchanged on logical vectors.
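
To make that concrete, here is a minimal C sketch of a summing routine that works the same way on integer and logical data. It assumes TRUE is stored as 1, FALSE as 0, and NA as `INT_MIN`, which is my understanding of how R encodes `NA_LOGICAL` and `NA_INTEGER`; the names `NA_VALUE` and `sum_int_backed` are made up for illustration and are not part of R's API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: TRUE is 1, FALSE is 0, and NA is INT_MIN for both
   integer and logical data. NA_VALUE is a hypothetical name. */
#define NA_VALUE INT_MIN

/* Sums an int-backed vector, skipping NA entries (roughly what
   sum(x, na.rm = TRUE) does). Nothing here needs to know whether
   the data are integers or logicals. */
long sum_int_backed(const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] != NA_VALUE) {
            total += x[i];
        }
    }
    return total;
}
```

Called on a logical vector, the routine simply counts the TRUE entries, which is why no separate code path is needed.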

In 64-bit S-Plus, integers are 64-bit, and thus so are the elements of logical vectors! That's 8 bytes per logical value...

@Iterator is of course correct that a logical vector should be represented in a more compact form. Since there is already a raw vector type which is 1-byte, it would seem like a very simple change to use that one for logicals too. And 2 bits per value would of course be even better - I'd probably keep them as two separate bit vectors (TRUE/FALSE and NA/Valid), and the NA bit vector could be NULL if there are no NAs...
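
A rough sketch of what that two-bit-vector layout could look like in C. Everything here (`packed_logical`, `packed_get`, the `L_*` constants) is hypothetical and only illustrates the idea; it is not how R or any package actually stores logicals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-state result; these names are illustrative only. */
enum { L_FALSE = 0, L_TRUE = 1, L_NA = 2 };

typedef struct {
    const uint8_t *value;  /* bit i set => element i is TRUE */
    const uint8_t *is_na;  /* bit i set => element i is NA; NULL => no NAs */
    size_t length;         /* number of logical elements */
} packed_logical;

/* Returns bit i of a packed bit vector (least significant bit first). */
static int test_bit(const uint8_t *bits, size_t i)
{
    return (bits[i / 8] >> (i % 8)) & 1;
}

/* Reads element i, consulting the NA vector only when one exists. */
int packed_get(const packed_logical *v, size_t i)
{
    assert(i < v->length);
    if (v->is_na != NULL && test_bit(v->is_na, i)) {
        return L_NA;
    }
    return test_bit(v->value, i) ? L_TRUE : L_FALSE;
}
```

When `is_na` is NULL, the NA check is a single pointer comparison and the vector costs 1 bit per element; with NAs present it costs 2 bits per element.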

Anyway, that's mostly a dream since there are so many RAPI packages (packages that use the R C/FORTRAN APIs) out there that would break...

隐瞒了意图╮

2020-12-08 10:42
Without knowing R at all, I suspect it's for much the same reason as in C: it's way faster to load a value whose size equals the processor's native word size.

Loading a single bit would be slow, especially from a bitfield, since you'd have to mask out the bits that do not apply to your particular query. With a whole word you can just load it into a register and be done with it. Since the size difference usually is not a problem, the default implementation is to use a word-sized variable. If the user wants something else, there is always the option to do the required bit-shifting manually.

Concerning packing, at least on some processors it will cause a fault to read from a non-aligned address. So while you might declare a structure with a single byte in it surrounded by two ints, the byte might be padded to 4 bytes in size regardless. Again, I don't know anything about R in particular, but I suspect the behaviour might be the same for performance reasons.
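
The padding can be observed directly in C with `offsetof`. This is only a sketch: the struct and the helper `padding_after_flag` are invented for illustration, and the amount of padding is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

/* One byte between two ints, as in the description above. */
struct padded {
    int first;
    char flag;
    int second;
};

/* Number of padding bytes the compiler inserted after `flag`
   so that `second` starts on a suitably aligned address. */
size_t padding_after_flag(void)
{
    size_t end_of_flag = offsetof(struct padded, flag) + sizeof(char);
    size_t start_of_second = offsetof(struct padded, second);
    assert(start_of_second >= end_of_flag);
    return start_of_second - end_of_flag;
}
```

On common x86-64 compilers this typically reports 3 bytes of padding, making the struct 12 bytes rather than 9, but other platforms may differ.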

Addressing a single bit in an array is considerably more involved. Say you have an array bitfield and want to address bit x in it; the code would be something like this:
```
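/* load the byte holding bit x, shift that bit down to position 0, mask off the rest */
unsigned char b = (bitfield[x / 8] >> (x % 8)) & 1;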
```
to obtain either 0 or 1 for the bit you requested. Compare that with the straightforward indexing needed to obtain value number x from a boolean array: bool a = array[x]
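
Writing a bit is more work still, since the containing byte has to be read, modified, and stored back. A minimal sketch, assuming a byte array with bits numbered from the least significant end (`bit_get` and `bit_set` are names I made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read bit x: load the containing byte, shift, and mask. */
int bit_get(const uint8_t *bitfield, size_t x)
{
    assert(bitfield != NULL);
    return (bitfield[x / 8] >> (x % 8)) & 1;
}

/* Write bit x: a read-modify-write of the containing byte. */
void bit_set(uint8_t *bitfield, size_t x, int value)
{
    assert(bitfield != NULL);
    uint8_t mask = (uint8_t)(1u << (x % 8));
    if (value) {
        bitfield[x / 8] |= mask;            /* turn the bit on */
    } else {
        bitfield[x / 8] &= (uint8_t)~mask;  /* turn the bit off */
    }
}
```

With a plain boolean array, the equivalent write is just array[x] = value, with no shifting or masking.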