I've had a habit of using int to access arrays (especially in for loops); however I recently discovered that I may have been "doing-it-all-wrong" and my x86 system kept h
clang and gcc have -Wchar-subscripts, but that'll only help detect char subscript types.
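For reference, here is a minimal sketch of the kind of code that warning is about. The table and function names (seen, mark, was_seen) are made up for illustration; the flagged form is left in a comment so the snippet compiles cleanly.

```c
#include <limits.h>

/* Hypothetical lookup table with one slot per possible byte value. */
static unsigned char seen[UCHAR_MAX + 1];

/* The form -Wchar-subscripts reports would be:
 *     seen[c] = 1;
 * Where plain char is signed, bytes >= 0x80 turn into negative indices.
 * Converting to unsigned char first keeps the index in 0..UCHAR_MAX. */
void mark(char c)
{
    seen[(unsigned char)c] = 1;
}

int was_seen(char c)
{
    return seen[(unsigned char)c];
}
```

Nothing comparable is reported for an int subscript, which is why the flag doesn't help with the question as asked.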
You might consider modifying clang or gcc (whichever is easier to build on your infrastructure) to broaden the types detected by the -Wchar-subscripts warning. If this is a one-pass fix effort, this might be the most straightforward way to go about it.
Otherwise you'll need to find a linter that complains about non-size_t/ptrdiff_t subscripting; I'm not aware of any that have that option.
The movslq instruction sign-extends a long (a 4-byte quantity, in AT&T assembler terms) to a quad (an 8-byte quantity). This is needed because int is signed, so an offset of, e.g., -1 is 0xffffffff as a long. If you were to just zero-extend that (i.e. not have movslq), this would be 0x00000000ffffffff, aka 4294967295, which is probably not what you want. So the compiler instead sign-extends the index to yield 0xffffffffffffffff, aka -1.
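The two widenings can be reproduced in plain C. This is only a sketch of the arithmetic, not what the compiler emits; the function names are invented for the example.

```c
#include <stdint.h>

/* Same effect as movslq: widen a signed 32-bit value, copying its sign
 * bit into the upper 32 bits. */
int64_t sign_extend32(int32_t v)
{
    return (int64_t)v;
}

/* What a zero-extension would give instead: reinterpret the same 32 bits
 * as unsigned, then widen, so the upper 32 bits are filled with zeros. */
uint64_t zero_extend32(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}
```

With v = -1, sign_extend32 returns -1 (all 64 bits set), while zero_extend32 returns 4294967295, which as an array index would point roughly 4 billion elements past the base instead of one element before it.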
The reason the other types don't require the additional operation is that, despite some of them being signed, they are all already 8 bytes wide. And, thanks to two's complement, 0xffffffffffffffff can be interpreted as either -1 or 18446744073709551615, and the 64-bit sum will be the same either way.
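A small sketch of that last point, treating the address computation as plain 64-bit unsigned arithmetic. The offset_address helper and the 0x1000 base are made up for illustration.

```c
#include <stdint.h>

/* Add a 64-bit offset to a 64-bit base. Unsigned arithmetic wraps
 * modulo 2^64, so the bit pattern of the offset is all that matters. */
uint64_t offset_address(uint64_t base, uint64_t offset_bits)
{
    return base + offset_bits;
}
```

Passing (uint64_t)(int64_t)-1 or 18446744073709551615ULL gives the same result: with a base of 0x1000 both yield 0x0fff. Passing the zero-extended 32-bit pattern 0xffffffff instead yields 0x100000fff, which is why a 4-byte signed index has to be sign-extended first.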
Now, if you were to use unsigned int instead, the compiler would normally have to insert a zero-extension, just to make sure the upper half of the register doesn't contain garbage. However, on x86-64 this is done implicitly: an instruction such as mov %eax,%esi moves whatever 4-byte quantity is in eax into the lower 4 bytes of rsi and clears the upper 4, effectively zero-extending the quantity. Judging by your postings, the compiler still inserts a mov %esi,%esi instruction, presumably because it cannot assume the upper half of rsi was already clear when the function was entered; that mov is what performs the zero-extension.
Note, however, that this "automatic zero-extending" does not apply to 1- and 2-byte quantities; those must be zero-extended explicitly (e.g. with movzbl or movzwl).
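To see this for yourself, something like the following can be compiled with -O2 -S and the output compared. The function names are invented for the example, and the exact instructions will vary with compiler, version, and flags, so treat the comments as what to look for rather than a guarantee.

```c
#include <stddef.h>

/* 4-byte signed index: expect a sign-extension such as movslq. */
int at_int(const int *a, int i)
{
    return a[i];
}

/* 4-byte unsigned index: expect a zero-extension, e.g. mov %esi,%esi. */
int at_uint(const int *a, unsigned int i)
{
    return a[i];
}

/* 8-byte indices on x86-64: already register width, no extension needed. */
int at_size(const int *a, size_t i)
{
    return a[i];
}

int at_ptrdiff(const int *a, ptrdiff_t i)
{
    return a[i];
}
```

All four return the same element for the same in-range index; the only difference is how the index is widened before the address is computed.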