Today in my computer organization class, the teacher mentioned something I found interesting. When explaining why cache memory works, he said:
Cache memory is very fast, very expensive memory that sits close to the CPU. Rather than fetch one small piece of data from RAM each time, the CPU fetches a chunk of data and stores it in the cache. The bet is that if you just read one byte, the next byte you read is likely to be right after it. If that's the case, it can come from the cache.
By laying out your loop as you have it, you read the bytes in the order they are stored in memory. This means they are in the cache and can be read very quickly by the CPU. If you swapped lines 1 and 2, you'd jump ahead N bytes each time around the loop. The bytes you read would no longer be consecutive in memory, so they may not be in the cache; the CPU then has to fetch them from the (slower) RAM, and your performance drops.
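To make this concrete, here is a minimal C sketch. The original loop from the class isn't shown above, so the array name, element type, and size are assumptions; the point is only the order of the two nested loops. Both functions compute the same sum, but the first walks the array in the order it is laid out in memory, while the second strides across rows and loses most of the benefit of the cache.

```c
#include <stdio.h>

#define N 4096

static int grid[N][N];   /* stored row by row (row-major), as C arrays are */

/* Row-major traversal: the inner index walks consecutive addresses,
   so each cache line fetched from RAM serves many iterations. */
long sum_row_major(void)
{
    long sum = 0;
    for (int i = 0; i < N; i++)        /* "line 1" in the example above */
        for (int j = 0; j < N; j++)    /* "line 2" in the example above */
            sum += grid[i][j];
    return sum;
}

/* Column-major traversal: consecutive iterations touch addresses that are
   N * sizeof(int) bytes apart, so most accesses miss the cache. */
long sum_col_major(void)
{
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += grid[i][j];
    return sum;
}

int main(void)
{
    printf("row-major sum:    %ld\n", sum_row_major());
    printf("column-major sum: %ld\n", sum_col_major());
    return 0;
}
```

If you time the two functions on an array this size, the row-major version typically runs several times faster, even though both do exactly the same amount of arithmetic; the difference is purely cache behavior.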