The reason is cache locality. In the second program you're scanning linearly through memory, which benefits from caching and hardware prefetching. Your first program's memory access pattern is far more spread out, so it touches many more cache lines for the same amount of work and therefore has worse cache behavior.
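
To make the difference concrete, here is a minimal sketch of the two access patterns. It is not your code: the function names `sum_row_major` and `sum_col_major` and the flat row-major layout are assumptions chosen purely for illustration. Both functions compute the same sum over the same array; only the order in which memory is visited differs.

```c
#include <stddef.h>

/* Hypothetical example: sum a rows x cols matrix stored row-major in one
   flat array, visiting elements in the order they sit in memory. */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];   /* consecutive accesses are adjacent */
    return total;
}

/* Same sum, but walking column by column: consecutive accesses are
   `cols` elements apart, so each one may land on a different cache line. */
long sum_col_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];   /* stride of `cols` elements */
    return total;
}
```

On a matrix much larger than the cache, the first version will typically be noticeably faster, though the exact gap depends on your hardware, the matrix size, and whether the compiler reorders the loops for you.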