A smart C compiler can probably optimize your loop away entirely by recognizing that, at the end, a will always be 1. Python can't do that: to iterate over xrange, it has to call __next__ (spelled next in Python 2) on the xrange object's iterator until it raises StopIteration. Python can't know whether __next__ has side effects until it calls it, so there is no way to optimize the loop away. The take-away from this paragraph is that it is MUCH HARDER to optimize a Python "compiler" than a C compiler, because Python is such a dynamic language: the compiler would need to know how each object will behave at run time. In C, that's much easier because the compiler knows exactly what type every object is ahead of time.
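To make the side-effect point concrete, here is a minimal sketch in Python 3 syntax (where xrange is spelled range). The NoisyRange class, its calls counter and the run_loop helper are hypothetical names invented for this illustration; they are not from the question. The helper spells out roughly what a for loop does with the iterator protocol.

```python
class NoisyRange:
    """Hypothetical iterator whose __next__ has a visible side effect."""

    def __init__(self, stop):
        self.stop = stop
        self.current = 0
        self.calls = 0  # counts how many times __next__ has run

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1  # side effect the interpreter cannot predict
        if self.current >= self.stop:
            raise StopIteration
        self.current += 1
        return self.current - 1


def run_loop(iterable):
    """Roughly what `for _ in iterable: a = 1` does under the hood."""
    a = None
    iterator = iter(iterable)
    while True:
        try:
            next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break
        a = 1
    return a


noisy = NoisyRange(5)
result = run_loop(noisy)
print(result, noisy.calls)
```

Skipping the loop would leave noisy.calls at 0 instead of counting every call, so the interpreter has to run every iteration even though the final value of a never changes.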
Of course, compiler aside, Python needs to do a lot more work. In C, you're working with base types using operations that map directly to hardware instructions. In Python, the interpreter executes the bytecode one instruction at a time in software, which is clearly going to take longer than running machine-level instructions directly. The data model (e.g. calling __next__ over and over again) can also lead to a lot of function calls that the C version doesn't need to make. Of course, Python does all this to be much more flexible than a compiled language can be.
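One way to see the extra work is to look at the bytecode CPython generates for such a loop. The sketch below uses the standard dis module; the function name loop is made up for the example, and the exact instruction list varies between CPython versions, so no output is shown.

```python
import dis


def loop(n):
    # Hypothetical stand-in for the loop discussed above.
    for _ in range(n):
        a = 1
    return a


# Collect the names of the bytecode instructions in the function body.
opnames = [instr.opname for instr in dis.get_instructions(loop)]

# Every pass through the loop re-executes FOR_ITER (which asks the
# iterator for its next item) plus the instructions that store the
# loop variable and `a`, all dispatched in software by the interpreter.
print(opnames)
```

Running dis.dis(loop) instead prints the same instructions in a more readable layout.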
The typical way to speed up Python code is to use libraries or built-in functions that provide a high-level interface to low-level compiled code. scipy and numpy are excellent examples of this kind of library. Other things you can look into are using PyPy, which includes a JIT compiler (you probably won't reach native speeds, but it'll probably beat CPython, the most common implementation), or writing extensions in C/Fortran using the CPython API, Cython or f2py for performance-critical sections of code.
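As a rough illustration of pushing the loop into compiled code, the sketch below compares an explicit Python loop with the built-in sum, which runs its loop in C on CPython. The function names and the value of n are arbitrary choices for this example, and the timings depend on your machine and interpreter, so none are quoted here. numpy applies the same idea to whole arrays.

```python
import timeit


def python_loop_sum(n):
    """Add the numbers 0..n-1 with an explicit, interpreted loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Same result, but the loop runs inside the built-in sum()."""
    return sum(range(n))


n = 100_000
loop_time = timeit.timeit(lambda: python_loop_sum(n), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(n), number=20)
print(f"explicit loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```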