Say I have code which calls some function millions of times from a loop, and I want the code to be fast:

def outer_function(file):
    for line in file:
        inner_function(line)

def inner_function(line):
    # does very little work per call
    pass

Will Python optimize this, e.g. by inlining the call to inner_function, or do I pay the full call overhead on every iteration?
If by "Python" you mean CPython, the generally used implementation, no.
If by "Python" you happened to mean any implementation of the Python language, yes. PyPy can optimise a lot and I believe its method JIT should take care of cases like this.
Which Python? PyPy's JIT compiler will, after a few dozen to a few hundred iterations (depending on how many opcodes are executed per iteration), start tracing execution, forget about the Python function calls along the way, and compile the gathered information into a piece of optimized machine code that likely carries no remnant of the logic that made the function call happen. Traces are linear; the JIT's backend doesn't even know there was a function call, it just sees the instructions from both functions interleaved exactly as they were executed. (This is the ideal case, e.g. when there is no branching in the loop, or when all iterations take the same branch. Some code is unsuited to this kind of JIT compilation and invalidates the traces quickly, before they yield much speedup, although this is rather rare.)
Now, CPython, which is what many people mean when they speak of "Python" or the Python interpreter, isn't that clever. It's a straightforward bytecode VM and will dutifully execute the logic associated with calling a function again and again on every iteration. But then again, why are you using an interpreter at all if performance is that important? Consider writing the hot loop in native code (e.g. as a C extension or in Cython) if it's that important to keep such overhead as low as humanly possible.
Unless you're doing only a tiny bit of number crunching per iteration, you won't get large improvements either way though.
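To get a feel for how much the per-call overhead matters in CPython on your machine, a rough micro-benchmark with the standard timeit module is enough. This is only a sketch: inner_function below is a trivial stand-in for whatever your real loop body does, so it shows the worst case where the call overhead dominates.

import timeit

def inner_function(x):
    # trivial stand-in for the real per-iteration work
    return x + 1

def with_call():
    total = 0
    for i in range(1000000):
        total = inner_function(total)
    return total

def inlined():
    total = 0
    for i in range(1000000):
        total = total + 1
    return total

# Best of five runs of each variant; the gap is mostly function-call overhead.
print("with call:", min(timeit.repeat(with_call, number=1, repeat=5)))
print("inlined:  ", min(timeit.repeat(inlined, number=1, repeat=5)))

The heavier the real body of inner_function, the smaller the relative difference between the two numbers will be.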
Python does not inline function calls, because of its dynamic nature. Theoretically, inner_function can do something that re-binds the name inner_function to something else - Python has no way of knowing at compile time whether this will happen. For example:
def func1():
    global inner_func
    inner_func = func2
    print(1)

def func2():
    print(2)

inner_func = func1
for i in range(5):
    inner_func()
Prints:
1
2
2
2
2
You may think this is horrible. Then, think again - Python's functional and dynamic nature is one of its most appealing features. A lot of what Python allows comes at the cost of performance, and in most cases this is acceptable.
That said, you can probably hack something together using a tool like byteplay or similar - disassemble the inner function into bytecode and insert it into the outer function, then reassemble. On second thought, if your code is performance-critical enough to warrant such hacks, just rewrite it in C. Python has great options for FFI.
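If you're curious what such a tool would have to splice, the standard dis module shows the bytecode involved; the function names below are just illustrative stand-ins for the code in the question:

import dis

def inner_function(line):
    pass

def outer_function(lines):
    for line in lines:
        inner_function(line)

# The loop body disassembles to a global name lookup followed by a call
# opcode: the name is resolved and called anew on every iteration, and the
# compiled bytecode contains no inlined copy of inner_function.
dis.dis(outer_function)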
This is all relevant to the official CPython implementation. A runtime-JITting interpreter (like PyPy or the sadly defunct Unladen Swallow) can in theory detect the normal case and perform inlining. Alas, I'm not familiar enough with PyPy to know whether it does this, but it definitely can.
Calling a function to invoke the pass statement obviously carries a fairly high (∞) overhead relative to the work done. Whether your real program suffers undue overhead depends on the size of the inner function. If it really is just setting a pixel, then I'd suggest a different approach that uses drawing primitives coded in a native language like C or C++.
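For the pixel-setting case, one way to get drawing primitives that run in native code without writing your own extension is to use an imaging library whose internals are written in C. The sketch below assumes the Pillow library, which the answer itself doesn't name:

from PIL import Image, ImageDraw

# Create a small RGB canvas; Pillow's drawing code is implemented in C.
img = Image.new("RGB", (200, 200), color=(0, 0, 0))
draw = ImageDraw.Draw(img)

# One call draws the whole line in native code, instead of a Python loop
# calling a set-pixel function once per pixel.
draw.line([(0, 0), (199, 199)], fill=(255, 255, 255))

img.save("line.png")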
There are (somewhat experimental) JIT compilers for Python that will optimise function calls, but mainstream Python won't do this.
CPython (the "standard" python implementation) doesn't do this kind of optimization.
Note however that if you are counting the CPU cycles of a function call then probably for your problem CPython is not the correct tool. If you are 100% sure that the algorithm you are going to use is already the best one (this is the most important thing), and that your computation is really CPU bound then options are for example: