PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
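Module importing is quite fast, but not instant. This means that a top-level import costs a little once, when the module loads, while an import inside a function adds a small cost to every call of that function.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right?).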
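One way to check this before moving anything is to time both placements. The following is only a rough sketch, not a rigorous benchmark: the function names are made up for the example, and string is just a stand-in for whatever module you are considering.

```python
import string  # top-level import: loaded once, when this module loads
import timeit


def top_level_version():
    # Looks up the module-level name 'string' on each call.
    return string.digits


def in_function_version():
    # Re-runs the import statement on each call; after the first call this
    # is a lookup in sys.modules, not a reload.
    import string
    return string.digits


calls = 100000
top_seconds = timeit.timeit(top_level_version, number=calls)
local_seconds = timeit.timeit(in_function_version, number=calls)
print("top-level:   %.4f s for %d calls" % (top_seconds, calls))
print("in-function: %.4f s for %d calls" % (local_seconds, calls))
```

The absolute numbers depend heavily on the machine and the Python version, so treat them as a hint about where to look rather than a verdict.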
The best reasons I've seen to perform lazy imports are:
- In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
It's interesting that not a single answer so far has mentioned parallel processing, where it might be REQUIRED that the imports are in the function, because the serialized function code is what gets pushed around to other cores, as in the case of ipyparallel.
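To see why the import may have to live inside the function in that situation, here is a small Python 3 sketch. It does not use ipyparallel at all; run_elsewhere is a hypothetical helper that crudely imitates a worker by rebuilding the function from its code object in a namespace that contains only the builtins.

```python
import builtins
import math
import types


def area_with_global_import(radius):
    # Relies on the module-level name 'math' being present.
    return math.pi * radius ** 2


def area_with_local_import(radius):
    import math  # the import travels with the function body
    return math.pi * radius ** 2


def run_elsewhere(func, *args):
    """Rebuild func from its code object in a namespace holding only builtins.

    This is an assumption-laden stand-in for shipping the function's code to
    a worker whose globals do not include this module's imports.
    """
    bare_globals = {"__builtins__": builtins.__dict__}
    remote_func = types.FunctionType(func.__code__, bare_globals)
    return remote_func(*args)


print(run_elsewhere(area_with_local_import, 1.0))  # works: imports on the "worker"
try:
    run_elsewhere(area_with_global_import, 1.0)
except NameError as exc:
    print("failed:", exc)  # the global name 'math' is missing over there
```

Real frameworks differ in how much of the caller's namespace they carry along, so check the one you use before relying on this picture.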
The in-function import is indeed more efficient than the top-level import when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. A "lazy import" technique can combine the best of both approaches by deferring the load until first use and then reusing the result.
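As a rough illustration of the lazy-import idea, here is a minimal sketch. It is not any particular library's implementation; LazyModule is a hypothetical helper name, and decimal is only an example module.

```python
import importlib


class LazyModule(object):
    """Proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for names not found on the proxy itself,
        # i.e. the attributes of the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


decimal = LazyModule("decimal")  # nothing is imported at this point


def price(text):
    # The first call triggers the real import; later calls reuse it.
    return decimal.Decimal(text)
```

The module-level name still documents the dependency at the top of the file, while the load cost is only paid if price is ever called.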
But there are reasons other than efficiency why you might prefer one over the other. A top-level import makes it much clearer to someone reading the code what dependencies the module has. The two also have very different failure characteristics: the top-level import will fail at load time if there's no "datetime" module, while the in-function import won't fail until the method is called.
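The difference in failure timing is easy to reproduce. In this sketch, no_such_module_for_demo is a made-up name that is assumed not to exist on your system.

```python
def uses_missing_module():
    # 'no_such_module_for_demo' is a made-up name that should not exist.
    import no_such_module_for_demo
    return no_such_module_for_demo


# Defining the function above did not raise anything.
# The failure only shows up when the function is actually called:
try:
    uses_missing_module()
except ImportError as exc:
    call_time_error = exc
    print("failed at call time:", exc)
```

Had the same import statement been at the top of the file, the ImportError would have been raised as soon as the module was loaded.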
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
    import os
    # ...
    try:
        kill = os.kill  # will raise AttributeError on Windows
        from signal import SIGTERM
        def terminate(process):
            kill(process.pid, SIGTERM)
    except (AttributeError, ImportError):
        try:
            from win32api import TerminateProcess  # use win32api if available
            def terminate(process):
                TerminateProcess(int(process._handle), -1)
        except ImportError:
            def terminate(process):
                raise NotImplementedError  # define a dummy function
(On review: what John Millikin said.)
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded, not for actually re-loading). On the other hand, as @aaronasterling pointed out, you also save a little, because importing within a function lets the function use slightly faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
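The local-lookup point can be checked by inspecting the compiled functions. This is a small sketch with made-up function names; it only shows where the name math is bound, not how much time that saves.

```python
import math


def global_lookup():
    return math.pi  # 'math' is resolved as a global on every call


def local_lookup():
    import math  # binds 'math' as a local variable of this function
    return math.pi


# co_varnames lists a function's local variable names.
print(global_lookup.__code__.co_varnames)  # ()
print(local_lookup.__code__.co_varnames)   # ('math',)
```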
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
     0 foo:   14429.0924 µs
     1 foo:      63.8962 µs
     2 foo:      10.0136 µs
     3 foo:       7.1526 µs
     4 foo:       7.8678 µs
     0 bar:       9.0599 µs
     1 bar:       6.9141 µs
     2 bar:       7.1526 µs
     3 bar:       7.8678 µs
     4 bar:       7.1526 µs
The code:
    # Python 2 script (uses xrange and a byte-string literal for the micro sign)
    from __future__ import print_function
    from time import time

    def foo():
        import collections
        import re
        import string
        import math
        import subprocess
        return

    def bar():
        import collections
        import re
        import string
        import math
        import subprocess
        return

    t0 = time()
    for i in xrange(5):
        foo()
        t1 = time()
        print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
        t0 = t1
    for i in xrange(5):
        bar()
        t1 = time()
        print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
        t0 = t1
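The script above is written for Python 2. For anyone wanting to repeat the experiment on Python 3, a rough equivalent might look like the sketch below. It is not the code that produced the numbers above: time_calls is a hypothetical helper, it uses time.perf_counter instead of time.time, and it times only the call itself rather than the call plus the print.

```python
from time import perf_counter


def foo():
    import collections
    import re
    import string
    import math
    import subprocess


def time_calls(func, repeats=5):
    """Return the elapsed time of each call to func, in microseconds."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        func()
        timings.append((perf_counter() - start) * 1e6)
    return timings


for i, micros in enumerate(time_calls(foo)):
    print(" %2d foo: %12.4f us" % (i, micros))
```

If any of those modules were already loaded by something else in the process, the first call will not show the full load cost, so run it in a fresh interpreter.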