I'm having difficulty understanding the import statement and its variations. Suppose I'm using the lxml module for scraping websites.
The followin
When you import a package, the interpreter looks the package up on the module search path (sys.path), and if it is found, runs the package's __init__.py, builds a module object for the package from it, and inserts that object into sys.modules. When you import a module, it does the same thing, except that it runs the module's own file. When you subsequently access an attribute (a function, class, submodule, or subpackage) through a dotted name, Python attempts a getattr on the module or package object for the child you want. However, if the child is a submodule or subpackage that has not yet been imported, it has not been added to sys.modules or set as an attribute on its parent, so you get an AttributeError. Thus, you have to import a submodule or subpackage explicitly, either in your own code or delegated in the parent package's __init__.py, for it to be available at runtime on its parent.
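To watch this happen, here is a minimal sketch. It assumes a throwaway package named demo_pkg with one submodule, child, written to a temporary directory; both names are made up for illustration and are not part of any real library.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway package: demo_pkg/__init__.py (empty) and demo_pkg/child.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "child.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)       # make the package findable
importlib.invalidate_caches()  # the directory was created after startup

import demo_pkg                # runs demo_pkg/__init__.py only

# Is the submodule cached in sys.modules? Is it an attribute of its parent?
before = ("demo_pkg.child" in sys.modules, hasattr(demo_pkg, "child"))

import demo_pkg.child          # loads the submodule and sets it on its parent

after = ("demo_pkg.child" in sys.modules, hasattr(demo_pkg, "child"))

print(before)  # (False, False)
print(after)   # (True, True)
```

Until the explicit import runs, demo_pkg.child would raise an AttributeError, because nothing has loaded the submodule or attached it to the package object.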
import lxml.html as LH
doc = LH.parse('http://somesite')
lxml.html is a module. When you import lxml, the html module is not imported into the lxml namespace. This is the package developer's decision: some packages automatically import some of their modules, some don't. In this case, you have to do it yourself with import lxml.html.
import lxml.html as LH imports the html module and binds it to the name LH in the current module's namespace. So you can access the parse function with LH.parse.
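The variations of the import statement differ only in which name they bind. A small sketch, using the standard library's xml.etree.ElementTree as a stand-in for lxml.html (my substitution, so it runs without lxml installed):

```python
import sys

import xml.etree.ElementTree as ET   # binds only the name ET
from xml.etree import ElementTree    # binds only the name ElementTree
import xml.etree.ElementTree         # binds the top-level name xml

# All three names lead to the single module object cached in sys.modules.
cached = sys.modules["xml.etree.ElementTree"]
all_same = (ET is cached) and (ElementTree is cached) and (xml.etree.ElementTree is cached)
print(all_same)  # True

# Any of the names can be used to reach the module's functions.
root_tag = ET.fromstring("<a><b/></a>").tag
print(root_tag)  # a
```

The module is loaded once; the three statements only differ in what ends up in your namespace.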
If you want to delve deeper into when a package (like lxml) imports modules (like lxml.html) automatically, open an IPython session and type:
In [16]: import lxml
In [17]: lxml
Out[17]: <module 'lxml' from '/usr/lib/python2.7/dist-packages/lxml/__init__.pyc'>
Here you see the path to the lxml package's __init__.py file. If you look at its contents, you find it is empty, so no submodules are imported. If you look in numpy's __init__.py, you see lots of code, amongst which is
import linalg
import fft
import polynomial
import random
import ctypeslib
import ma
These are all submodules which are imported into the numpy namespace. So from a user's perspective, import numpy automatically gives you access to numpy.linalg, numpy.fft, etc.
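If you would rather not read the __init__.py by hand, you can ask sys.modules which submodules are loaded after importing a package. This is only a rough sketch: loaded_submodules is a helper name I made up, and json from the standard library stands in for numpy so the example has no third-party dependency.

```python
import importlib
import sys

def loaded_submodules(package_name):
    """Import package_name and list its submodules already present in sys.modules."""
    importlib.import_module(package_name)
    prefix = package_name + "."
    # Copy the keys first so the dict is not iterated while it might change.
    return sorted(name for name in list(sys.modules) if name.startswith(prefix))

print(loaded_submodules("json"))
```

Keep in mind that a submodule listed here may have been loaded by some other code that ran earlier in the same process, not necessarily by the package's own __init__.py.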
Let's take an example of a package pkg with two modules in it, a.py and b.py:
--pkg
|
| -- a.py
|
| -- b.py
|
| -- __init__.py
In __init__.py you import a.py and not b.py:
from . import a  # explicit relative import; a bare "import a" only works on Python 2
So if you open your terminal and do:
>>> import pkg
>>> pkg.a
>>> pkg.b
AttributeError: 'module' object has no attribute 'b'
As you can see, because we imported a.py in pkg's __init__.py, we were able to access it as an attribute of pkg, but b is not there. To access it later we should use:
>>> import pkg.b # OR: from pkg import b
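If you want to try this without creating the files by hand, here is a rough sketch that writes the same layout to a temporary directory. The NAME constants are placeholders I added so each module has something in it, and the __init__.py uses the explicit relative form (from . import a) so that it also runs on Python 3.

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout from the example: pkg/__init__.py, pkg/a.py, pkg/b.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg")
os.mkdir(pkg_dir)
for filename, source in [
    ("__init__.py", "from . import a\n"),  # only a is imported by the package
    ("a.py", "NAME = 'a'\n"),              # placeholder content
    ("b.py", "NAME = 'b'\n"),              # placeholder content
]:
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

import pkg

a_name = pkg.a.NAME           # works: __init__.py imported a

try:
    pkg.b                     # b was never imported
    b_missing = False
except AttributeError:
    b_missing = True

from pkg import b             # explicit import loads b and attaches it to pkg
b_name = pkg.b.NAME

print(a_name, b_missing, b_name)  # a True b
```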
HTH,