Python version and Device used
I'm following the BeautifulSoup tu
I am using BeautifulSoup 4.3.2 and OS X 10.6.8. I also have a problem with an improperly installed lxml. Here are some things that I found out:
First of all, check this related question: "Removed MacPorts, now Python is broken".
Now, in order to check which builders for BeautifulSoup 4 are installed, try
>>> import bs4
>>> bs4.builder.builder_registry.builders
If you don't see your favorite builder, then it is not installed, and you will see an error as above ("Couldn't find a tree builder...").
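The same check can be scripted, which helps when comparing several environments. This is a minimal sketch that assumes only the builder_registry.builders attribute shown above; the helper name installed_builders is made up for illustration, and the features attribute is read defensively with getattr in case a builder class lacks it.

```python
import importlib


def installed_builders():
    """Return (class name, features) pairs for each registered bs4 tree builder.

    Returns an empty list when bs4 itself cannot be imported.
    """
    try:
        builder_pkg = importlib.import_module("bs4.builder")
    except ImportError:
        return []
    result = []
    for cls in builder_pkg.builder_registry.builders:
        # 'features' lists the strings a builder answers to, e.g. "lxml" or "html.parser".
        features = list(getattr(cls, "features", []))
        result.append((cls.__name__, features))
    return result


if __name__ == "__main__":
    for name, features in installed_builders():
        print(name, features)
```

If the builder you asked for is missing from the printed list, that matches the "Couldn't find a tree builder" situation described above.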
Also, just because you can import lxml doesn't mean that everything is fine: the top-level package may import while its compiled submodules fail.
Try
>>> import lxml
>>> import lxml.etree
To understand what's going on, go to the bs4 installation and open the egg (tar -xvzf). Notice the bs4.builder package. Inside it you should see files such as _lxml.py and _html5lib.py. So you can also try
>>> import bs4.builder._htmlparser
>>> import bs4.builder._lxml
>>> import bs4.builder._html5lib
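To run all of these import checks in one go and keep the full error text, something like the following could be used. It is only a sketch: the helper name try_import is hypothetical, and it catches every exception rather than only ImportError so that any kind of failure gets reported.

```python
import importlib
import traceback


def try_import(module_names):
    """Import each module by name; map name -> None on success, else the traceback text."""
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            report[name] = None
        except Exception:
            # Catch broadly so that every kind of import failure is recorded.
            report[name] = traceback.format_exc()
    return report


if __name__ == "__main__":
    modules = [
        "lxml",
        "lxml.etree",
        "bs4.builder._htmlparser",
        "bs4.builder._lxml",
        "bs4.builder._html5lib",
    ]
    for name, error in try_import(modules).items():
        print(name, "OK" if error is None else "FAILED")
        if error:
            print(error)
```

The traceback printed for a failed module usually names the underlying cause, for example a missing shared library.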
If there is a problem, you will see why a particular module cannot be loaded. Notice how the end of builder/__init__.py loads all those modules and silently ignores whatever could not be loaded:
# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only
# want to use HTMLParser as a last result.
from . import _htmlparser
register_treebuilders_from(_htmlparser)
try:
    from . import _html5lib
    register_treebuilders_from(_html5lib)
except ImportError:
    # They don't have html5lib installed.
    pass
try:
    from . import _lxml
    register_treebuilders_from(_lxml)
except ImportError:
    # They don't have lxml installed.
    pass
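The effect of that registration order can be illustrated with a toy model. This is a simplified sketch, not bs4's actual code: the ToyRegistry class, the build_registry helper and the feature lists are invented for illustration, and it assumes that each new registration is placed at the front of the list so that the last builder registered is the first one found.

```python
class ToyRegistry:
    """Toy stand-in for a tree-builder registry (hypothetical, not bs4's class)."""

    def __init__(self):
        self.builders = []

    def register(self, name, features):
        # The newest registration goes to the front, so it wins lookups.
        self.builders.insert(0, (name, set(features)))

    def lookup(self, feature):
        # Return the first (highest priority) builder offering the feature.
        for name, features in self.builders:
            if feature in features:
                return name
        return None


def build_registry(available):
    """Register builders in reverse priority order, skipping unavailable ones."""
    registry = ToyRegistry()
    registry.register("htmlparser", ["html", "html.parser"])  # always present
    if "html5lib" in available:
        registry.register("html5lib", ["html", "html5lib"])
    if "lxml" in available:
        registry.register("lxml", ["html", "xml", "lxml"])
    return registry


if __name__ == "__main__":
    print(build_registry({"lxml", "html5lib"}).lookup("html"))  # lxml
    print(build_registry(set()).lookup("html"))                 # htmlparser
    print(build_registry(set()).lookup("lxml"))                 # None
```

The last case, where the lookup finds nothing, corresponds to the situation in which a silently skipped import later surfaces as a missing tree builder.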
FWIW, I ran into a similar problem (Python 3.6, OS X 10.12.6) and was able to solve it simply by reinstalling lxml (the first command just shows that I was working in a conda environment):
$ source activate ml-general
$ pip uninstall lxml
$ pip install lxml
I tried more complicated things first, because BeautifulSoup was working correctly with an identical command through Jupyter+iPython, but not through PyCharm's terminal in the same virtualenv. Simply reinstalling lxml as above solved the problem.
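When the same code behaves differently in two front ends, it can help to confirm that both really use the same interpreter and the same lxml. A small sketch for Python 3 (the function name describe_environment is hypothetical); run it in both places and compare the output:

```python
import importlib.util
import sys


def describe_environment(module_name="lxml"):
    """Report which interpreter is running and where module_name would load from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module": module_name,
        # None means this interpreter cannot find the module at all.
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

If the executable or location values differ between the two, they are not sharing one installation after all.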
If you are using Python 2.7 on Ubuntu/Debian, this worked for me:
$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml
Test it like:
mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
apt-get on Debian/Ubuntu:
sudo apt-get install python3-lxml
For Mac OS X, a MacPorts port of lxml is available. Try something like
sudo port install py27-lxml
http://lxml.de/installation.html may be helpful.