I'm building an application in Python that fetches news articles from RSS feeds. As part of the project, I decided to use boilerpipe to extract just the article content from the HTML page on which the article appears.
Although boilerpipe was originally written for Java, it has been ported to Python. Its GitHub page is here: https://github.com/misja/python-boilerpipe
The problem is that I get an exception when trying to import it using:
from boilerpipe.extract import Extractor
The error I get is:
Traceback (most recent call last):
File "", line 1, in
File "build\bdist.win32\egg\boilerpipe\extract\__init__.py", line 12, in
File "C:\Python26\lib\site-packages\jpype\_jclass.py", line 54, in JClass
raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
jpype._jexception.ExceptionPyRaisable: java.lang.Exception: Class de.l3s.boilerpipe.sax.HTMLHighlighter not found
What might be causing this problem and how can I fix it?
This worked for me on Mac OS X 10.8.5 with Python 2.7.9:
pip install JPype1 # to install https://pypi.python.org/pypi/JPype1
pip install charade
git clone https://github.com/misja/python-boilerpipe.git
cd python-boilerpipe
sudo python setup.py install
Then you should be able to run the following in the Python console:
>>> from boilerpipe.extract import Extractor
>>> extractor = Extractor(extractor='ArticleExtractor', url="http://en.wikipedia.org/wiki/Main_Page")
>>> print extractor.getText()
You are missing the boilerpipe Java packages; you can find them here: http://code.google.com/p/boilerpipe/downloads/list
You have only installed the Python boilerpipe wrapper.
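One way to check whether any Java archives are present at all is to search a directory for JAR files. The following is a minimal sketch, assuming the JARs live somewhere below a directory you can point to; `find_jars` and `build_classpath` are hypothetical helper names, not part of boilerpipe or JPype.

```python
import os


def find_jars(root):
    """Return a sorted list of every .jar file found below root."""
    jars = []
    # os.walk yields nothing for a directory that does not exist,
    # so a bad path simply produces an empty list.
    for top, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".jar"):
                jars.append(os.path.join(top, name))
    return sorted(jars)


def build_classpath(jars):
    """Join JAR paths with the platform's classpath separator."""
    return os.pathsep.join(jars)
```

If `find_jars` returns an empty list for the directory where the JARs are expected, the JVM has nothing to load the class from, which would be consistent with the "Class ... not found" error.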
The following worked best for me:
git clone https://github.com/misja/python-boilerpipe.git
cd python-boilerpipe
sudo python setup.py install
You may have to:
- install JPype (sudo apt-get install python-jpype on Ubuntu)
- install charade (sudo pip install charade)
But you won't have to install the boilerpipe Java JARs yourself, since the setup script fetches them for you.
I tried installing the Python boilerpipe package from pip, but had no luck. I could run the boilerpipe Java code successfully, but kept getting this same error from Python.
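The two Python prerequisites mentioned in this answer can be checked before running the setup script. This is a rough sketch, assuming the packages import under the module names `jpype` and `charade`; `missing_modules` is a hypothetical helper name.

```python
def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Lists whichever of the two prerequisites are not importable here.
print(missing_modules(["jpype", "charade"]))
```

An empty list suggests both prerequisites are importable, so any remaining failure is more likely on the Java side.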
The class HTMLHighlighter wasn't found. Did you set your JAVA_HOME? The documentation states:
Be sure to have set JAVA_HOME properly since jpype depends on this setting.
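A quick way to see what Python sees for this variable is sketched below. It only checks that JAVA_HOME is set and points to an existing directory, not that the directory holds a usable JVM; `check_java_home` is a hypothetical helper name.

```python
import os


def check_java_home(environ=None):
    """Return (ok, message) describing the JAVA_HOME setting."""
    if environ is None:
        environ = os.environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return False, "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return False, "JAVA_HOME points to a missing directory: %s" % java_home
    return True, "JAVA_HOME = %s" % java_home


ok, message = check_java_home()
print(message)
```

Passing a plain dictionary instead of the real environment makes it easy to try the function with different values.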
I had the same issue and followed the setup details provided by the author of Mining the web. Here is the link to his GitHub page for boilerpipe:
https://github.com/misja/python-boilerpipe/blob/master/setup.py
Source: https://stackoverflow.com/questions/9352259/trouble-importing-boilerpipe-in-python