Trouble importing boilerpipe in python

Posted by 随声附和 on 2019-12-22 10:58:19

Question


I'm building a Python application that fetches news articles from RSS feeds. As part of the project, I decided to use boilerpipe to extract just the article content from the HTML page on which the article appears.

Although boilerpipe was originally written for Java, it has been ported to Python too. You can see its page on GitHub here: https://github.com/misja/python-boilerpipe

The problem is that I get an exception when trying to import it using:

from boilerpipe.extract import Extractor

The error I get is:

Traceback (most recent call last):
  File "", line 1, in
  File "build\bdist.win32\egg\boilerpipe\extract\__init__.py", line 12, in
  File "C:\Python26\lib\site-packages\jpype\_jclass.py", line 54, in JClass
    raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
jpype._jexception.ExceptionPyRaisable: java.lang.Exception: Class de.l3s.boilerpipe.sax.HTMLHighlighter not found

What might be causing this problem and how can I fix it?


Answer 1:


This worked for me on Mac OS X 10.8.5 with Python 2.7.9:

pip install JPype1    # to install https://pypi.python.org/pypi/JPype1
pip install charade
git clone https://github.com/misja/python-boilerpipe.git
cd python-boilerpipe
sudo python setup.py install

Then you should be able to run the following in the Python console:

>>> from boilerpipe.extract import Extractor
>>> extractor = Extractor(extractor='ArticleExtractor', url="http://en.wikipedia.org/wiki/Main_Page")
>>> print extractor.getText()



Answer 2:


You are missing the boilerpipe Java packages. You can find them here: http://code.google.com/p/boilerpipe/downloads/list

You have only installed the Python boilerpipe wrapper.
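One way to confirm whether a given jar actually contains the class named in the traceback is to inspect it directly, since a jar is an ordinary zip archive. The following is only a rough sketch: the helper name jar_has_class and the jar path in the usage comment are made up for illustration, and where the jar lives on your machine depends on how you installed things.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the compiled class."""
    # A class such as a.b.C is stored inside the archive as a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Hypothetical usage; replace the path with the location of your own jar:
# print(jar_has_class("/path/to/boilerpipe.jar",
#                     "de.l3s.boilerpipe.sax.HTMLHighlighter"))
```

If this returns False for every jar you have, the class is missing from what is installed. If it returns True, the jar is probably just not on the classpath that JPype uses.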




Answer 3:


The following worked best for me:

git clone https://github.com/misja/python-boilerpipe.git
cd python-boilerpipe
sudo python setup.py install

You may have to:

  • install JPype (sudo apt-get install python-jpype on Ubuntu)
  • install charade (sudo pip install charade)

But you won't have to install the boilerpipe Java jars yourself, since setup loads them for you.

I tried installing python-boilerpipe from pip, but had no luck. I was able to run the boilerpipe Java code successfully, but kept getting this same error.




Answer 4:


The class HTMLHighlighter wasn't found. Did you set your JAVA_HOME? The documentation states:

Be sure to have set JAVA_HOME properly since jpype depends on this setting.
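A quick check from Python can rule this out before you try the import again. This is a minimal sketch, not part of boilerpipe or JPype: the function name check_java_home is made up here, and it only verifies that the variable is set and points to an existing directory, not that the directory holds a working JDK.

```python
import os


def check_java_home(environ=os.environ):
    """Return a description of the problem, or None if JAVA_HOME looks usable."""
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return "JAVA_HOME points to a missing directory: %s" % java_home
    return None


problem = check_java_home()
if problem:
    print(problem)
else:
    print("JAVA_HOME looks usable: %s" % os.environ["JAVA_HOME"])
```

Run it in the same shell or IDE where the import fails, since environment variables can differ between sessions.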




Answer 5:


I had the same issue. I looked at the setup details provided by the author of Mining the web. Here is the link to his GitHub page for boilerpipe:

https://github.com/misja/python-boilerpipe/blob/master/setup.py



Source: https://stackoverflow.com/questions/9352259/trouble-importing-boilerpipe-in-python
