Question
I tried to run HtmlUnit with Jython following this tutorial:
http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/
but it does not work for me. I am unable to import the com.gargoylesoftware packages. There are only some HTML files in the HtmlUnit folder; do I need to import those somehow?
The tutorial says to run the Python script like this:
/opt/jython/jython -J-classpath "htmlunit-2.8/lib/*" gartner.py
and I tried to run:
java -jar /Users/adam/jython/jython.jar -J-classpath "htmlunit-2.8/lib/*" gartner.py
My problem is that I get "Unknown option: J-classpath", yet there is no mention of a -J-classpath parameter on Jython.org. I would be very glad for any advice. I am running Jython standalone v. 2.5.2 on Snow Leopard.
Answer 1:
Your entire command line is being processed by the java command (as it should), and -J-classpath is indeed not a valid command line option for java. You should really try to follow the exact steps of the tutorial, because you are missing several important steps (and kind of making up your own steps).
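To make the difference concrete: the -J prefix is understood by the jython launcher script, which passes the rest of the option through to the JVM. When Jython is started through java directly, the classpath has to be given to java itself, and java ignores other classpath settings when -jar is used. The following is only a sketch of one possible alternative, not what the tutorial does: it builds a java command that puts jython.jar and the HtmlUnit jars on one classpath and names Jython's main class, org.python.util.jython. The helper name build_java_command is made up for illustration, and the paths are the ones from the question.

```python
import os


def build_java_command(jython_jar, lib_dir, script):
    """Return an argument list that starts Jython through java directly.

    Hypothetical helper for illustration; it only assembles the command.
    """
    # Join jython.jar and a wildcard for every jar in lib_dir with the
    # platform's classpath separator (':' on Unix/macOS, ';' on Windows).
    classpath = os.pathsep.join([jython_jar, os.path.join(lib_dir, "*")])
    # -cp replaces -jar here: with -jar, java ignores other classpath settings.
    return ["java", "-cp", classpath, "org.python.util.jython", script]


if __name__ == "__main__":
    # Paths taken from the question; adjust them to your own layout.
    command = build_java_command(
        "/Users/adam/jython/jython.jar", "htmlunit-2.8/lib", "gartner.py"
    )
    print(command)
```

Passing the resulting list to subprocess.call keeps the shell from expanding the "*", so that java receives the wildcard unchanged. When typing the command by hand, quote the classpath for the same reason.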
Answer 2:
It is possible to run a Jython script simply as jython myscript.py, provided the script itself uses sys.path.append to add the full path of every jar it needs to the Python path.
Here is a script I am currently working on.
#!/opt/jython/jython
'''
Created on Dec 7, 2011
@author: chris
'''
import sys, os
from time import sleep

jarpath = '/usr/share/java/htmlunit/'  # path to the jar files to import
jars = ['apache-mime4j-0.6.jar', 'commons-codec-1.4.jar',
        'commons-collections-3.2.1.jar', 'commons-io-1.4.jar',
        'commons-lang-2.4.jar', 'commons-logging-1.1.1.jar',
        'cssparser-0.9.5.jar', 'htmlunit-2.8.jar',
        'htmlunit-core-js-2.8.jar', 'httpclient-4.0.1.jar',
        'httpcore-4.0.1.jar', 'httpmime-4.0.1.jar',
        'nekohtml-1.9.14.jar', 'sac-1.3.jar',
        'serializer-2.7.1.jar', 'xalan-2.7.1.jar',
        'xercesImpl-2.9.1.jar', 'xml-apis-1.3.04.jar']  # a list of jars

def loadjars():  # appends each jar to the Jython path
    for jar in jars:
        print(jarpath + jar + '\n')
        container = jarpath + jar
        sys.path.append(container)

# The jars must be on sys.path before the HtmlUnit import below.
loadjars()

import com.gargoylesoftware.htmlunit.WebClient as WebClient
webclient = WebClient()

def gotopage():
    print('hello, I will visit Google')
    url = 'http://google.com'
    page = webclient.getPage(url)
    print(page)

if __name__ == "__main__":
    gotopage()
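The hard-coded jar list in the script above has to be edited whenever a library version changes. As a minimal sketch of a variation (not part of the original answer), the jars can be discovered with glob instead. The function name add_jars_to_path is invented for this example, and the directory is the one used in the script above.

```python
import glob
import os
import sys


def add_jars_to_path(jar_dir):
    """Append every .jar file found in jar_dir to sys.path.

    Returns the sorted list of jar paths that were found.
    """
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    for jar in jars:
        # Skip entries that are already present so repeated calls are harmless.
        if jar not in sys.path:
            sys.path.append(jar)
    return jars


if __name__ == "__main__":
    # Directory taken from the script above; adjust it to your installation.
    for jar in add_jars_to_path("/usr/share/java/htmlunit/"):
        print(jar)
```

As in the original script, the function has to be called before the com.gargoylesoftware.htmlunit import is attempted.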
Answer 3:
I have run into this error before and solved it with the following steps.
- Download Jython and run
java -jar jython-installer-xxx.jar
to install it. Then add the jython/bin folder to your system path and run jython on the command line to make sure it works.
- Download the HtmlUnit jar files from SourceForge and note where you put them, because you need to specify their location.
- Write your .py file and run
jython -J-classpath "/Users/crabime/Development Folder/htmlunit-2.23/lib/*" /Users/crabime/PycharmProjects/scrapimage/crabime/gartner.py
Everything should then work. If you still get a module-not-found error, check the command you typed for mistakes.
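If the module still cannot be found, a small diagnostic script can help narrow down whether the classpath reached Jython at all. The following is a rough sketch under the assumption that it is run with the same jython command and options as the real script; can_import is a hypothetical helper name.

```python
import sys


def can_import(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    name = "com.gargoylesoftware.htmlunit"
    if can_import(name):
        print("OK: " + name + " is importable")
    else:
        print("Cannot import " + name)
        # Under Jython the JVM classpath is available as a system property;
        # under plain CPython the java.lang import fails and this is skipped.
        if can_import("java.lang"):
            from java.lang import System
            print("java.class.path: " + System.getProperty("java.class.path"))
        # Jars added with sys.path.append show up here instead.
        for entry in sys.path:
            print("sys.path entry: " + entry)
```

If the HtmlUnit jars appear in neither listing, the -J-classpath argument was most likely mistyped or pointed at the wrong folder.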
Source: https://stackoverflow.com/questions/7758469/running-htmlunit-with-jython-issue-with-startup-on-command-line