问题
I am trying to get page source of following URL using Html-Unit get method.
http://denydesigns.com/collections/barbara-sherman-fleece-throw-blanket/products/barbara-sherman-antique-fleece-throw-blanket
It is getting stuck somewhere. I am trying to find out the reason but I am not getting it. I also tried to see if the Thread created by HtmlUnit is BLOCKED ar WAITING, but this is also not the case.
Following is my log generated by HTML Unit.
18 Jan 2013 04:14:47,832 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js] line=[16] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:47,924 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(script1358500487923) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:49,498 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://code.jquery.com/jquery-latest.js] line=[911] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:49,565 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500489525) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:53,047 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
18 Jan 2013 04:14:53,048 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,060 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
18 Jan 2013 04:14:53,061 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,061 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
18 Jan 2013 04:14:53,062 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,829 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://chat.livechatinc.net/licence/1051689/script.cgi?lang=en&groups=0] line=[60] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:54,878 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://platform.twitter.com/widgets.js] line=[5] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:56,215 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500496196) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:56,458 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_execCommand(HTMLDocument.java:1590) - Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented)
18 Jan 2013 04:14:58,086 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500489525) did a getElementByName for Internet Explorer
And Following is my Thread Dump for the process created(Using jstack)
2013-01-18 04:17:46
Full thread dump Java HotSpot(TM) 64-Bit Server VM (22.1-b02 mixed mode):
"Attach Listener" daemon prio=10 tid=0x0000000002955000 nid=0x16dd waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" daemon prio=10 tid=0x00007feca00cc800 nid=0x154f runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007feca00ca000 nid=0x154e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007feca00c7000 nid=0x154d waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007feca00c5000 nid=0x154c runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007feca007c800 nid=0x154b in Object.wait() [0x00007fec9fffe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000c2369e20> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000000c2369e20> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
"Reference Handler" daemon prio=10 tid=0x00007feca007a000 nid=0x154a in Object.wait() [0x00007feca4157000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000c23699e0> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000000c23699e0> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00000000025d9000 nid=0x1546 runnable [0x00007fecaa8b6000]
java.lang.Thread.State: RUNNABLE
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getTopLevelScope(ScriptableObject.java:2007)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getWindow(SimpleScriptable.java:303)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getWindow(SimpleScriptable.java:293)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getPrototype(SimpleScriptable.java:251)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLCollection.<init>(HTMLCollection.java:99)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLCollection.<init>(HTMLCollection.java:110)
at com.gargoylesoftware.htmlunit.javascript.host.HTMLCollectionFrames.<init>(Window.java:1751)
at com.gargoylesoftware.htmlunit.javascript.host.Window.getFrames(Window.java:759)
at com.gargoylesoftware.htmlunit.javascript.host.Window.jsxGet_length(Window.java:749)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:172)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.getValue(ScriptableObject.java:342)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getImpl(ScriptableObject.java:2523)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.get(ScriptableObject.java:438)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.get(SimpleScriptable.java:75)
at com.gargoylesoftware.htmlunit.javascript.host.Window.get(Window.java:1226)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getProperty(ScriptableObject.java:2088)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1527)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1513)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1398)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:854)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:164)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:429)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:267)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3183)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:162)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:538)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:589)
- locked <0x00000000c274d308> (a com.gargoylesoftware.htmlunit.html.HtmlPage)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:545)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:520)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:896)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventListeners(EventListenersContainer.java:162)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:221)
at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:735)
at com.gargoylesoftware.htmlunit.html.HtmlElement$2.run(HtmlElement.java:866)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:871)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeEventHandlersIfNeeded(HtmlPage.java:1162)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:202)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:440)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:389)
"VM Thread" prio=10 tid=0x00007feca0072800 nid=0x1549 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00000000025e4000 nid=0x1547 runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00000000025e5800 nid=0x1548 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007feca00d7800 nid=0x1550 waiting on condition
JNI global references: 317
I am not sure why URL is stuck. It is not coming out of the method. Can any body please look into it.
UPDATE com.gargoylesoftware.htmlunit.html.HTMLParser.HtmlUnitDOMBuilder.parse(XMLInputSource) @Override
public void parse(final XMLInputSource inputSource) throws XNIException, IOException {
final HtmlUnitDOMBuilder oldBuilder = page_.getBuilder();
page_.setBuilder(this);
try {
super.parse(inputSource);
}
finally {
page_.setBuilder(oldBuilder);
}
}
I attached HtmlUnit source-code from HtmlUnit and Debugged. Above method is not executing completly.
Also, I have set timeout as follows:
webClient.setTimeout(120000);
So why does it not come out of it after 2 min and says SomeThingTimeOutException ?
回答1:
I have followed up with HtmlUnit user group. They have resolved this issue in 2.12 version of HtmlUnit. Please check.
来源:https://stackoverflow.com/questions/14396052/fetch-page-source-using-htmlunit-url-got-stuck