Fetch Page source using HtmlUnit : URL got stuck

|▌冷眼眸甩不掉的悲伤 提交于 2019-12-19 04:14:52

问题


I am trying to get page source of following URL using Html-Unit get method.

http://denydesigns.com/collections/barbara-sherman-fleece-throw-blanket/products/barbara-sherman-antique-fleece-throw-blanket

It is getting stuck somewhere. I am trying to find out the reason but I am not getting it. I also tried to see if the Thread created by HtmlUnit is BLOCKED ar WAITING, but this is also not the case.

Following is my log generated by HTML Unit.

18 Jan 2013 04:14:47,832 -  main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js] line=[16] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:47,924 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(script1358500487923) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:49,498 -  main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://code.jquery.com/jquery-latest.js] line=[911] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:49,565 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500489525) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:53,047 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
18 Jan 2013 04:14:53,048 -  main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,060 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
18 Jan 2013 04:14:53,061 -  main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,061 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
18 Jan 2013 04:14:53,062 -  main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,829 -  main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://chat.livechatinc.net/licence/1051689/script.cgi?lang=en&groups=0] line=[60] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:54,878 -  main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://platform.twitter.com/widgets.js] line=[5] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:56,215 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500496196) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:56,458 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_execCommand(HTMLDocument.java:1590) - Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented)
18 Jan 2013 04:14:58,086 -  main -  WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500489525) did a getElementByName for Internet Explorer

And Following is my Thread Dump for the process created(Using jstack)

2013-01-18 04:17:46
Full thread dump Java HotSpot(TM) 64-Bit Server VM (22.1-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000002955000 nid=0x16dd waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" daemon prio=10 tid=0x00007feca00cc800 nid=0x154f runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007feca00ca000 nid=0x154e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007feca00c7000 nid=0x154d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007feca00c5000 nid=0x154c runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007feca007c800 nid=0x154b in Object.wait() [0x00007fec9fffe000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000c2369e20> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x00000000c2369e20> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x00007feca007a000 nid=0x154a in Object.wait() [0x00007feca4157000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000c23699e0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x00000000c23699e0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00000000025d9000 nid=0x1546 runnable [0x00007fecaa8b6000]
   java.lang.Thread.State: RUNNABLE
        at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getTopLevelScope(ScriptableObject.java:2007)
        at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getWindow(SimpleScriptable.java:303)
        at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getWindow(SimpleScriptable.java:293)
        at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getPrototype(SimpleScriptable.java:251)
        at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLCollection.<init>(HTMLCollection.java:99)
        at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLCollection.<init>(HTMLCollection.java:110)
        at com.gargoylesoftware.htmlunit.javascript.host.HTMLCollectionFrames.<init>(Window.java:1751)
        at com.gargoylesoftware.htmlunit.javascript.host.Window.getFrames(Window.java:759)
        at com.gargoylesoftware.htmlunit.javascript.host.Window.jsxGet_length(Window.java:749)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:172)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.getValue(ScriptableObject.java:342)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getImpl(ScriptableObject.java:2523)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.get(ScriptableObject.java:438)
        at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.get(SimpleScriptable.java:75)
        at com.gargoylesoftware.htmlunit.javascript.host.Window.get(Window.java:1226)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getProperty(ScriptableObject.java:2088)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1527)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1513)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1398)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:854)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:164)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:429)
        at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:267)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3183)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:162)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:538)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:589)
        - locked <0x00000000c274d308> (a com.gargoylesoftware.htmlunit.html.HtmlPage)
        at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:545)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:520)
        at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:896)
        at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventListeners(EventListenersContainer.java:162)
        at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:221)
        at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:735)
        at com.gargoylesoftware.htmlunit.html.HtmlElement$2.run(HtmlElement.java:866)
        at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
        at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:871)
        at com.gargoylesoftware.htmlunit.html.HtmlPage.executeEventHandlersIfNeeded(HtmlPage.java:1162)
        at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:202)
        at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:440)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:389)

"VM Thread" prio=10 tid=0x00007feca0072800 nid=0x1549 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00000000025e4000 nid=0x1547 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00000000025e5800 nid=0x1548 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007feca00d7800 nid=0x1550 waiting on condition

JNI global references: 317

I am not sure why URL is stuck. It is not coming out of the method. Can any body please look into it.

UPDATE com.gargoylesoftware.htmlunit.html.HTMLParser.HtmlUnitDOMBuilder.parse(XMLInputSource) @Override

    public void parse(final XMLInputSource inputSource) throws XNIException, IOException {
        final HtmlUnitDOMBuilder oldBuilder = page_.getBuilder();
        page_.setBuilder(this);
        try {
            super.parse(inputSource);
        }
        finally {
            page_.setBuilder(oldBuilder);
        }
    }

I attached HtmlUnit source-code from HtmlUnit and Debugged. Above method is not executing completly.

Also, I have set timeout as follows:

webClient.setTimeout(120000);

So why does it not come out of it after 2 min and says SomeThingTimeOutException ?


回答1:


I have followed up with HtmlUnit user group. They have resolved this issue in 2.12 version of HtmlUnit. Please check.



来源:https://stackoverflow.com/questions/14396052/fetch-page-source-using-htmlunit-url-got-stuck

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!