RSelenium: hangs in navigate to direct pdf download

爷,独闯天下 提交于 2019-12-20 05:33:08

问题


Using RSelenium via Docker Toolbox for Windows with selenium/standalone-firefox-debug container - all working fine: docker run -d -v //c/test/://home/seluser/Downloads -p 4445:4444 -p 5901:5900 selenium/standalone-firefox-debug

Have setup firefox profile to download pdf directly:

fprof <- makeFirefoxProfile(list(browser.startup.homepage = "about:blank"
                                 , startup.homepage_override_url = "about:blank"
                                 , startup.homepage_welcome_url = "about:blank"
                                 , startup.homepage_welcome_url.additional = "about:blank"
                                 , browser.download.dir = "/home/seluser/Downloads"
                                 , browser.download.folderList = 2L
                                 , browser.download.manager.showWhenStarting = FALSE
                                 , browser.download.manager.focusWhenStarting = FALSE
                                 , browser.download.manager.closeWhenDone = TRUE
                                 , browser.helperApps.neverAsk.saveToDisk = "application/pdf, application/octet-stream"
                                 , pdfjs.disabled = TRUE
                                 , plugin.scan.plid.all = FALSE
                                 , plugin.scan.Acrobat = 99L))

Using the following code, when I navigate directly to the pdf, it downloads to the specified directory fine but then it hangs at that point, not allowing any proceeding code to execute.

library(RSelenium)

remDr <- remoteDriver(remoteServerAddr = "*docker-ip*", port = 4445L, extraCapabilities = fprof)
remDr$open()
remDr$navigate("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=BEL&CTRY=USA&DT=09/12/2015&DAY=D&STYLE=EQB")

I have to manually stop the R code and the error that displays is:

Error in checkError(res) : 
Undefined error in httr call. httr output: Operation was aborted by an application callback

If I VNC into the container and look at what is displayed in the browser, the file has downloaded but there is nothing in the address bar.

screenshot Any ideas? I am assuming that it is something to do with the httr/rselenium packages not receiving some sort of 'loaded' signal from the browser, but this extends beyond my troubleshooting ability. This method had worked previously using the .jar file selenium-standalone-server and RSelenium.

sessionInfo() & remDr$open() output below:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RSelenium_1.7.1

loaded via a namespace (and not attached):
 [1] httr_1.2.1     R6_2.2.0       assertthat_0.1 tools_3.3.2    wdman_0.2.2    binman_0.1.0  
 [7] curl_2.3       Rcpp_0.12.9    jsonlite_1.2   caTools_1.17.1 openssl_0.9.6  bitops_1.0-6  
[13] semver_0.2.0   XML_3.98-1.5  



> remDr$open()
[1] "Connecting to remote server"
$rotatable
[1] FALSE

$raisesAccessibilityExceptions
[1] FALSE

$firefoxOptions
$firefoxOptions$args
list()

$firefoxOptions$profile
[1] "UEsDBBQACAgIAEwPW0oAAAAAAAAAAAAAAAAIAAAAcHJlZnMuanOlkU9LAzEQxe+C36HsSaFmwVv1tNBjbwoey2wy242dZsJM0v36JqJYimWV3vLn/R4z72VF2UbB4a7phadyM5pAUo5m5ANG2GGzXDTQc05PPUHYN/fPtzf5BzuXb/mIIt7hNgv9l52QbDlfiRpwzifPAeZcvnd2PAVicMZ5qUhbbVtFqtp2/fWrs/jA5FA2XlNxeZxTHyCUyUviI09vI4aXupMPu8IOQIp/5Qe2Wa8xsMSK1WDNofadJF9iR6SI0sWoJmBputO9UTjiK6+97j/jjpG8hZp/G92wXJw+sE2YHjQJwuE8zSJ+19KAQk/ofh8jUt75YNRCMMXVGSC6sO2ptLPCPdRSVqsq+wBQSwcII+hBcQsBAAD3AgAAUEsDBBQACAgIAEwPW0oAAAAAAAAAAAAAAAAHAAAAdXNlci5qc51WTW/bMAy971cMOW3AKqTretlOXdcBA4Z1aFDsKMgSbauRJU0fcfPvR/mjSRNHbndKbJMS+fj4yOjBUeugfLconGnxiXhWQvdf6oo0TLXMAQHNCgVi8eFtyZSH91/exJ2nYAFtrHEhudTAVKj7Z4JGG8ln/DWE1rg1qUOwxNbS19uz9Nky788U6CrU6Pjx8vK52xiwAybwR0AAHkB8l86HK4yFK0C34OJhuKbBvB4pr51pgHrupA3URU2DbJLLxXL6osAKTxAOfauvlfEwnc1oLUyrlWEC79KsSsDWpv1Tg14hWgmpaXeLQdng02W0MYKpGexhE4xRnoBzxnGjvVH7cB+n72WljUbUGmgKcKvu0edz8eC9RKtgkAsOfETcSgyUcsd8nfdVUq+JsaApPAZwmqlUzFczqExlvYt6+rIWCuHkBp8Z54DljBoz90gHysEFP4nEU6Wkt4ptQdycL1e/DDInlfbTtDG+Erf6j9RYX3++JBIvMvd3P9FjwQoTw+dCMb1... <truncated>


$appBuildId
[1] "20170125094131"

$version
[1] ""

$platform
[1] "LINUX"

$proxy
named list()

$command_id
[1] 1

$nativeEvents
[1] TRUE

$specificationLevel
[1] 0

$acceptSslCerts
[1] FALSE

$processId
[1] 3012

$webdriver.remote.sessionid
[1] "6263b5ab-9375-425e-aa00-8fc632dc492e"

$browserVersion
[1] "51.0.1"

$platformVersion
[1] "4.4.47-boot2docker"

$XULappId
[1] "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}"

$browserName
[1] "firefox"

$takesScreenshot
[1] TRUE

$javascriptEnabled
[1] TRUE

$takesElementScreenshot
[1] TRUE

$platformName
[1] "linux"

$cssSelectorsEnabled
[1] TRUE

$firefox_profile
[1] "UEsDBBQAAgAIAJRZW0oj6EFxCwEAAPcCAAAIAAAAcHJlZnMuanOlkU9LAzEQxe+C36HsSaFmwVv1tNBjbwoey2wy242dZsJM0v36JqJYimWV3vLn/R4z72VF2UbB4a7phadyM5pAUo5m5ANG2GGzXDTQc05PPUHYN/fPtzf5BzuXb/mIIt7hNgv9l52QbDlfiRpwzifPAeZcvnd2PAVicMZ5qUhbbVtFqtp2/fWrs/jA5FA2XlNxeZxTHyCUyUviI09vI4aXupMPu8IOQIp/5Qe2Wa8xsMSK1WDNofadJF9iR6SI0sWoJmBputO9UTjiK6+97j/jjpG8hZp/G92wXJw+sE2YHjQJwuE8zSJ+19KAQk/ofh8jUt75YNRCMMXVGSC6sO2ptLPCPdRSVqsq+wBQSwECHgAUAAIACACUWVtKI+hBcQsBAAD3AgAACAAAAAAAAAABACAAAAAAAAAAcHJlZnMuanNQSwUGAAAAAAEAAQA2AAAAMQEAAAAA"

$id
[1] "6263b5ab-9375-425e-aa00-8fc632dc492e"

回答1:


I had the same problems using the most recent version of firefox (51.0.1). This was on a windows machine and the issue seemed to be the pdfjs.disabled flag. The issue was not present in older versions of firefox. The Docker image tagged 2.53.1 runs firefox 47 for example. If possible run an older version using (on a linux box):

docker run -d -p 4445:4444 -p 5901:5900 -v /home/john/test:/home/seluser/Downloads selenium/standalone-firefox-debug:2.53.1

Now running your code we see:

fprof <- makeFirefoxProfile(list(browser.startup.homepage = "about:blank"
                                 , startup.homepage_override_url = "about:blank"
                                 , startup.homepage_welcome_url = "about:blank"
                                 , startup.homepage_welcome_url.additional = "about:blank"
                                 , browser.download.dir = "/home/seluser/Downloads"
                                 , browser.download.folderList = 2L
                                 , browser.download.manager.showWhenStarting = FALSE
                                 , browser.download.manager.focusWhenStarting = FALSE
                                 , browser.download.manager.closeWhenDone = TRUE
                                 , browser.helperApps.neverAsk.saveToDisk = "application/pdf, application/octet-stream"
                                 , pdfjs.disabled = TRUE
                                 , plugin.scan.plid.all = FALSE
                                 , plugin.scan.Acrobat = 99L))
library(RSelenium)

remDr <- remoteDriver(port = 4445L, extraCapabilities = fprof)
remDr$open()
remDr$navigate("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=BEL&CTRY=USA&DT=09/12/2015&DAY=D&STYLE=EQB")

> list.files("/home/john/test/")
[1] "eqbPDFChartPlus.cfm"

The pdf would need to be renamed (its being named as a colfusion .cfm file)

As to what is happening with more recent versions of firefox you would need to refer that to most likely the geckodriver project. Users with clients other than RSelenium have also had recent issues Can't download PDF with selenium webdriver + firefox



来源:https://stackoverflow.com/questions/42476693/rselenium-hangs-in-navigate-to-direct-pdf-download

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!