readHTMLTable and UTF-8 encoding

花落未央 2021-02-06 08:42

I have an encoding problem with readHTMLTable and the XML package in general. I would like to download some tables from the Polish site allegro.pl (an auction site similar to eBay), but after

1 answer
  •  余生分开走
    2021-02-06 09:20

    For some time I was exchanging emails with Duncan Temple Lang, the creator of the XML package. Yesterday (30.01.2012) he uploaded a new version of the XML package to the Omegahat website. The new version, 3.9-4, for the 32-bit version of R removes this encoding problem! :)

    Download the package from the link below: http://www.omegahat.org/R/bin/windows/contrib/2.14/

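    library(XML)

    # Build the listing URLs for result pages 1 to 5.
    url <- paste("http://allegro.pl/listing.php/search?category=15821&sg=0&p=", 1:5, "&string=facebook", sep = "")
    # Parse the first page, declaring the encoding explicitly.
    doc <- htmlParse(url[1], encoding = "UTF-8")
    # Take the table named "lista" from the parsed page as a data frame.
    z <- as.data.frame(readHTMLTable(doc, stringsAsFactors = FALSE)$lista)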
    

    It works, so we can close this topic. :)
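
The snippet above builds five page URLs but parses only the first. A minimal sketch of reading all five pages the same way might look like the following. The helper name `read_page` is hypothetical, and the sketch assumes every page still exposes a table named "lista" with identical columns. The URL and table name are taken from the answer and may no longer match the live site.

```r
library(XML)

# Same listing URLs as in the answer (result pages 1 to 5).
urls <- paste0("http://allegro.pl/listing.php/search?category=15821&sg=0&p=",
               1:5, "&string=facebook")

# Hypothetical helper: parse one page as UTF-8 and return its "lista" table.
# Returns NULL when the page has no table with that name.
read_page <- function(u) {
  doc <- htmlParse(u, encoding = "UTF-8")
  readHTMLTable(doc, stringsAsFactors = FALSE)$lista
}

# Read every page, then stack the per-page data frames row-wise.
# rbind skips NULL entries but fails if the column sets differ (assumed equal here).
pages <- lapply(urls, read_page)
z_all <- do.call(rbind, pages)
```

Keeping `encoding = "UTF-8"` inside the helper means every page is parsed with the same declared encoding, which is the setting the answer reports as fixing the problem.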
