How can I use a loop to scrape website data for multiple webpages in R?

無奈伤痛 2021-01-19 04:50

I would like to apply a loop to scrape data from multiple webpages in R. I am able to scrape the data for one webpage, however when I attempt to use a loop for multiple page

3 Answers
  • 2021-01-19 05:10

    Just initialize an empty data frame before the loop. I have solved this problem before, and the following code works fine for me.

    library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%
    
    country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain")
    df <- data.frame(names = character(0),facts = character(0),nm = character(0))
    
    for(i in country){
    
      site <- paste("http://www.countryreports.org/country/",i,".htm", sep="")
      site <- read_html(site)  # read_html() replaces the deprecated html()
    
      stats<-
        data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() ,
                   facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() ,
                   stringsAsFactors=FALSE)
    
      stats$nm <- i
      stats$names   <- gsub('[\r\n\t]', '', stats$names)
      stats$facts   <- gsub('[\r\n\t]', '', stats$facts)
      #stats<-stats[!duplicated(stats),]
      #all<-rbind(all,stats)
      df <- rbind(df, stats)
      #all <- merge(Output,stats)
    
    }
    View(df)
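
    A variation on the same idea, sketched under assumptions: instead of growing the data frame with rbind on every pass, each page's result can be collected in a list and bound once at the end. The function fetch_stats below is a hypothetical stand-in for the scraping step (it returns made-up rows so the sketch runs without network access); it is not part of rvest or of the code above.

    ```r
    # Hypothetical stand-in for the scraping step, so the sketch runs offline.
    # In real use, this is where the page would be read and parsed.
    fetch_stats <- function(country) {
      data.frame(names = c("Capital", "Population"),
                 facts = paste(country, c("capital", "population")),
                 nm    = country,
                 stringsAsFactors = FALSE)
    }

    country <- c("Norway", "Sweden", "Finland")

    # Collect one data frame per country in a list
    pages <- lapply(country, fetch_stats)

    # Bind all pages in a single call instead of growing df inside the loop
    df <- do.call(rbind, pages)
    ```

    For a handful of pages the difference is negligible; binding once mainly helps when the number of pages is large.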
    
  • 2021-01-19 05:17

    Final working code:

    ###########################
    # THIS WORKS!!!!
    ###########################
    
    library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%
    
    country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain")
    
    # Start from an empty data frame; named all_stats to avoid masking base::all()
    all_stats <- data.frame()
    
    for(i in country){
    
      site <- paste("http://www.countryreports.org/country/",i,".htm", sep="")
      site <- read_html(site)  # read_html() replaces the deprecated html()
    
      stats<-
        data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() ,
                   facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() ,
                   stringsAsFactors=FALSE)
    
      stats$nm <- i
      stats$names   <- gsub('[\r\n\t]', '', stats$names)
      stats$facts   <- gsub('[\r\n\t]', '', stats$facts)
      #stats<-stats[!duplicated(stats),]
      all_stats<-rbind(all_stats,stats)
    
    }
    View(all_stats)
    
  • 2021-01-19 05:20

    This is what I did. It is not the best solution, but it does produce output. Treat it only as a workaround: I do not recommend writing table output to a file while a loop is running. After stats has been generated inside the loop,

    output<-rbind(stats,i)
    

    and then append the table to a CSV file,

    write.table(output, file = "D:\\Documents\\HTML\\Test of loop.csv", row.names = FALSE, append = TRUE, sep = ",")
    
    #then close the loop
    }
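
    For comparison, here is a rough sketch of the alternative hinted at above: keep the per-country results in memory and write the file once after the loop. With append = TRUE inside the loop, write.table can repeat the header row on every pass. The sample rows and the temporary output path are placeholders for illustration only.

    ```r
    # Hypothetical per-country results standing in for the scraped stats
    results <- list(
      data.frame(names = "Capital", facts = "Oslo", nm = "Norway",
                 stringsAsFactors = FALSE),
      data.frame(names = "Capital", facts = "Stockholm", nm = "Sweden",
                 stringsAsFactors = FALSE)
    )

    # Combine everything first
    output <- do.call(rbind, results)

    # Placeholder path; a temporary file keeps the sketch self-contained
    out_file <- tempfile(fileext = ".csv")

    # Write once, so the header appears a single time
    write.csv(output, file = out_file, row.names = FALSE)

    # Read the file back to confirm its shape
    check <- read.csv(out_file, stringsAsFactors = FALSE)
    ```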
    

    Good luck
