I would like to apply a loop to scrape data from multiple webpages in R. I am able to scrape the data for one webpage, however when I attempt to use a loop for multiple page
Just initialize an empty data frame before the loop. I ran into the same problem, and the following code works fine for me.
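library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe

country <- c("Norway", "Sweden", "Finland", "France", "Greece", "Italy", "Spain")

# Start with an empty data frame so rbind() has something to append to.
df <- data.frame(names = character(0), facts = character(0), nm = character(0),
                 stringsAsFactors = FALSE)

for (i in country) {
  url  <- paste("http://www.countryreports.org/country/", i, ".htm", sep = "")
  site <- read_html(url)  # read_html() replaces the older html()

  # The first and second table cells hold the fact name and its value.
  stats <- data.frame(
    names = site %>% html_nodes(xpath = "//*/td[1]") %>% html_text(),
    facts = site %>% html_nodes(xpath = "//*/td[2]") %>% html_text(),
    stringsAsFactors = FALSE
  )
  stats$nm <- i  # tag every row with its country

  # Strip carriage returns, newlines and tabs left over from the HTML.
  stats$names <- gsub('[\r\n\t]', '', stats$names)
  stats$facts <- gsub('[\r\n\t]', '', stats$facts)

  # Optional: stats <- stats[!duplicated(stats), ]
  df <- rbind(df, stats)
}
View(df)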
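If you would rather not grow df inside the loop, one alternative is to build one data frame per country and bind them all at the end. This is only a sketch: scrape_one is a hypothetical stand-in for the scraping step above and returns made-up rows, so it runs without a network connection.

```r
# Hypothetical stand-in for the per-country scraping step; in the real
# script this would fetch the page and extract the td[1]/td[2] cells.
scrape_one <- function(country) {
  data.frame(names = c("Capital", "Population"),
             facts = c(paste(country, "capital"), paste(country, "population")),
             nm = country,
             stringsAsFactors = FALSE)
}

country <- c("Norway", "Sweden", "Finland")

# Build one data frame per country, then bind them all in a single step.
pages <- lapply(country, scrape_one)
df <- do.call(rbind, pages)
```

With this pattern there is nothing to initialize before the loop, because the combined data frame is created in one call.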
Final working code:
###########################
# THIS WORKS!!!!
###########################
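library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe

country <- c("Norway", "Sweden", "Finland", "France", "Greece", "Italy", "Spain")

# Initialize the accumulator first; otherwise `all` refers to base::all(),
# which is a function, and rbind() has no data frame to append to.
all <- data.frame()

for (i in country) {
  url  <- paste("http://www.countryreports.org/country/", i, ".htm", sep = "")
  site <- read_html(url)  # read_html() replaces the older html()

  stats <- data.frame(
    names = site %>% html_nodes(xpath = "//*/td[1]") %>% html_text(),
    facts = site %>% html_nodes(xpath = "//*/td[2]") %>% html_text(),
    stringsAsFactors = FALSE
  )
  stats$nm <- i  # tag every row with its country

  # Strip carriage returns, newlines and tabs left over from the HTML.
  stats$names <- gsub('[\r\n\t]', '', stats$names)
  stats$facts <- gsub('[\r\n\t]', '', stats$facts)

  # Optional: stats <- stats[!duplicated(stats), ]
  all <- rbind(all, stats)
}
View(all)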
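For anyone wondering what the two gsub calls in the loop do: they delete every carriage return, newline, and tab from the scraped text. A tiny illustration on a made-up string (the value of raw is an assumption, not real output from the site):

```r
# Made-up cell text with the kind of whitespace debris scraped HTML often has.
raw <- "\r\n\tCapital\n"

# The character class matches a carriage return, a newline, or a tab;
# each match is replaced with the empty string.
clean <- gsub('[\r\n\t]', '', raw)
```

Ordinary spaces are not in the character class, so they are left untouched.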
This is what I did. It is not the best solution and it is only a workaround, but you will get an output. I do not recommend writing table output to a file while a loop is running. After stats has been generated inside the loop, add:
output<-rbind(stats,i)
and then append the table to a CSV file:
write.table(output, file = "D:\\Documents\\HTML\\Test of loop.csv", row.names = FALSE, append = TRUE, sep = ",")
#then close the loop
}
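One thing to watch with append = TRUE: by default write.table writes the column names on every call, so the header row is repeated in the file. A minimal sketch of one way around that, assuming you only want the header on the first write. The out_file path and the rows in results are made up for illustration, standing in for the real output path and the scraped stats.

```r
# Hypothetical output path; a temporary file keeps the sketch self-contained.
out_file <- tempfile(fileext = ".csv")

# Made-up per-country results standing in for the scraped `stats` data frames.
results <- list(
  data.frame(names = "Capital", facts = "Oslo", nm = "Norway",
             stringsAsFactors = FALSE),
  data.frame(names = "Capital", facts = "Stockholm", nm = "Sweden",
             stringsAsFactors = FALSE)
)

for (stats in results) {
  # Write the header only when the file does not exist yet, then append.
  first_write <- !file.exists(out_file)
  write.table(stats, file = out_file, sep = ",", row.names = FALSE,
              col.names = first_write, append = !first_write)
}

# Read the file back to check that it has a single header row.
written <- read.csv(out_file, stringsAsFactors = FALSE)
```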
Good luck