Loop to scrape data from Wikipedia in R

死守一世寂寞 2021-01-21 04:43

I am trying to extract data about celebrity/notable deaths for analysis. Wikipedia has a very regular structure to their html paths concerning notable dates of death. It looks l

2 answers

爱一瞬间的悲伤 (original poster)

2021-01-21 05:15
html_text(fnames) returns a character vector. Your problem is that you are trying to append a vector onto a data frame.
Try converting your variable text to a data frame before appending:
```
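library(rvest)

# Assumption: mlist holds the English month names used in the page URLs
mlist <- month.name
# Assumption: data starts as an empty data frame that rbind() can append to
data <- data.frame(text = character(), stringsAsFactors = FALSE)

for (y in 2015:2015) {
  for (m in 1:12) {
    # paste0() avoids the spaces that paste() inserts between pieces by default
    site <- read_html(paste0("https://en.wikipedia.org/wiki/Deaths_in_",
                             mlist[m], "_", y))
    fnames <- html_nodes(site, "#mw-content-text h3+ ul li")
    text <- html_text(fnames)

    # Wrap the character vector in a data frame before appending
    temp <- data.frame(text, stringsAsFactors = FALSE)

    data <- rbind(data, temp)
  }
}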
```
This is not the best technique for performance reasons. Each time through the loop, the memory for the data frame is reallocated, which slows things down. Since this is a one-time job with a limited number of requests, it should be manageable in this case.
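If the reallocation ever does become a problem, one common alternative is to store each page's result in a preallocated list and bind everything once at the end. The following is only a sketch of that idea, not a tested scraper: `fetch_deaths` and `collect_pages` are hypothetical helper names, and the URL pattern and CSS selector are simply carried over from the loop above.

```r
# Hypothetical helper: fetch one month's list items as a character vector.
# Uses the same URL pattern and CSS selector as the loop above.
fetch_deaths <- function(month, year) {
  url <- paste0("https://en.wikipedia.org/wiki/Deaths_in_", month, "_", year)
  site <- rvest::read_html(url)
  rvest::html_text(rvest::html_nodes(site, "#mw-content-text h3+ ul li"))
}

# Collect one data frame per page in a list, then bind them all once.
# `fetch` is an argument so the page-reading step can be swapped or stubbed.
collect_pages <- function(years, months = month.name, fetch = fetch_deaths) {
  pages <- expand.grid(month = months, year = years, stringsAsFactors = FALSE)
  pieces <- vector("list", nrow(pages))  # preallocated, one slot per page
  for (i in seq_len(nrow(pages))) {
    text <- fetch(pages$month[i], pages$year[i])
    pieces[[i]] <- data.frame(text = text, stringsAsFactors = FALSE)
  }
  do.call(rbind, pieces)  # a single bind at the end
}

# Usage (requires network access and the rvest package):
# data <- collect_pages(2015)
```

Because the growing data frame is never copied inside the loop, the cost of combining the results is paid only once. For twelve pages the difference is negligible, so the simpler rbind loop is still a reasonable choice here.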