Question
I'm trying to process some data in JSON format. rjson::fromJSON
imports the data successfully and places it into a quite unwieldy list.
library(rjson)
y <- fromJSON(file="http://api.lmiforall.org.uk/api/v1/wf/predict/breakdown/region?soc=6145&minYear=2014&maxYear=2020")
str(y)
List of 3
$ soc : num 6145
$ breakdown : chr "region"
$ predictedEmployment:List of 7
..$ :List of 2
.. ..$ year : num 2014
.. ..$ breakdown:List of 12
.. .. ..$ :List of 3
.. .. .. ..$ code : num 1
.. .. .. ..$ name : chr "London"
.. .. .. ..$ employment: num 74910
.. .. ..$ :List of 3
.. .. .. ..$ code : num 7
.. .. .. ..$ name : chr "Yorkshire and the Humber"
.. .. .. ..$ employment: num 61132
...
However, as this is essentially tabular data, I would like it in a succinct data.frame. After much trial and error I have this result:
y.p <- do.call(rbind, lapply(y[[3]], function(p)
  cbind(p$year,
        do.call(rbind, lapply(p$breakdown, function(q)
          data.frame(q$name, q$employment, stringsAsFactors = FALSE))))))
head(y.p)
p$year q.name q.employment
1 2014 London 74909.59
2 2014 Yorkshire and the Humber 61131.62
3 2014 South West (England) 65833.57
4 2014 Wales 33002.64
5 2014 West Midlands (England) 68695.34
6 2014 South East (England) 98407.36
But the command seems overly fiddly and complex. Is there a simpler way of doing this?
Answer 1:
I am not sure it is simpler, but the result is more complete and, I think, easier to read. My idea is to use Map: for each (year, breakdown) pair, aggregate the breakdown data into a single table and then combine it with the year.
dat <- y[[3]]
## one data.frame per year: bind that year's breakdown rows, then prepend the year
res <- Map(function(brk, yr)
             data.frame(year = yr, do.call(rbind, lapply(brk, as.data.frame))),
           lapply(dat, '[[', 'breakdown'),
           lapply(dat, '[[', 'year'))
## transform the list to a big data.frame
do.call(rbind,res)
year code name employment
1 2014 1 London 74909.59
2 2014 7 Yorkshire and the Humber 61131.62
3 2014 4 South West (England) 65833.57
4 2014 10 Wales 33002.64
5 2014 5 West Midlands (England) 68695.34
6 2014 2 South East (England) 98407.36
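The same per-year idea can be tried without calling the API. The following is only a minimal sketch: y_demo is a hypothetical hand-built list that mimics the structure shown by str(y), its employment figures are made up for illustration, and flatten_year is a helper name introduced here, not part of rjson.

```r
# Hypothetical stand-in for the parsed response; the numbers are invented.
y_demo <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 100),
      list(code = 10, name = "Wales", employment = 50))),
    list(year = 2015, breakdown = list(
      list(code = 1, name = "London", employment = 110),
      list(code = 10, name = "Wales", employment = 55)))
  )
)

# Turn one year's entry into a data.frame: one row per region, plus the year.
flatten_year <- function(p) {
  rows <- lapply(p$breakdown, as.data.frame, stringsAsFactors = FALSE)
  data.frame(year = p$year, do.call(rbind, rows), stringsAsFactors = FALSE)
}

# Stack the per-year tables into one long data.frame.
res_demo <- do.call(rbind, lapply(y_demo$predictedEmployment, flatten_year))
print(res_demo)
```

Working on one year's entry at a time means the year and its breakdown never have to be extracted into separate lists, which is the main difference from the Map version.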
Answer 2:
Here I recover the geometry of the list:
ni <- seq_along(y[[3]])
nj <- seq_along(y[[c(3, 1, 2)]])
nij <- as.matrix(expand.grid(3, ni=ni, 2, nj=nj))
then extract the relevant values, using each row of nij as a recursive index into the nested list:
data <- apply(nij, 1, function(ij) y[[ij]])
year <- apply(cbind(nij[,1:2], 1), 1, function(ij) y[[ij]])
and make it into a friendlier structure:
data.frame(year, do.call(rbind, data))
year code name employment
1 2014 1 London 74909.59
2 2015 5 West Midlands (England) 69132.34
3 2016 12 Northern Ireland 24313.94
4 2017 5 West Midlands (England) 71723.4
5 2018 9 North East (England) 27199.99
6 2019 4 South West (England) 71219.51
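Note that the rows come out with the year varying fastest, because expand.grid cycles through its first non-constant argument first. The approach also assumes every year holds the same number of regions. A minimal sketch of the same index-matrix technique on a hypothetical hand-built list (y_demo, with invented numbers) that sorts the result afterwards might look like this:

```r
# Hypothetical stand-in for the parsed response; the numbers are invented.
y_demo <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 100),
      list(code = 10, name = "Wales", employment = 50))),
    list(year = 2015, breakdown = list(
      list(code = 1, name = "London", employment = 110),
      list(code = 10, name = "Wales", employment = 55)))
  )
)

ni <- seq_along(y_demo[[3]])            # positions of the years
nj <- seq_along(y_demo[[c(3, 1, 2)]])   # positions of the regions in the first year
idx <- as.matrix(expand.grid(3, ni = ni, 2, nj = nj))

# y_demo[[c(3, i, 2, j)]] is shorthand for y_demo[[3]][[i]][[2]][[j]].
rows <- lapply(seq_len(nrow(idx)), function(k) {
  ij <- unname(idx[k, ])
  data.frame(year = y_demo[[c(ij[1:2], 1)]],
             as.data.frame(y_demo[[ij]], stringsAsFactors = FALSE),
             stringsAsFactors = FALSE)
})

out <- do.call(rbind, rows)
out <- out[order(out$year, out$code), ]  # group the rows by year
print(out)
```

Building each row as a one-row data.frame keeps the columns as ordinary atomic vectors, which makes later sorting and arithmetic straightforward.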
Source: https://stackoverflow.com/questions/17674623/processing-json-using-rjson