Ordinarily, I\'d use a pretty basic long-to-wide reshape for this, but it seems to be dropping my aggregation variables. The setup is I had a job on mechanical Turk that I perfo
Using plyr
:
res = ddply(dat,.(Input.id,Input.State),
function(x)unlist(as.character(x$Answer.Q1thing)))
setNames(res,c('Id','State','Answer1','Answer2','Answer3'))
Id State Answer1 Answer2 Answer3
1 134231 NY Myguess Myguess BadGuess
2 134812 CA Another Another Another
EDIT
In case you have fewer than 3 answers:
res = ddply(dat,.(Input.id,Input.State),
function(x)
{
xx= unlist(as.character(x$Answer.Q1thing))
if(length(xx)==3)xx
else c(xx,rep(NA,3-length(xx)))
})
If your data is in a data.table
it's a one-liner can be done as follows:
library(data.table)
mturk.dt <- as.data.table(mturk)
mturk.dt[, as.list(
rbind(c(Answer.Q1thing, AssignmentStatus))
)
, by=list(Id=Input.id, State=Input.State)]
Note that the by
argument handles the name-changing too!
If you want to properly name the other columns, use setnames
after the fact or, more dynamically, using setattr
within the j=..
argument as follows:
## Assuming 'res' is the reshaped data.table form above:
## Change the names of the six V1, V2.. columns
setnames(res, paste0("V", 1:6), c(paste0("Answer", 1:3), paste0("Status", 1:3)))
j=..
## Use `as.data.table` instead of `as.list`, to preserve new names
mturk.dt[, as.data.table(
rbind(c(
setattr(Answer.Q1thing, "names", paste0("Answer", seq(Answer.Q1thing )))
, setattr(AssignmentStatus, "names", paste0("Status", seq(AssignmentStatus)))
))
)
, by=list(Id=Input.id, State=Input.State)]
Id State Answer1 Answer2 Answer3 Status1 Status2 Status3
1: 134231 NY Myguess Myguess BadGuess Approved Approved Approved
2: 134812 CA Myguess Myguess BadGuess Approved Approved Approved
That was super helpful, @Ricardo and @agstudy. I realized my reshape wasn't working only because it demanded a unique, categorical "timevar". I think in most cases, you do have a categorical label/factor like this that makes it easy, but it is not much harder to count them, and make the count into a label.
I had a second problem was that my number of answers were not consistent; you both gave good help for that, but I was also able to just generate a counter and then implemented my original long-to-wide.
Where the count was < 3, that is where an Input.id only had 2 Answers, I got NAs for this, which is what I wanted.
So altogether:
mturk$idx <- with(mturk, ave(Input.id, Input.id, FUN=seq_along)) # weird!
dat <- reshape(mturk, timevar="idx", idvar=c("Input.id", "Input.state"), direction="wide")
I used the syntax for counting sequences within a group that I found here. This was a little idiosyncratic in the use of the ave()
function, but seems to crop up in a couple other answers. Tried rtl
, too, but had no luck. Using ave(x,x,seq_along) seems to mostly be a hack to avoid sorting. It's odd to use this work-around for sequences in groups because clearly both count() and rtl() are effectively creating this sequencing under the hood in a temp variable.
I like the way data.table allows this sequencing better.
From data.table v1.9.5+, dcast
can handle multiple value.var
columns, i.e., we can cast multiple columns simultaneously. We can simply do:
dt[, id := seq_len(.N), by=Input.id]
dcast(dt, Input.id + Input.State ~ id,
value.var=c("AssignmentStatus", "Answer.Q1thing"))
# Input.id Input.State 1_AssignmentStatus 2_AssignmentStatus 3_AssignmentStatus
# 1: 134231 NY Approved Approved Approved
# 2: 134812 CA Submitted Approved Approved
# 1_Answer.Q1thing 2_Answer.Q1thing 3_Answer.Q1thing
# 1: Myguess Myguess BadGuess
# 2: Another Another Another
Or everything together in one line:
dcast(dt, Input.id + Input.State ~ dt[, seq_len(.N), by=Input.id]$V1,
value.var=c("AssignmentStatus", "Answer.Q1thing"))