Question
I have the following dataframe:
st <- data.frame(
se = rep(1:2, 5),
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2))
st$xy <- paste(st$X,",",st$Y)
st <- st[c("se","xy")]
but I want it to be the following:
1 2 3 4 5
-1.53697673029089 , 2.10652020463275 -1.02183940974772 , 0.623009466458354 1.33614674072657 , 1.5694345481646 0.270466789820086 , -0.75670874554064 -0.280167896821629 , -1.33313822867893
0.26012874418111 , 2.87972571647846 -1.32317949800031 , -2.92675188421021 0.584199000313255 , 0.565499464846637 -0.555881716346136 , -1.14460518414649 -1.0871665543915 , -3.18687136890236
I mean that rows with the same value of se should be bound together as columns.
Do you have any ideas how to accomplish this?
I had no luck with spread (tidyr), and I guess it's something which involves sapply, cbind and an if statement. The real data has more than 35,000 rows.
Answer 1:
It seems as though your eventual goal is to have a data file which has roughly 35000 columns. Are you sure about that? That doesn't sound very tidy.
To do what you want, you are going to need to have a row identifier. In the below, I've called it caseid, and then removed it once it was no longer required. I then transpose the result to get what you asked for.
library(tidyr)
library(dplyr)
st <- data.frame(
se = rep(1:2, 5),
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2))
st$xy <- paste(st$X,",",st$Y)
st <- st[c("se","xy")]
st$caseid = rep(1:(nrow(st)/2), each = 2) # temporary
df = spread(st, se, xy) %>% select(-caseid) %>% t()
print(df)
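The caseid line above hard-codes two interleaved values of se. As a rough sketch only (not part of the original answer), the same identifier can be derived from the data with base R's ave(), which restarts a running index within each value of se. The set.seed() call is an assumption added here so the example is reproducible.

```r
# Rebuild the example data (seed added only for reproducibility).
set.seed(42)
st <- data.frame(
  se = rep(1:2, 5),
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2))
st$xy <- paste(st$X, ",", st$Y)
st <- st[c("se", "xy")]

# Running index within each value of se: 1, 2, 3, ... restarted per group.
# For interleaved se values 1,2,1,2,... this matches rep(1:5, each = 2).
st$caseid <- ave(seq_along(st$se), st$se, FUN = seq_along)
print(st[c("se", "caseid")])
```

The resulting st can then be passed to the spread() call shown above. This should also work when se takes more than two values or the rows are not strictly alternating.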
Answer 2:
If we need to split the 'xy' column elements into individual units, cSplit from splitstackshape can be used. Then rbind the alternating rows of 'st1' after unlisting.
library(splitstackshape)
st1 <- cSplit(st, 'xy', ', ', 'wide')
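# Index the odd and even rows explicitly: recent data.table versions
# no longer recycle a short logical vector in i.
rbind(unlist(st1[seq(1, nrow(st1), by = 2)][, -1, with = FALSE]),
unlist(st1[seq(2, nrow(st1), by = 2)][, -1, with = FALSE]))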
If we don't need to split the 'xy' column into individual elements, we can use dcast from data.table. It should be fast enough. Convert the 'data.frame' to 'data.table' (setDT(st)), create a sequence column ('N') by 'se', and then dcast from 'long' to 'wide'.
library(data.table)
dcast(setDT(st)[, N:= 1:.N, se], se~N, value.var= 'xy')
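For comparison, the same long-to-wide idea can be sketched in base R without extra packages. This is not from the original answer: it assumes every value of se occurs the same number of times (otherwise rbind recycles the shorter groups with a warning), and the set.seed() call is added only for reproducibility.

```r
# Rebuild the example data (seed added only for reproducibility).
set.seed(42)
st <- data.frame(
  se = rep(1:2, 5),
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2))
st$xy <- paste(st$X, ",", st$Y)

# split() groups the xy strings by se; rbind stacks each group as one row.
# The result is a character matrix with one row per se value.
wide <- do.call(rbind, split(st$xy, st$se))
print(dim(wide))
```

Here the row names of wide are the se values, and the columns follow the order in which each group's rows appear in st.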
Source: https://stackoverflow.com/questions/35207472/rearrange-dataframe-by-subsetting-and-column-bind