问题
I have a data frame that follows the below long Pattern:
Name MedName
Name1 atenolol 25mg
Name1 aspirin 81mg
Name1 sildenafil 100mg
Name2 atenolol 50mg
Name2 enalapril 20mg
And would like to get below (I do not care if I can get the columns to be named this way, just want the data in this format):
Name medication1 medication2 medication3
Name1 atenolol 25mg aspirin 81mg sildenafil 100mg
Name2 atenolol 50mg enalapril 20mg NA
Through this very site I have become familiarish with the reshape/reshape2 package, and have went through several attempts to try to get this to work but have thus far failed.
When I try dcast(dataframe, Name ~ MedName, value.var='MedName')
I just get a bunch of columns that are flags of the medication names (values that get transposed are 1 or 0) example:
Name atenolol 25mg aspirin 81mg
Name1 1 1
Name2 0 0
I also tried a dcast(dataset, Name ~ variable)
after I melted the dataset, however this just spits out the following (just counts how many meds each person has):
Name MedName
Name1 3
name2 2
Finally, I tried to melt the data and then reshape using idvar="Name"
timevar="variable"
(of which all just are Mednames), however this does not seem built for my issue since if there are multiple matches to the idvar, the reshape just takes the first MedName and ignores the rest.
Does anyone know how to do this using reshape or another R function? I realize that there probably is a way to do this in a more messy manner with some for loops and conditionals to basically split and re-paste the data, but I was hoping there was a more simple solution. Thank you so much!
回答1:
Assuming your data is in the object dataset
:
library(plyr)
## Add a medication index
data_with_index <- ddply(dataset, .(Name), mutate,
index = paste0('medication', 1:length(Name)))
dcast(data_with_index, Name ~ index, value.var = 'MedName')
## Name medication1 medication2 medication3
## 1 Name1 atenolol 25mg aspirin 81mg sildenafil 100mg
## 2 Name2 atenolol 50mg enalapril 20mg <NA>
回答2:
With the data.table package, this could easily be solved with the new rowid
function:
library(data.table)
dcast(setDT(d1),
Name ~ rowid(Name, prefix = "medication"),
value.var = "MedName")
which gives:
Name medication1 medication2 medication3 1 Name1 atenolol 25mg aspirin 81mg sildenafil 100mg 2 Name2 atenolol 50mg enalapril 20mg <NA>
Another method (commonly used before version 1.9.7):
dcast(setDT(d1)[, rn := 1:.N, by = Name],
Name ~ paste0("medication",rn),
value.var = "MedName")
giving the same result.
A similar approach, but now using the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
d1 %>%
group_by(Name) %>%
mutate(rn = paste0("medication",row_number())) %>%
spread(rn, MedName)
which gives:
Source: local data frame [2 x 4] Groups: Name [2] Name medication1 medication2 medication3 (fctr) (chr) (chr) (chr) 1 Name1 atenolol 25mg aspirin 81mg sildenafil 100mg 2 Name2 atenolol 50mg enalapril 20mg NA
回答3:
You could always generate a unique timevar
before using reshape
. Here I use ave
to apply the function seq_along
'along' each "Name".
test <- data.frame(
Name=c(rep("name1",3),rep("name2",2)),
MedName=c("atenolol 25mg","aspirin 81mg","sildenafil 100mg",
"atenolol 50mg","enalapril 20mg")
)
# generate the 'timevar'
test$uniqid <- with(test, ave(as.character(Name), Name, FUN = seq_along))
# reshape!
reshape(test, idvar = "Name", timevar = "uniqid", direction = "wide")
Result:
Name MedName.1 MedName.2 MedName.3
1 name1 atenolol 25mg aspirin 81mg sildenafil 100mg
4 name2 atenolol 50mg enalapril 20mg <NA>
回答4:
This seems to actually be a fairly common problem, so I have included a function called getanID
in my "splitstackshape" package.
Here's what it does:
library(splitstackshape)
getanID(test, "Name")
# Name MedName .id
# 1: name1 atenolol 25mg 1
# 2: name1 aspirin 81mg 2
# 3: name1 sildenafil 100mg 3
# 4: name2 atenolol 50mg 1
# 5: name2 enalapril 20mg 2
Since "data.table" is loaded along with "splitstackshape", you have access to dcast.data.table
, so you can proceed as with @mnel's example.
dcast.data.table(getanID(test, "Name"), Name ~ .id, value.var = "MedName")
# Name 1 2 3
# 1: name1 atenolol 25mg aspirin 81mg sildenafil 100mg
# 2: name2 atenolol 50mg enalapril 20mg NA
The function essentially implements a sequence(.N)
by the groups identified to create the "time" column.
回答5:
@thelatemail's solution is similar to this one. When I generate the time variable, I use rle
in case I'm not working interactively and the Name
variable needs to be dynamic.
# start with your example data
x <-
data.frame(
Name=c(rep("name1",3),rep("name2",2)),
MedName=c("atenolol 25mg","aspirin 81mg","sildenafil 100mg",
"atenolol 50mg","enalapril 20mg")
)
# pick the id variable
id <- 'Name'
# sort the data.frame by that variable
x <- x[ order( x[ , id ] ) , ]
# construct a `time` variable on the fly
x$time <- unlist( lapply( rle( as.character( x[ , id ] ) )$lengths , seq_len ) )
# `reshape` uses that new `time` column by default
y <- reshape( x , idvar = id , direction = 'wide' )
# done
y
回答6:
One clean solution involves the very useful pivot_wider
function from the tidyr
package version 1.1.0
. With this you can also directly specify the column names by using the argument names_glue
.
library(tidyr)
library(dplyr)
dataframe %>%
group_by(Name) %>%
mutate(row_n = row_number()) %>%
pivot_wider(id_cols = Name, names_from = row_n, values_from = MedName, names_glue = "medication{row_n}")
Output
# A tibble: 2 x 4
# Groups: Name [2]
# Name medication1 medication2 medication3
# <chr> <chr> <chr> <chr>
# 1 Name1 atenolol 25mg aspirin 81mg sildenafil 100mg
# 2 Name2 atenolol 50mg enalapril 20mg NA
回答7:
Here's a shorter way, taking advantage of the way unlist
deals with names:
library(dplyr)
df1 %>% group_by(Name) %>% do(as_tibble(t(unlist(.[2]))))
# # A tibble: 2 x 4
# # Groups: Name [2]
# Name MedName1 MedName2 MedName3
# <chr> <chr> <chr> <chr>
# 1 name1 atenolol 25mg aspirin 81mg sildenafil 100mg
# 2 name2 atenolol 50mg enalapril 20mg <NA>
回答8:
A tidyr
solution with chop()
and unnest_wider()
.
library(tidyr)
df2 %>%
chop(-Name) %>%
unnest_wider(MedName, names_sep = "")
# # A tibble: 2 x 4
# Name MedName1 MedName2 MedName3
# <chr> <chr> <chr> <chr>
# 1 Name1 atenolol 25mg aspirin 81mg sildenafil 100mg
# 2 Name2 atenolol 50mg enalapril 20mg NA
The argument names_sep = ""
is necessary; otherwise, the new column names will be ..1
, ..2
, and ..3
.
来源:https://stackoverflow.com/questions/65731796/how-do-i-merge-rows-with-the-same-name