Casting multiple columns from one factor variable

Posted by £可爱£侵袭症+ on 2019-12-13 04:59:07

Question


I have picked up an awful public data set that needs a lot of work to make it useful. Here is a simplification:

Molten <- data.frame(ID = round(runif(100, 0, 50), 0),
                     Element = c(rep("Au", 20), rep("Fe", 10), rep("Al", 30),
                                 rep("Cu", 20), rep("Au", 20)),
                     Measure = rnorm(100),
                     Units = c(rep("ppm", 10), rep("pct", 10), rep("ppb", 80)))

Molten$UnitElement <- paste(Molten$Element, Molten$Units, sep = "_")

Molten <- Molten[!duplicated(Molten[, c("ID", "Element")]), ]

I have arrived at a data frame with the IDs and a different column for each element using dcast:

library(reshape2)
Cast<-dcast(Molten, ID~Element, value.var="Measure" )

But the same element is measured in different units, so I need an extra column for each element indicating the unit that record is measured in. For example, a column called "GoldUnit" that holds NA for each entry without a gold measurement and the measured unit for each populated gold record. I'm not sure how to go about this. Any help would be appreciated!

Example of what I would like:

  ID, Al, Al_unit, Au, Au_unit, Cu, Cu_unit, Fe, Fe_unit
  5, NA, NA, NA, NA, 1, "ppb", NA, NA
  7, NA, NA, NA, NA, NA, NA, 6, "ppb"
  3, 3, "ppb", 4, "ppm", NA, NA, NA, NA

Answer 1:


This should return what you're looking for:

library(reshape2)

Element <- c(rep("Au", 20), rep("Fe", 10),rep("Al", 30),rep("Cu", 20),rep("Au", 20))
Measure <- rnorm(100)
ID <- round(runif(100, 0, 50),0)
Units <- c(rep("ppm",10), rep("pct",10), rep("ppb", 80))

Molten <- cbind.data.frame(Element, Measure, ID, Units)
Molten <- Molten[!duplicated(Molten[,c("ID", "Element")]),]

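# Cast the measurements and the units separately, one column per element
Cast1 <- dcast(Molten, ID ~ Element, value.var = "Measure")
Cast2 <- dcast(Molten, ID ~ Element, value.var = "Units")
# Drop the duplicate ID column and tag the unit columns with a "_unit" suffix
Cast2$ID <- NULL
names(Cast2) <- paste(names(Cast2), "unit", sep = "_")
# Both casts come from the same data and formula, so their rows line up
Cast <- cbind(Cast1, Cast2)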
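
Note that the cbind puts all measurement columns first and all unit columns after them. If the interleaved layout from the question is wanted, the columns can be reordered afterwards. A minimal base R sketch, assuming small made-up stand-ins for Cast1 and Cast2 (the values mirror the question's example rows and are not real output):

```r
# Toy stand-ins for the two casts; values are illustrative only
Cast1 <- data.frame(ID = c(3, 5, 7),
                    Al = c(3, NA, NA), Au = c(4, NA, NA),
                    Cu = c(NA, 1, NA), Fe = c(NA, NA, 6))
Cast2 <- data.frame(Al_unit = c("ppb", NA, NA), Au_unit = c("ppm", NA, NA),
                    Cu_unit = c(NA, "ppb", NA), Fe_unit = c(NA, NA, "ppb"),
                    stringsAsFactors = FALSE)
Cast <- cbind(Cast1, Cast2)

# rbind() stacks element names over their "_unit" names; as.vector() then
# reads the matrix column by column, which interleaves the two sets
elements <- setdiff(names(Cast1), "ID")
ordered_cols <- c("ID", as.vector(rbind(elements,
                                        paste(elements, "unit", sep = "_"))))
Cast <- Cast[, ordered_cols]
print(names(Cast))
```

The same reordering should work on the real Cast as long as every element column has a matching "_unit" column.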



Answer 2:


Try base R's reshape (Molten[,-5] drops the UnitElement column from the question's data frame):

 res <- reshape(Molten[,-5], timevar='Element', idvar='ID', direction='wide')
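
To see what the wide reshape produces before any renaming, here is a small sketch on a made-up four-row data frame (the rows follow the question's example and are not part of the original data). Each remaining column is spread into one column per element, named with the pattern variable.element:

```r
# Toy long-format data; values are illustrative only
Molten <- data.frame(ID = c(3, 3, 5, 7),
                     Element = c("Al", "Au", "Cu", "Fe"),
                     Measure = c(3, 4, 1, 6),
                     Units = c("ppb", "ppm", "ppb", "ppb"),
                     stringsAsFactors = FALSE)

# One row per ID; Measure and Units are each spread across the elements
res <- reshape(Molten, timevar = "Element", idvar = "ID", direction = "wide")
print(names(res))
print(res)
```

The dotted prefixes are what the renaming step below strips off.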

If you need to change the column names:

 indx1 <- grep('Units', colnames(res))
 colnames(res) <- gsub('.*\\.', '',colnames(res))
 colnames(res)[indx1] <- paste(colnames(res)[indx1], 'unit', sep="_")

 head(res,3)
 # ID         Au Au_unit Fe Fe_unit       Al Al_unit        Cu Cu_unit
 #1 26  0.8204623     ppm NA    <NA>       NA    <NA> -1.031156     ppb
 #2 38 -0.1117522     ppm NA    <NA>       NA    <NA>        NA    <NA>
 #3  6 -0.5760871     ppm NA    <NA> 1.701546     ppb  1.492658     ppb


Source: https://stackoverflow.com/questions/27118314/casting-multiple-columns-from-one-factor-variable
