R: Converting multiple binary columns into one factor variable whose factors are binary column names

挽巷 2021-01-03 02:09

I am a new R user. I am currently working on a dataset in which I have to transform multiple binary columns into a single factor column.

Here is the example:

4 answers
  • 2021-01-03 02:13

    Melting is certainly one solution. I'd suggest using melt from reshape2 as follows:

    library(reshape2)
    
    df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
                  Property.Insurance=c(0,1,0,1,0,0),
                  Property.CarOther=c(0,0,0,0,1,0),
                  Property.Unknown=c(0,0,0,0,0,1))
    
    #add id column (presumably you have ids more meaningful than row numbers)
    df$row=1:nrow(df)
    
    #melt to "long" format
    long=melt(df,id="row")
    
    #only keep 1's
    long=long[which(long$value==1),]
    
    #merge in ids for NA entries
    long=merge(df[,"row",drop=FALSE],long,all.x=TRUE)
    
    #clean up to match example output
    long=long[order(long$row),"variable",drop=FALSE]
    names(long)="Property"
    #strip the prefix and return the factor the question asks for
    long$Property=factor(gsub("Property.","",long$Property,fixed=TRUE))
    
    #results
    long
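
    If you would rather not depend on reshape2, the same melt-then-filter idea can be sketched in base R with stack(), which also stacks the columns one after another. This is a swapped-in base-R variant, not part of the answer above; it keeps the all-zero first row as NA and returns a factor whose levels follow the column order:

    ```r
    # Same 0/1 data as above (row 1 contains no 1 at all)
    df <- data.frame(Property.RealEstate = c(0, 0, 1, 0, 0, 0),
                     Property.Insurance  = c(0, 1, 0, 1, 0, 0),
                     Property.CarOther   = c(0, 0, 0, 0, 1, 0),
                     Property.Unknown    = c(0, 0, 0, 0, 0, 1))

    # stack() melts column by column into "values" and "ind" (the column name)
    long <- stack(df)
    # each column contributes nrow(df) values, so the row index repeats per column
    long$row <- rep(seq_len(nrow(df)), times = ncol(df))

    # keep only the 1s, then write each column name back into its row
    hits <- long[long$values == 1, ]
    Property <- rep(NA_character_, nrow(df))
    Property[hits$row] <- as.character(hits$ind)

    # drop the "Property." prefix and keep the levels in column order
    lvls <- sub("^Property\\.", "", names(df))
    Property <- factor(sub("^Property\\.", "", Property), levels = lvls)
    Property
    ```

    Converting ind with as.character() avoids relying on the level order stack() gives it, which has differed between R versions.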
    
  • 2021-01-03 02:18

    Alternatively, you can just do it the naïve way. I think it's more transparent than any of the other suggestions (including my other one).

    df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
                  Property.Insurance=c(0,1,0,1,0,0),
                  Property.CarOther=c(0,0,0,0,1,0),
                  Property.Unknown=c(0,0,0,0,0,1))
    
    propcols=c("Property.RealEstate", "Property.Insurance", "Property.CarOther", "Property.Unknown")
    
    df$Property=NA
    
    #each column writes its name into the rows where it holds a 1
    for(colname in propcols){
      coldata=df[,colname]
      df$Property[which(coldata==1)]=colname
    }
    
    df$Property=factor(gsub("Property.","",df$Property,fixed=TRUE))
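
    The loop can also be vectorized. A sketch using base R's max.col(), assuming the columns only ever hold 0 or 1: ties.method = "last" picks the last 1 in a row, which matches the loop (a later column overwrites an earlier one), and rows with no 1 are set to NA explicitly because max.col() would otherwise return a column for them too:

    ```r
    df <- data.frame(Property.RealEstate = c(0, 0, 1, 0, 0, 0),
                     Property.Insurance  = c(0, 1, 0, 1, 0, 0),
                     Property.CarOther   = c(0, 0, 0, 0, 1, 0),
                     Property.Unknown    = c(0, 0, 0, 0, 0, 1))
    propcols <- c("Property.RealEstate", "Property.Insurance",
                  "Property.CarOther", "Property.Unknown")

    m <- as.matrix(df[propcols])
    # column index of the (last) largest value in each row
    idx <- max.col(m, ties.method = "last")
    # an all-zero row would still get a column index, so blank those rows out
    idx[rowSums(m == 1) == 0] <- NA

    lvls <- sub("^Property\\.", "", propcols)
    df$Property <- factor(lvls[idx], levels = lvls)
    df$Property
    ```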
    
  • 2021-01-03 02:36
    mat <- matrix(c(0,1,0,0,0,
                    1,0,0,0,0,
                    0,0,0,1,0,
                    0,0,1,0,0,
                    0,0,0,0,1), ncol = 5, byrow = TRUE)
    colnames(mat) <- c("Level1","Level2","Level3","Level4","Level5")
    mat
    #      Level1 Level2 Level3 Level4 Level5
    # [1,]      0      1      0      0      0
    # [2,]      1      0      0      0      0
    # [3,]      0      0      0      1      0
    # [4,]      0      0      1      0      0
    # [5,]      0      0      0      0      1
    

    Create a new factor from the position of the 1 in each row, using the matrix column names as the labels for the levels:

    NewFactor <- factor(apply(mat, 1, function(x) which(x == 1)), 
                        labels = colnames(mat)) 
    
    NewFactor
    # [1] Level2 Level1 Level4 Level3 Level5
    # Levels: Level1 Level2 Level3 Level4 Level5
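
    One caveat: factor() needs one label per level that actually occurs, so if some column never contains a 1, the call above stops with an invalid-labels error. A sketch that declares every column position as a level up front (still assuming exactly one 1 per row):

    ```r
    # Level3 never holds a 1 in this example
    mat <- matrix(c(0, 1, 0,
                    1, 0, 0,
                    0, 1, 0), ncol = 3, byrow = TRUE)
    colnames(mat) <- c("Level1", "Level2", "Level3")

    # position of the 1 in each row (assumes exactly one 1 per row)
    idx <- apply(mat, 1, function(x) which(x == 1))

    # listing every column position as a level keeps unused columns as empty levels
    NewFactor <- factor(idx, levels = seq_len(ncol(mat)), labels = colnames(mat))
    NewFactor
    ```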
    

    You can also try:

    factor(mat%*%(1:ncol(mat)), labels = colnames(mat)) 
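
    The matrix product works because multiplying a row that has a single 1 in column j by the vector 1:ncol(mat) gives j. A row with no 1 gives 0, and a row with two 1s gives the sum of their positions, so it is worth checking the rows before trusting the result. A sketch of that check:

    ```r
    mat <- matrix(c(0, 1, 0, 0, 0,
                    1, 0, 0, 0, 0,
                    0, 0, 0, 1, 0,
                    0, 0, 1, 0, 0,
                    0, 0, 0, 0, 1), ncol = 5, byrow = TRUE)
    colnames(mat) <- c("Level1", "Level2", "Level3", "Level4", "Level5")

    # dot product of each row with 1..k is the position of its single 1
    idx <- drop(mat %*% seq_len(ncol(mat)))

    # idx is only meaningful when every row has exactly one 1
    ok <- rowSums(mat == 1) == 1
    if (!all(ok)) stop("rows without exactly one 1: ", paste(which(!ok), collapse = ", "))

    NewFactor <- factor(idx, levels = seq_len(ncol(mat)), labels = colnames(mat))
    NewFactor
    ```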
    

    Or use Tomas's solution, which I found elsewhere on SO:

    as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
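
    Note that as.factor() orders the levels alphabetically. That happens to match the column order for Level1 to Level5, but not in general. A sketch using the Property names from the other answers, assuming you want the levels in column order:

    ```r
    mat <- matrix(c(0, 1, 0,
                    1, 0, 0,
                    0, 0, 1), ncol = 3, byrow = TRUE)
    colnames(mat) <- c("RealEstate", "Insurance", "CarOther")

    labs <- colnames(mat)[mat %*% seq_len(ncol(mat))]
    levels(as.factor(labs))                    # alphabetical order
    f <- factor(labs, levels = colnames(mat))  # levels follow the columns instead
    levels(f)
    ```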
    
  • 2021-01-03 02:38

    Something different:

    Get the data:

    dat <- data.frame(Property.RealEstate=c(1,0,1,0,0,0),Property.Insurance=c(0,1,0,1,0,0),Property.CarOther=c(0,0,0,0,1,0),Property.Unknown=c(0,0,0,0,0,1))
    

    Reshape it:

    names(dat)[row(t(dat))[t(dat)==1]]
    #[1] "Property.RealEstate" "Property.Insurance"  "Property.RealEstate"
    #[4] "Property.Insurance"  "Property.CarOther"   "Property.Unknown" 
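
    The one-liner relies on R's column-major order: t(dat) turns each row of dat into a column, so reading the 1s of t(dat) in storage order visits the rows of dat in order, and row() supplies the matching column index of dat. A sketch of the same idea with which(..., arr.ind = TRUE), which makes those row/column pairs explicit:

    ```r
    dat <- data.frame(Property.RealEstate = c(1, 0, 1, 0, 0, 0),
                      Property.Insurance  = c(0, 1, 0, 1, 0, 0),
                      Property.CarOther   = c(0, 0, 0, 0, 1, 0),
                      Property.Unknown    = c(0, 0, 0, 0, 0, 1))

    # which(arr.ind = TRUE) returns (row, col) pairs in column-major order;
    # for t(dat), "col" is the row of dat and "row" is the column of dat
    hits <- which(t(dat) == 1, arr.ind = TRUE)
    res <- names(dat)[hits[, "row"]]
    res
    ```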
    

    If you want it cleaned up, do:

    gsub("Property\\.","",names(dat)[row(t(dat))[t(dat)==1]])
    #[1] "RealEstate" "Insurance"  "RealEstate" "Insurance"  "CarOther"   "Unknown" 
    

    If you prefer a factor output:

    factor(row(t(dat))[t(dat)==1],labels=names(dat))
    

    ...and cleaned up:

    factor(row(t(dat))[t(dat)==1],labels=gsub("Property\\.","",names(dat)) )
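
    Pulling these approaches together, here is a sketch of a small reusable function. binary_to_factor() and its cols/prefix arguments are hypothetical names, not from any package; it returns NA for rows without a 1, stops on rows with more than one 1, strips a literal prefix, and keeps the levels in column order:

    ```r
    # binary_to_factor() is a hypothetical helper, not part of any package
    binary_to_factor <- function(df, cols = names(df), prefix = "") {
      m <- as.matrix(df[cols]) == 1
      n_ones <- rowSums(m)
      if (any(n_ones > 1)) {
        stop("rows with more than one 1: ", paste(which(n_ones > 1), collapse = ", "))
      }
      # column position of the single 1 in each row; NA where the row has none
      idx <- rep(NA_integer_, nrow(m))
      hits <- which(m, arr.ind = TRUE)
      idx[hits[, "row"]] <- hits[, "col"]
      # strip the prefix literally, so "." is not treated as a regex wildcard
      lvls <- ifelse(startsWith(cols, prefix), substring(cols, nchar(prefix) + 1), cols)
      factor(lvls[idx], levels = lvls)
    }

    dat <- data.frame(Property.RealEstate = c(1, 0, 1, 0, 0, 0),
                      Property.Insurance  = c(0, 1, 0, 1, 0, 0),
                      Property.CarOther   = c(0, 0, 0, 0, 1, 0),
                      Property.Unknown    = c(0, 0, 0, 0, 0, 1))
    binary_to_factor(dat, prefix = "Property.")
    ```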
    