Generate a dummy variable

遇见更好的自我 2020-11-21 11:41

I have trouble generating the following dummy variables in R:

I'm analyzing yearly time-series data (time period 1948-2009). I have two questions:

  1. <
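A minimal base-R sketch of the basic idea: compare the year column with a value and convert the logical result to 0/1. It assumes the series is stored in a data frame with one row per year. The data frame `ts_df`, the column `d_1960` and the year 1960 are illustrative placeholders, not taken from the question:

```r
# Hypothetical yearly data covering the question's period (1948-2009)
ts_df <- data.frame(year = 1948:2009)

# Dummy equal to 1 in one (arbitrary, illustrative) year and 0 otherwise;
# as.integer() turns TRUE/FALSE into 1/0
ts_df$d_1960 <- as.integer(ts_df$year == 1960)

sum(ts_df$d_1960)   # exactly one year is flagged
# [1] 1
```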
17 answers
  • 2020-11-21 12:28

    Using dummies::dummy():

    library(dummies)
    
    # example data
    df1 <- data.frame(id = 1:4, year = 1991:1994)
    
    df1 <- cbind(df1, dummy(df1$year, sep = "_"))
    
    df1
    #   id year df1_1991 df1_1992 df1_1993 df1_1994
    # 1  1 1991        1        0        0        0
    # 2  2 1992        0        1        0        0
    # 3  3 1993        0        0        1        0
    # 4  4 1994        0        0        0        1
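If you would rather not depend on the dummies package, the following base-R sketch should produce the same indicator columns with `model.matrix()`. The `- 1` drops the intercept so every year gets its own column. The `year_` column names are set here with `paste0()` and are my own choice, not what `dummy()` produces:

```r
df1 <- data.frame(id = 1:4, year = 1991:1994)

# One 0/1 column per distinct year; "- 1" removes the intercept column
m <- model.matrix(~ factor(year) - 1, data = df1)
colnames(m) <- paste0("year_", levels(factor(df1$year)))

df1 <- cbind(df1, m)
df1
```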
    
  • 2020-11-21 12:28

    Hi, I wrote this general function to generate a dummy variable. It essentially replicates the replace command in Stata.

    Suppose x is the data frame and I want a dummy variable called a that takes the value 1 when x$b equals c, and 0 otherwise:

    introducedummy <- function(x, a, b, c) {
      # x: data frame; a: name of the new dummy column;
      # b: name of the column to test; c: value that sets the dummy to 1
      n <- nrow(x)
      new1 <- numeric(n)          # starts as all zeros
      state <- x[[b]]
      for (i in seq_len(n)) {
        if (state[i] == c) {
          new1[i] <- 1
        } else {
          new1[i] <- 0
        }
      }
      x[[a]] <- new1              # append the dummy under the name given in a
      x
    }
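For a single condition, the comparison can also be done on the whole column at once, without a loop. The result should be the same 0/1 column, except that an `NA` in the tested column gives `NA` instead of an error. A small sketch with made-up data (the data frame `x`, the column `b` and the value `"yes"` are illustrative only):

```r
# Illustrative data: the dummy should be 1 where b == "yes"
x <- data.frame(b = c("yes", "no", "yes", "maybe"), stringsAsFactors = FALSE)

# Vectorized comparison; TRUE/FALSE becomes 1/0
x$a <- as.integer(x$b == "yes")
x$a
# [1] 1 0 1 0
```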
    
  • 2020-11-21 12:29

    Package mlr includes createDummyFeatures for this purpose:

    library(mlr)
    df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))
    df
    
    #    var
    # 1    B
    # 2    A
    # 3    C
    # 4    B
    # 5    C
    # 6    A
    # 7    C
    # 8    A
    # 9    B
    # 10   C
    
    createDummyFeatures(df, cols = "var")
    
    #    var.A var.B var.C
    # 1      0     1     0
    # 2      1     0     0
    # 3      0     0     1
    # 4      0     1     0
    # 5      0     0     1
    # 6      1     0     0
    # 7      0     0     1
    # 8      1     0     0
    # 9      0     1     0
    # 10     0     0     1
    

    createDummyFeatures drops the original variable.

    https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures
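If these dummies go into a regression that also has an intercept, one level is usually left out to avoid perfect collinearity (the "dummy variable trap"). The following base-R sketch does this reference coding with `model.matrix()`. It uses the default treatment contrasts, so the first factor level, `A`, becomes the baseline. The fixed `var` values replace the random sample so the result is reproducible:

```r
df <- data.frame(var = factor(c("B", "A", "C", "B", "C")))

# Treatment (reference) coding gives an intercept column plus one column
# per non-baseline level; [, -1] drops the intercept column
ref <- model.matrix(~ var, data = df)[, -1]
colnames(ref)
# [1] "varB" "varC"
```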

  • 2020-11-21 12:29

    Another way is to use mtabulate from the qdapTools package:

    df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))
    df
    #   var
    #1   C
    #2   A
    #3   C
    #4   B
    #5   B
    
    library(qdapTools)
    mtabulate(df$var)
    

    which gives,

      A B C
    1 0 0 1
    2 1 0 0
    3 0 0 1
    4 0 1 0
    5 0 1 0
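For comparison, here is a base-R sketch that should give the same 0/1 layout as `mtabulate()`. It cross-tabulates row numbers against the values, and `as.data.frame.matrix()` keeps the wide shape. The fixed vector `v` stands in for the random sample above so the result is reproducible:

```r
v <- c("C", "A", "C", "B", "B")

# Each row index occurs exactly once, so every cell is 0 or 1
tab <- table(seq_along(v), v)
as.data.frame.matrix(tab)
#   A B C
# 1 0 0 1
# 2 1 0 0
# 3 0 0 1
# 4 0 1 0
# 5 0 1 0
```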
    
  • 2020-11-21 12:29

    We can also use cSplit_e from the splitstackshape package. Using @zx8754's data:

    df1 <- data.frame(id = 1:4, year = 1991:1994)
    splitstackshape::cSplit_e(df1, "year", fill = 0)
    
    #  id year year_1 year_2 year_3 year_4
    #1  1 1991      1      0      0      0
    #2  2 1992      0      1      0      0
    #3  3 1993      0      0      1      0
    #4  4 1994      0      0      0      1
    

    To make it work with non-numeric data, we need to specify type = "character" explicitly:

    df1 <- data.frame(id = 1:4, let = LETTERS[1:4])
    splitstackshape::cSplit_e(df1, "let", fill = 0, type = "character")
    
    #  id let let_A let_B let_C let_D
    #1  1   A     1     0     0     0
    #2  2   B     0     1     0     0
    #3  3   C     0     0     1     0
    #4  4   D     0     0     0     1
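The character case can also be handled in base R with `outer()`, which compares every value with every distinct level. Multiplying by 1 turns the logical matrix into 0/1. In this sketch the `let_` prefix is added with `paste0()` only to mirror the column names above:

```r
df1 <- data.frame(id = 1:4, let = LETTERS[1:4], stringsAsFactors = FALSE)

lv <- sort(unique(df1$let))
# n x k matrix: 1 where row i has level j, 0 otherwise
d <- outer(df1$let, lv, "==") * 1
colnames(d) <- paste0("let_", lv)

cbind(df1, d)
```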
    