Generate a dummy-variable


遇见更好的自我

I have trouble generating the following dummy-variables in R:

I'm analyzing yearly time series data (time period 1948-2009). I have two questions:


17 answers

無奈伤痛

2020-11-21 12:28

Using dummies::dummy():

library(dummies)

# example data
df1 <- data.frame(id = 1:4, year = 1991:1994)

df1 <- cbind(df1, dummy(df1$year, sep = "_"))

df1
#   id year df1_1991 df1_1992 df1_1993 df1_1994
# 1  1 1991        1        0        0        0
# 2  2 1992        0        1        0        0
# 3  3 1993        0        0        1        0
# 4  4 1994        0        0        0        1
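If the dummies package is not installed, roughly the same columns can be built in base R. This is only a sketch that swaps in model.matrix() for dummy(); the "year_" column prefix is my own naming assumption, not something dummy() produces.

```r
# Base R sketch: one 0/1 column per distinct year, no extra packages.
df1 <- data.frame(id = 1:4, year = 1991:1994)

# model.matrix() expands a factor into indicator columns;
# "- 1" drops the intercept so every level keeps its own column.
dummies <- model.matrix(~ factor(year) - 1, data = df1)

# Rename the columns to "year_<value>" (naming chosen for illustration).
colnames(dummies) <- paste0("year_", levels(factor(df1$year)))

df1 <- cbind(df1, dummies)
df1
```

Dropping the intercept matters here: with the default intercept, the first year would be absorbed as the reference level and get no column of its own.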


隐瞒了意图╮

2020-11-21 12:28

Hi, I wrote this general function to generate a dummy variable. It essentially replicates the replace function in Stata.

Here x is the data frame, a is the name of the new dummy variable, and the dummy takes the value 1 wherever column b of x takes the value c (and 0 otherwise).

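introducedummy <- function(x, a, b, c) {
  # x: data frame
  # a: name of the new dummy column (character)
  # b: name of the existing column to test (character)
  # c: value of column b for which the dummy should be 1
  n <- nrow(x)
  new1 <- numeric(n)            # starts as all zeros
  state <- x[[b]]
  for (i in seq_len(n)) {       # seq_len() is safe when x has no rows
    if (state[i] == c) {
      new1[i] <- 1
    }
  }
  x[[a]] <- new1                # add the dummy under the requested name
  x
}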
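For what it's worth, the loop is not strictly needed, because == already compares a whole column at once. Below is a minimal sketch of that idea. The helper name add_dummy and the year 1957 are made up for illustration; the 1948-2009 range is taken from the question.

```r
# Hypothetical helper (not part of the answer above): vectorised dummy creation.
add_dummy <- function(data, new_name, column, value) {
  # The comparison yields TRUE/FALSE for every row at once;
  # as.integer() turns that into 1/0 (an NA in the column stays NA).
  data[[new_name]] <- as.integer(data[[column]] == value)
  data
}

# Example data covering the period mentioned in the question.
x <- data.frame(year = 1948:2009)

# 1957 is an arbitrary year chosen for illustration.
x <- add_dummy(x, "d1957", "year", 1957)

# Show the rows where the dummy is switched on.
x[x$d1957 == 1, ]
```

The same pattern works with other comparisons, e.g. data[[column]] >= value for a dummy that switches on from a given year onwards.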


没有蜡笔的小新

2020-11-21 12:29

Package mlr includes createDummyFeatures for this purpose:

library(mlr)
df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))
df

#    var
# 1    B
# 2    A
# 3    C
# 4    B
# 5    C
# 6    A
# 7    C
# 8    A
# 9    B
# 10   C

createDummyFeatures(df, cols = "var")

#    var.A var.B var.C
# 1      0     1     0
# 2      1     0     0
# 3      0     0     1
# 4      0     1     0
# 5      0     0     1
# 6      1     0     0
# 7      0     0     1
# 8      1     0     0
# 9      0     1     0
# 10     0     0     1

Note that createDummyFeatures drops the original variable.

https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures

南笙

2020-11-21 12:29

Another way is to use mtabulate from the qdapTools package:

df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))
df
#   var
# 1   C
# 2   A
# 3   C
# 4   B
# 5   B

library(qdapTools)
mtabulate(df$var)

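which gives a table of counts with one column per distinct value of var (here A, B and C) and one row per observation.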
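For comparison, here is a base R sketch that builds the same kind of indicator table without qdapTools. It swaps in table() for mtabulate(), so the exact output format of mtabulate() may differ; the data are typed in from the printout above rather than sampled.

```r
# Base R sketch: table() is used here in place of mtabulate().
df <- data.frame(var = c("C", "A", "C", "B", "B"))  # values as printed above

# Cross-tabulate the row index against the value:
# each row ends up with a single 1 in the column of its own value.
tab <- table(row = seq_along(df$var), var = df$var)

# Convert the table into a plain data frame of counts.
dummies <- as.data.frame.matrix(tab)
dummies
```

Because every row index occurs exactly once, the counts can only be 0 or 1, which is what makes the cross-tabulation usable as a set of dummy variables.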


不知归路

2020-11-21 12:29

We can also use cSplit_e from splitstackshape, using @zx8754's data:

df1 <- data.frame(id = 1:4, year = 1991:1994)
splitstackshape::cSplit_e(df1, "year", fill = 0)

#  id year year_1 year_2 year_3 year_4
#1  1 1991      1      0      0      0
#2  2 1992      0      1      0      0
#3  3 1993      0      0      1      0
#4  4 1994      0      0      0      1

To make it work for non-numeric data, we need to specify type = "character" explicitly:

df1 <- data.frame(id = 1:4, let = LETTERS[1:4])
splitstackshape::cSplit_e(df1, "let", fill = 0, type = "character")

#  id let let_A let_B let_C let_D
#1  1   A     1     0     0     0
#2  2   B     0     1     0     0
#3  3   C     0     0     1     0
#4  4   D     0     0     0     1
