I have trouble generating the following dummy variables in R:
I'm analyzing yearly time series data (time period 1948-2009). I have two questions:
Using dummies::dummy():
library(dummies)
# example data
df1 <- data.frame(id = 1:4, year = 1991:1994)
df1 <- cbind(df1, dummy(df1$year, sep = "_"))
df1
# id year df1_1991 df1_1992 df1_1993 df1_1994
# 1 1 1991 1 0 0 0
# 2 2 1992 0 1 0 0
# 3 3 1993 0 0 1 0
# 4 4 1994 0 0 0 1
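For comparison, roughly the same columns can be built in base R with model.matrix(). This is only a sketch of an alternative, not what dummies::dummy() does internally; the year_ prefix and the name year_dummies are my own choices.

```r
# example data (same as above)
df1 <- data.frame(id = 1:4, year = 1991:1994)

# One indicator column per distinct year.
# "- 1" drops the intercept so that every level gets its own column.
year_dummies <- model.matrix(~ factor(year) - 1, data = df1)

# Replace the default "factor(year)1991"-style names with shorter ones (assumed naming scheme)
colnames(year_dummies) <- paste0("year_", levels(factor(df1$year)))

df1 <- cbind(df1, year_dummies)
df1
```

The same call should scale to the full 1948-2009 range, since model.matrix() creates one column per level it finds in the data.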
I wrote this general function to generate a dummy variable; it essentially replicates the replace function in Stata.
Here x is the data frame, and I want a dummy variable called a that takes the value 1 when x$b takes the value c:
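introducedummy <- function(x, a, b, c) {
  # x: data frame; a: name for the new dummy column
  # b: name of the column to test; c: value that should map to 1
  n <- nrow(x)
  new1 <- numeric(n)           # start with all zeros
  state <- x[[b]]
  for (i in seq_len(n)) {      # seq_len() is safe when x has zero rows
    if (state[i] == c) {
      new1[i] <- 1
    }
  }
  x[[a]] <- new1               # add the dummy under the requested name
  x
}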
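The same idea can be written as a single vectorized comparison, which avoids the explicit loop. A minimal sketch, assuming the hypothetical name add_dummy and assuming that missing values should map to 0 (the function above does not define what happens with NA):

```r
# Hypothetical vectorized variant (name and NA handling are assumptions)
add_dummy <- function(x, a, b, c) {
  state <- x[[b]]
  # TRUE where the column equals c; NA is treated as "no match"
  x[[a]] <- as.numeric(!is.na(state) & state == c)
  x
}

# Usage: flag the rows where year is 1992
df1 <- data.frame(id = 1:4, year = 1991:1994)
df1 <- add_dummy(df1, "is_1992", "year", 1992)
df1$is_1992
```

The arguments keep the same order as in introducedummy: data frame, new column name, column to test, value to match.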
The mlr package includes createDummyFeatures for this purpose:
library(mlr)
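# createDummyFeatures() only encodes factor columns, so build var as a factor
# (since R 4.0.0, data.frame() no longer converts strings to factors by default)
df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE), stringsAsFactors = TRUE)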
df
# var
# 1 B
# 2 A
# 3 C
# 4 B
# 5 C
# 6 A
# 7 C
# 8 A
# 9 B
# 10 C
createDummyFeatures(df, cols = "var")
# var.A var.B var.C
# 1 0 1 0
# 2 1 0 0
# 3 0 0 1
# 4 0 1 0
# 5 0 0 1
# 6 1 0 0
# 7 0 0 1
# 8 1 0 0
# 9 0 1 0
# 10 0 0 1
Note that createDummyFeatures drops the original variable.
https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures
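If the original column should be kept next to the dummies, one option is to bind the result back onto the input. A minimal sketch, assuming mlr is installed and using a small fixed vector instead of sample() so the result is reproducible:

```r
library(mlr)

# Fixed example data (an assumption for reproducibility); var must be a factor
df <- data.frame(var = c("B", "A", "C", "B"), stringsAsFactors = TRUE)

# createDummyFeatures() returns only the dummy columns for var,
# so cbind() the original data frame back on to keep var as well
out <- cbind(df, createDummyFeatures(df, cols = "var"))
out
```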
Another way is to use mtabulate from the qdapTools package, e.g.
df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))
#  var
#1 C
#2 A
#3 C
#4 B
#5 B
library(qdapTools)
mtabulate(df$var)
which gives,
  A B C
1 0 0 1
2 1 0 0
3 0 0 1
4 0 1 0
5 0 1 0
We can also use cSplit_e from splitstackshape. Using @zx8754's data:
df1 <- data.frame(id = 1:4, year = 1991:1994)
splitstackshape::cSplit_e(df1, "year", fill = 0)
# id year year_1 year_2 year_3 year_4
#1 1 1991 1 0 0 0
#2 2 1992 0 1 0 0
#3 3 1993 0 0 1 0
#4 4 1994 0 0 0 1
To make it work for non-numeric data, we need to specify type as "character" explicitly:
df1 <- data.frame(id = 1:4, let = LETTERS[1:4])
splitstackshape::cSplit_e(df1, "let", fill = 0, type = "character")
# id let let_A let_B let_C let_D
#1 1 A 1 0 0 0
#2 2 B 0 1 0 0
#3 3 C 0 0 1 0
#4 4 D 0 0 0 1