dummy-variable

Weekday as dummy / factor variable in a linear regression model using statsmodels

白昼怎懂夜的黑 submitted on 2019-12-11 15:53:22
Question: How can I add a dummy/factor variable to a model using sm.OLS()?

Details: Below is a reproducible dataframe that you can copy with Ctrl+C; then run the snippet further down for a reproducible example.

Input data:

Date        A      B      weekday
2013-05-04  25.03  88.51  Saturday
2013-05-05  52.98  67.99  Sunday
2013-05-06  39.93  75.19  Monday
2013-05-07  47.31  86.99  Tuesday
2013-05-08  19.61  87.94  Wednesday
2013-05-09  39.51  83.10  Thursday
2013-05-10  21.22  62.16  Friday
2013-05-11  19.04  58
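A minimal sketch of one common approach, assuming pandas and NumPy: expand `weekday` into 0/1 columns with `pd.get_dummies(..., drop_first=True)` (dropping one level avoids perfect collinearity with the intercept), add a constant column, and fit. To stay self-contained it uses four weeks of synthetic data (the `B` values and the `effects` weekday offsets are hypothetical, not the question's data) and solves the least-squares problem with `numpy.linalg.lstsq` in place of statsmodels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four weeks of daily observations starting 2013-05-04.
dates = pd.date_range("2013-05-04", periods=28, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"B": rng.uniform(50, 90, size=len(dates))}, index=dates)
df["weekday"] = df.index.day_name()

# Hypothetical "true" model: A = 5 + 0.3 * B + weekday effect (Friday = 0).
effects = {"Friday": 0.0, "Monday": 2.0, "Saturday": -3.0, "Sunday": -4.0,
           "Thursday": 1.0, "Tuesday": 1.5, "Wednesday": 0.5}
df["A"] = 5 + 0.3 * df["B"] + df["weekday"].map(effects)

# One 0/1 column per weekday; drop_first removes the alphabetically first
# level (Friday), which becomes the baseline absorbed by the intercept.
dummies = pd.get_dummies(df["weekday"], prefix="weekday", drop_first=True, dtype=float)
X = pd.concat([df[["B"]], dummies], axis=1)
X.insert(0, "const", 1.0)

# Ordinary least squares on the design matrix.
coef, *_ = np.linalg.lstsq(X.to_numpy(), df["A"].to_numpy(), rcond=None)
params = pd.Series(coef, index=X.columns)
print(params.round(3))
```

The same `X` should work as `sm.OLS(df["A"], X).fit()`; alternatively the statsmodels formula interface, `smf.ols("A ~ B + C(weekday)", data=df).fit()` with `import statsmodels.formula.api as smf`, builds the weekday dummies itself. Neither statsmodels call is exercised in the sketch above.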

Need Help turning matching factors to a vector to create a dummy variable

喜你入骨 submitted on 2019-12-11 15:19:13
Question: I am working on a project analyzing the effect of natural disasters on interest rates. I am trying to control for countries that use the euro. I want to match against a vector of the countries that use the euro and create a column holding a dummy variable for each country: 1 if it uses the euro and 0 if not.

Country  EURO
ARG      0
FRA      1
GBR      0
CHN      0

I've tried to set the euro category to a Boolean variable, but I have not had any success. I am relatively new to R so I am not confident I am coding
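The core step is a membership test followed by a Boolean-to-integer conversion. A minimal pandas sketch of that idea, using the country codes from the question; `euro_countries` is a hypothetical, deliberately incomplete placeholder, not an authoritative list of euro users. (In R the same pattern is `as.integer(df$Country %in% euro_countries)`.)

```python
import pandas as pd

# Country codes from the question.
df = pd.DataFrame({"Country": ["ARG", "FRA", "GBR", "CHN"]})

# Hypothetical, incomplete list of euro-area members (ISO3 codes).
euro_countries = ["AUT", "BEL", "DEU", "ESP", "FRA", "ITA", "NLD", "PRT"]

# isin() gives True/False per row; astype(int) turns that into a 1/0 dummy.
df["EURO"] = df["Country"].isin(euro_countries).astype(int)
print(df)
```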

Python Create dummy variables based on day of week in double index

泄露秘密 submitted on 2019-12-11 05:45:54
Question: I have a dataframe with a double index (day, time) and would like to create new columns 'Monday', 'Tuesday', 'Wednesday', etc., equal to one if the index day falls on that weekday.

My original dataframe:

                 Visitor
Date       Time
2017-09-11 4:45        0
           5:00        1
           5:15       26
           ....
2017-09-12 4:45        0
           5:00        1
           5:15       26
           ....

What I would like to have:

                 Visitor  Monday  Tuesday
Date       Time
2017-09-11 4:45        0       1        0
           5:00        1       1        0
           5:15       26       1        0
           ....
2017-09-12 4:45        0       0        1
           5:00        1       0        1
           5:15       26       0        1
           ....

Here is what I tried: df['Monday'] = (df
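One way to do this, sketched in pandas under the assumption that the `Date` level holds dates (strings or datetimes): pull the level out with `get_level_values("Date")`, convert it to weekday names, and compare against each day name. The sample frame rebuilds only the six rows visible in the question.

```python
import pandas as pd

# Rebuild the visible part of the question's (Date, Time) MultiIndex frame.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-09-11", "2017-09-12"]), ["4:45", "5:00", "5:15"]],
    names=["Date", "Time"],
)
df = pd.DataFrame({"Visitor": [0, 1, 26, 0, 1, 26]}, index=idx)

# Weekday name for every row, taken from the "Date" level of the index.
day_names = pd.DatetimeIndex(df.index.get_level_values("Date")).day_name()

# One 0/1 column per weekday.
for day in ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]:
    df[day] = (day_names == day).astype(int)

print(df[["Visitor", "Monday", "Tuesday"]])
```

The explicit loop guarantees all seven columns exist even when some weekday never occurs in the data, which keeps the design matrix shape stable across subsets.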

Mutating dummy variables in dplyr

送分小仙女 submitted on 2019-12-10 15:55:52
Question: I want to create 7 dummy variables, one for each day, using dplyr. So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in two steps: 1) create a df of dummies, 2) append it to the original df.

#Sample dataframe
mydf <- data.frame(x=rep(letters[1:9]), day=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))
#1. Create the 7 dummy variables separately
daysdummy <- sjmisc::to_dummy(mydf$day, suffix="label")
#2. Append to dataframe
mydf <- bind_cols(mydf
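For comparison, a pandas sketch of the same one-step idea (not a dplyr answer): `pd.get_dummies` builds the dummy columns and `join` attaches them to the original frame in a single expression, playing the role of `to_dummy` plus `bind_cols`.

```python
import pandas as pd

# Same sample data as the question's mydf.
mydf = pd.DataFrame({
    "x": list("abcdefghi"),
    "day": ["Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"],
})

# Build the dummies and attach them in one step; columns are named after
# the day values and come out in alphabetical order.
mydf = mydf.join(pd.get_dummies(mydf["day"], dtype=int))
print(mydf)
```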

Dummy Encoding using Pyspark [duplicate]

拈花ヽ惹草 submitted on 2019-12-10 13:57:04
Question: This question already has answers here: How to handle categorical features with spark-ml? (5 answers). Closed 2 years ago.

I am hoping to dummy-encode my categorical variables into numerical variables, as shown in the image below, using PySpark syntax. I read in data like this:

data = sqlContext.read.csv("data.txt", sep = ";", header = "true")

In Python (pandas) I am able to encode my variables using the code below:

data = pd.get_dummies(data, columns = ['Continent'])

However, I am not sure how to do it

Factor levels default to 1 and 2 in R | Dummy variable

大兔子大兔子 submitted on 2019-12-09 06:56:52
Question: I am transitioning from Stata to R. In Stata, if I label factor levels (say, 0 and 1) as M and F, the underlying 0 and 1 remain as they are. Moreover, this coding is required for dummy-variable linear regression in most software, including Excel and SPSS. However, I've noticed that R defaults factor levels to 1, 2 instead of 0, 1. I don't know why R does this, although regression internally (and correctly) codes the factor variable as 0 and 1. I would appreciate any help. Here's what I did: Try #1: sex<-c

Creating a dummy variable for certain hours of the day

丶灬走出姿态 submitted on 2019-12-08 13:34:47
Question: I need some help. I'm currently trying to fit a linear model to hourly electricity prices, so I was thinking of creating a dummy that takes the value 1 if the hour of the day is between 06:00 and 20:00. Unfortunately, I have struggled so far.

time.cet <- as.POSIXct(time.numeric, origin = "1970-01-01", tz=local.time.zone)
hours.S <- strftime(time.cet, format = "%H:%M:%S", tz=local.time.zone)
head(time.cet)
[1] "2007-01-01 00:00:00 CET" "2007-01-01 01:00:00 CET" "2007-01-01 02:00:00 CET"
[4
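The usual trick is to compare the hour as a number rather than the "%H:%M:%S" string. A minimal pandas sketch, assuming one day of hourly CET timestamps as a hypothetical stand-in for `time.cet`, and assuming "between 06:00 and 20:00" means 06:00 inclusive and 20:00 exclusive (use `hour <= 20` if 20:00 should count). In R the numeric hour can be obtained with `as.integer(format(time.cet, "%H"))`.

```python
import pandas as pd

# Hypothetical stand-in for time.cet: one day of hourly timestamps in CET.
times = pd.date_range("2007-01-01 00:00", periods=24, freq=pd.offsets.Hour(), tz="CET")
df = pd.DataFrame({"time_cet": times})

# Assumption: hours 6 through 19 count as "daytime" (06:00 inclusive, 20:00 exclusive).
hour = df["time_cet"].dt.hour
df["daytime"] = ((hour >= 6) & (hour < 20)).astype(int)
print(df.head(8))
```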

creating a dummy matrix from a concatenated column [duplicate]

Deadly submitted on 2019-12-04 07:12:56
Question: This question already has answers here: Dummify character column and find unique values [duplicate] (7 answers). Closed last year.

I'm using R and I have a column that looks like this:

relative
aunt
mother,grandmother
sister,mother

My desired outcome should look like this:

mother  sister  aunt  grandmother
0       0       1     0
1       0       0     1
0       0       0     0
1       1       0     0

How can I do that? Thanks in advance.

Answer 1: You can do:

relative <- c("aunt", "mother,grandmother", "sister,mother", "", "other")
R <- strsplit(relative, ',')
r
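In pandas the whole task is a single call: `Series.str.get_dummies(sep=",")` splits each entry on the separator and returns one 0/1 column per distinct value, with columns sorted alphabetically. A sketch on the three values from the question (a pandas illustration, not an R answer):

```python
import pandas as pd

# The three entries shown in the question.
relative = pd.Series(["aunt", "mother,grandmother", "sister,mother"])

# Split on "," and build one indicator column per distinct relative.
dummies = relative.str.get_dummies(sep=",")

# Optional: reorder the columns to match the layout in the question.
ordered = dummies[["mother", "sister", "aunt", "grandmother"]]
print(ordered)
```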

Handling unknown values for label encoding

拟墨画扇 submitted on 2019-12-03 11:52:47
Question: How can I handle unknown values for label encoding in scikit-learn? The label encoder just raises an exception saying that new labels were detected. What I want is to encode categorical variables with a one-hot encoder. However, scikit-learn does not support strings for that, so I used a label encoder on each column. My problem is that unknown labels show up in the cross-validation step of my pipeline. The basic one-hot encoder would have the option to ignore such cases. An apriori pandas
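For reference, since scikit-learn 0.20 `OneHotEncoder` accepts string categories directly, and `handle_unknown="ignore"` encodes unseen values as all-zero rows. To stay in pandas instead, one option is to freeze the category set learned on the training fold with `pd.Categorical`: unseen values then become missing and get an all-zero dummy row. A minimal sketch; `train`, `test`, and `encode` are hypothetical names:

```python
import pandas as pd

# Hypothetical training and validation folds; "purple" never appears in training.
train = pd.Series(["red", "green", "blue", "green"])
test = pd.Series(["green", "purple"])

# Category set learned from the training fold only.
categories = sorted(train.unique())

def encode(values: pd.Series) -> pd.DataFrame:
    """One-hot encode against the frozen training categories.

    Values outside `categories` become NaN in the Categorical, and
    get_dummies (dummy_na=False by default) gives them an all-zero row.
    """
    cat = pd.Categorical(values, categories=categories)
    return pd.get_dummies(cat, dtype=int)

print(encode(train))
print(encode(test))
```

Because the columns come from the frozen categories rather than from the data being encoded, every fold produces the same columns in the same order.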

Factor levels default to 1 and 2 in R | Dummy variable

荒凉一梦 submitted on 2019-12-03 09:39:14
Question: I am transitioning from Stata to R. In Stata, if I label factor levels (say, 0 and 1) as M and F, the underlying 0 and 1 remain as they are. Moreover, this coding is required for dummy-variable linear regression in most software, including Excel and SPSS. However, I've noticed that R defaults factor levels to 1, 2 instead of 0, 1. I don't know why R does this, although regression internally (and correctly) codes the factor variable as 0 and 1. I would appreciate any help. Here's what I did:

Try #1:
sex <- c(0,1,0,1,1)
sex <- factor(sex, levels = c(1,0), labels = c("F","M"))
str(sex)
 Factor w/ 2 levels "F","M": 2