Question
I want to fit a logistic regression on my dataset. I use:
glm.fit=glm(direccion~Profit, data=datos, family=binomial)
   Minute ecopet TASA10 direccion Minute   cl1        Day Profit
 1    571   2160      5         1    571 51.85 2015-02-20  -0.03
 2    572   2160      5         1    572 51.92 2015-02-20   0.04
 3    573   2160      5         1    573 51.84 2015-02-20  -0.04
 4    574   2160      5         1    574 51.77 2015-02-20  -0.11
 5    575   2160     10         1    575 51.69 2015-02-20  -0.19
 6    576   2165      5         1    576 51.69 2015-02-20  -0.16
 7    577   2165     -5         0    577 51.64 2015-02-20  -0.28
 8    578   2165    -10         0    578 51.47 2015-02-20  -0.37
 9    579   2165    -10         0    579 51.41 2015-02-20  -0.36
10    580   2170    -15         0    580 51.44 2015-02-20  -0.25
11    581   2170    -30         0    581 51.48 2015-02-20  -0.21
12    582   2160    -20         0    582 51.52 2015-02-20  -0.12
13    583   2155     -5         0    583 51.56 2015-02-20   0.09
14    584   2155     -5         0    584 51.51 2015-02-20   0.10
15    585   2155     -5         0    585 51.44 2015-02-20   0.00
16    586   2140     10         1    586 51.30 2015-02-20  -0.18
17    587   2140     10         1    587 51.31 2015-02-20  -0.21
18    588   2150      0         0    588 51.31 2015-02-20  -0.25
As you can see, the variable 'direccion' is a binary variable and is the dependent variable in my logistic regression. It is 1 whenever the variable 'TASA10' is positive and 0 otherwise. The problem is that after I run the code, I get:
'Error in weights * y : non-numeric argument to binary operator'
Do you know why that is?
Thanks!!
Answer 1:
It appears that the direccion column is a character column rather than a numeric one. You can verify this by running str(datos); you'll see something like
'data.frame': 18 obs. of 8 variables:
$ Minute : int 571 572 573 574 575 576 577 578 579 580 ...
$ ecopet : int 2160 2160 2160 2160 2160 2165 2165 2165 2165 2170 ...
$ TASA10 : int 5 5 5 5 10 5 -5 -10 -10 -15 ...
$ direccion: chr "1" "1" "1" "1" ...
$ Minute.1 : int 571 572 573 574 575 576 577 578 579 580 ...
$ cl1 : num 51.9 51.9 51.8 51.8 51.7 ...
$ Day : Factor w/ 1 level "2015-02-20": 1 1 1 1 1 1 1 1 1 1 ...
$ Profit : num -0.03 0.04 -0.04 -0.11 -0.19 -0.16 -0.28 -0.37 -0.36 -0.25 ...
In particular, note the type of the direccion column. This can be fixed by running
datos$direccion <- as.numeric(datos$direccion)
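As a minimal sketch of this diagnosis, the snippet below rebuilds the Profit and direccion columns from the question in a hypothetical data frame called toy (the name, and storing direccion as character, are assumptions made to mimic the problem). It captures whatever error glm() raises on the character response, then refits after the conversion.

```r
# Hypothetical stand-in for `datos`: values copied from the question,
# with direccion deliberately stored as character to mimic the problem.
toy <- data.frame(
  Profit = c(-0.03, 0.04, -0.04, -0.11, -0.19, -0.16, -0.28, -0.37, -0.36,
             -0.25, -0.21, -0.12, 0.09, 0.10, 0.00, -0.18, -0.21, -0.25),
  direccion = c("1", "1", "1", "1", "1", "1", "0", "0", "0",
                "0", "0", "0", "0", "0", "0", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# Capture the error message (if any) instead of stopping the script.
msg <- tryCatch(
  {
    glm(direccion ~ Profit, data = toy, family = binomial)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Convert the response to numeric 0/1 and fit again.
toy$direccion <- as.numeric(toy$direccion)
fit <- glm(direccion ~ Profit, data = toy, family = binomial)
print(coef(fit))
```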
If it is a factor instead, as.numeric() alone would return the internal level codes (for example 1 and 2) rather than the labels 0 and 1, so convert through character first:
datos$direccion <- as.numeric(as.character(datos$direccion))
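To illustrate why the detour through as.character() matters, here is a small sketch on a made-up factor f (not the asker's data):

```r
# A factor stores integer codes for its levels, not the labels themselves.
f <- factor(c("1", "0", "1"))          # levels are "0" and "1"

codes  <- as.numeric(f)                # level codes: 2 1 2
values <- as.numeric(as.character(f))  # labels as numbers: 1 0 1

print(codes)
print(values)
```

Passing the level codes to glm() with family=binomial would not work as intended, because a numeric binomial response has to lie between 0 and 1.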
Even better, look back in your pipeline to the code that generates this data frame and fix it so that the column is encoded as numeric rather than as a string.
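Since the question states that direccion is 1 whenever TASA10 is positive and 0 otherwise, one possible way to do this at the source is to derive the column directly as a number. This is only a sketch; the data frame toy and its TASA10 values are hypothetical, and the asker's actual pipeline is not shown in the question.

```r
# Hypothetical data frame with a few TASA10-style values.
toy <- data.frame(TASA10 = c(5, 10, -5, -10, 0))

# The comparison yields TRUE/FALSE; as.integer() maps that to 1/0,
# so the column is numeric from the start and never passes through text.
toy$direccion <- as.integer(toy$TASA10 > 0)
str(toy)
```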
Answer 2:
glm() can work with a response that is numeric or a factor, but it does not know how to deal with a character response in a binomial model, which is what triggers this error.
You could write a simple factorize function that turns all character (chr) columns into factors, while leaving numeric columns as they are:
factorize = function(column, df){
  # Convert a character column to a factor; return other columns unchanged
  if (is.character(df[[column]])){
    out = as.factor(df[[column]])
  } else { # not character (e.g. numeric): leave as is
    out = df[[column]]
  }
  return(out)
}
# Apply to every column of datos, then restore the column names
store.colnames = colnames(datos)
datos = lapply(store.colnames, function(column) factorize(column, datos))
datos = as.data.frame(datos)
colnames(datos) = store.colnames
The code could be much prettier but it will get the job done and I just wanted to illustrate the point.
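For comparison, a more compact sketch of the same idea, shown on a hypothetical data frame toy rather than the asker's datos:

```r
# Hypothetical example data frame with one character and one numeric column.
toy <- data.frame(
  direccion = c("1", "0", "1"),
  Profit = c(-0.03, 0.04, -0.04),
  stringsAsFactors = FALSE
)

# Assigning into toy[] keeps the data frame structure and column names
# while replacing each column with its converted version.
toy[] <- lapply(toy, function(col) if (is.character(col)) as.factor(col) else col)
str(toy)
```

Assigning into toy[] avoids having to store and restore the column names by hand.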
Alternatively, you could just change a single column to factor type:
datos$direccion = as.factor(datos$direccion)
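With a factor response, a binomial glm() treats the first factor level as failure and every other level as success, so a factor with levels "0" and "1" should give the same fit as the numeric 0/1 coding. A small sketch to check this, using the Profit and direccion values from the question in a hypothetical data frame toy:

```r
# Hypothetical stand-in for `datos`, values copied from the question.
toy <- data.frame(
  Profit = c(-0.03, 0.04, -0.04, -0.11, -0.19, -0.16, -0.28, -0.37, -0.36,
             -0.25, -0.21, -0.12, 0.09, 0.10, 0.00, -0.18, -0.21, -0.25),
  direccion = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0)
)

# Fit once with the numeric 0/1 response.
fit_num <- glm(direccion ~ Profit, data = toy, family = binomial)

# Fit again with the same response stored as a factor (levels "0", "1").
toy$direccion_f <- as.factor(toy$direccion)
fit_fac <- glm(direccion_f ~ Profit, data = toy, family = binomial)

# Compare the estimated coefficients of the two fits.
same_fit <- all.equal(coef(fit_num), coef(fit_fac))
print(same_fit)
```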
Hope that helps!
Source: https://stackoverflow.com/questions/32953750/why-am-i-getting-error-in-weights-y-non-numeric-argument-to-binary-operator