I tried to split the data (bank) into training data and test data, but I got the error below. How can I solve this problem?
train = bank[1:100, ]
test =
The issue is in subset=train. According to ?glm, the subset argument should be a vector, as opposed to a subset of the original dataset:
subset: an optional vector specifying a subset of observations to be used in the fitting process.
Hence, you may need to change the code to:
glm.fit=glm(Status~Length+Right+Bottom+Top+Diagonal,data=train,family=binomial)
or
glm.fit=glm(Status~Length+Right+Bottom+Top+Diagonal,data=bank,family=binomial,subset=1:100)
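To check that the two forms above fit the same model, here is a minimal sketch. It assumes a synthetic stand-in for the data (the `bank_demo` data frame and its simulated `Status` column are made up for illustration and are not the asker's dataset) and uses only two of the predictors to keep it short:

```r
# Synthetic stand-in for `bank` (hypothetical data, for illustration only)
set.seed(1)
n <- 200
bank_demo <- data.frame(Length = rnorm(n), Right = rnorm(n))
bank_demo$Status <- rbinom(n, 1, plogis(0.5 * bank_demo$Length - 0.3 * bank_demo$Right))

# Option 1: pass the training rows themselves as the data frame
fit_data <- glm(Status ~ Length + Right, data = bank_demo[1:100, ], family = binomial)

# Option 2: pass the full data frame plus a vector of row indices
fit_subset <- glm(Status ~ Length + Right, data = bank_demo, family = binomial, subset = 1:100)

# Both fits use the same 100 observations, so the coefficients should agree
all.equal(coef(fit_data), coef(fit_subset))
```

Either form works; the point is that subset takes indices into data, not a second data frame.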
If you set the training data as a data frame, like:
train = bank[1:100, ]
then in the lm() function you pass it directly as the data argument:
data = train
Alternatively, you can set train as a sequence of row indices, like:
train = 1:100
in which case you need to use these arguments in lm():
data = bank, subset = train
More generally, you could achieve what you asked by doing something like this, assuming the column 'response' holds the observed outcome:
samples=1:100
train = bank[samples, ]
test = bank[-samples,]
Status.test = bank[-samples, 'response']
BTW, I would suggest using the sample() function to pick the training and test rows at random, like this:
samples=sample(nrow(bank), 0.8*nrow(bank))
train = bank[samples, ]
test = bank[-samples,]
Status.test = bank[-samples, 'response']
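For completeness, here is a rough end-to-end sketch of a random split followed by a fit and a check on the held-out rows. It again assumes a made-up data frame (`bank_demo`, with a simulated `Status` column), and the 0.5 cutoff is just a common default, not something from the question. Calling set.seed() first makes the split reproducible:

```r
set.seed(42)  # fix the seed so the random split is reproducible

# Synthetic stand-in for `bank` (hypothetical data, for illustration only)
n <- 200
bank_demo <- data.frame(Length = rnorm(n), Right = rnorm(n))
bank_demo$Status <- rbinom(n, 1, plogis(bank_demo$Length - bank_demo$Right))

# 80/20 random split by row index
samples <- sample(nrow(bank_demo), floor(0.8 * nrow(bank_demo)))
train <- bank_demo[samples, ]
test <- bank_demo[-samples, ]

# Fit on the training rows only
fit <- glm(Status ~ Length + Right, data = train, family = binomial)

# Predicted probabilities on the held-out rows, thresholded at 0.5
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Proportion of held-out rows classified correctly
mean(pred == test$Status)
```

Note that the test labels come from the rows excluded from training (the -samples rows), so train and test never overlap.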