Question
I'm trying to create two histograms in R: one for SeaWorld employees who negotiated a salary increase and one for those who did not. Could someone please show me where I went wrong? Any help is appreciated.
Here's an example of the text file I'm using.
emp received negotiated gender year
#325 12.5 TRUE F 2013
#318 5.2 FALSE F 2013
#217 9.8 FALSE M 2013
#223 6.8 TRUE M 2013
#218 2.1 TRUE F 2006
#601 13.9 FALSE M 2006
#225 7.8 TRUE M 2006
#281 8.5 FALSE F 2006
Here's the code I have so far:
d<-read.csv("employees.txt", header=TRUE, sep="\t")
str(d)
f1 <- mean(d$received)
f2 <- median(d$received)
f3 <- sd(d$received)
d$gender <- factor(d$gender, labels=c(1, 2))
pairs(d)
plot(d$received ~ d$gender)
plot(d$received ~ d$year, xlab="year", ylab="received")
m <- lm(d$received~d$year)
print(m)
print(f1)
print(f2)
print(f3)
abline(m)
abline(mean(d$received), 0, lty=2)
hist(d$received[d$gender ==1],breaks = 50)
dev.new()
hist(d$received[d$gender ==2],breaks = 50)
dev.new()
#hist(d$year, breaks = 50)
#dev.new()
plot(d$gender, d$received)
Answer 1:
The # symbols in your data are causing problems for me: read.table() treats # as a comment character by default, so every data row is skipped.
With the # symbol...
d1 <- read.table(text = "
emp received negotiated gender year
#325 12.5 TRUE F 2013
#318 5.2 FALSE F 2013
#217 9.8 FALSE M 2013
#223 6.8 TRUE M 2013
#218 2.1 TRUE F 2006
#601 13.9 FALSE M 2006
#225 7.8 TRUE M 2006
#281 8.5 FALSE F 2006",
header = TRUE)
We get an empty data frame...
str(d1)
'data.frame': 0 obs. of 5 variables:
$ emp : logi
$ received : logi
$ negotiated: logi
$ gender : logi
$ year : logi
But without the # we get...
d2 <- read.table(text = "
emp received negotiated gender year
325 12.5 TRUE F 2013
318 5.2 FALSE F 2013
217 9.8 FALSE M 2013
223 6.8 TRUE M 2013
218 2.1 TRUE F 2006
601 13.9 FALSE M 2006
225 7.8 TRUE M 2006
281 8.5 FALSE F 2006",
header = TRUE)
...the data as expected:
str(d2)
'data.frame': 8 obs. of 5 variables:
$ emp : int 325 318 217 223 218 601 225 281
$ received : num 12.5 5.2 9.8 6.8 2.1 13.9 7.8 8.5
$ negotiated: logi TRUE FALSE FALSE TRUE TRUE FALSE ...
$ gender : Factor w/ 2 levels "F","M": 1 1 2 2 1 2 2 1
$ year : int 2013 2013 2013 2013 2006 2006 2006 2006
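If editing the file to remove the # characters is not an option, one possible workaround is to turn off comment handling with the comment.char argument of read.table(). The sketch below is only an illustration: the names txt and d3 are made up here, and stripping the # from emp is an optional extra step, not something the question asked for. Note that read.csv(), which the question uses, already defaults to comment.char = "".

```r
# Same sample rows as above, with the leading "#" kept in the emp column
txt <- "emp received negotiated gender year
#325 12.5 TRUE F 2013
#318 5.2 FALSE F 2013
#217 9.8 FALSE M 2013
#223 6.8 TRUE M 2013
#218 2.1 TRUE F 2006
#601 13.9 FALSE M 2006
#225 7.8 TRUE M 2006
#281 8.5 FALSE F 2006"

# comment.char = "" tells read.table() not to treat "#" as the start of a comment
d3 <- read.table(text = txt, header = TRUE, comment.char = "",
                 stringsAsFactors = FALSE)

# Optional: drop the leading "#" so emp becomes an integer ID
d3$emp <- as.integer(sub("^#", "", d3$emp))

str(d3)
```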
And for your question about how to create a histogram of how much of a raise employees received, based on whether they asked for the raise or not:
hist(d$received[d$negotiated == TRUE])
hist(d$received[d$negotiated == FALSE])
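To compare the two groups more easily, the two histograms can be drawn side by side with the same bins. This is a minimal sketch rather than the only way to do it: it rebuilds the d2 data frame from above so that it runs on its own, and the names brks, h_yes, and h_no as well as the panel titles are illustrative choices.

```r
d2 <- read.table(text = "
emp received negotiated gender year
325 12.5 TRUE F 2013
318 5.2 FALSE F 2013
217 9.8 FALSE M 2013
223 6.8 TRUE M 2013
218 2.1 TRUE F 2006
601 13.9 FALSE M 2006
225 7.8 TRUE M 2006
281 8.5 FALSE F 2006",
header = TRUE)

# Shared break points covering the full range, so both panels use the same bins
brks <- pretty(range(d2$received), n = 8)

# Two panels in one row; par() returns the old settings so they can be restored
op <- par(mfrow = c(1, 2))
h_yes <- hist(d2$received[d2$negotiated], breaks = brks,
              main = "Negotiated", xlab = "received")
h_no <- hist(d2$received[!d2$negotiated], breaks = brks,
             main = "Did not negotiate", xlab = "received")
par(op)
```

Because negotiated is already a logical column, it can be used directly as the index, and !d2$negotiated selects the other group.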
Source: https://stackoverflow.com/questions/19886572/how-to-create-a-histogram-based-on-true-or-false-in-r