Question
OK, so I have a CSV file with a structure similar to this:
hashID,value,flag
98fafd, 35, 1
fh56w2, 25, 0
ggjeas, 55, 1
adfh5d, 45, 0
Basically, what I want to do is get the median of the value column, but only include rows where flag==1 in the calculation.
Is this even possible in R? I've searched around and haven't found anything like this.
Answer 1:
Here is one possibility:
Read your data set using the following command:
newdata <- read.csv("stackoverflow questions/mediancol.csv")
# I assume you have the data in csv format
# Showing the data I used for the computation
newdata <- structure(list(hashID = structure(c(1L, 3L, 4L, 2L), .Label = c("98fafd",
"adfh5d", "fh56w2", "ggjeas"), class = "factor"), value = c(35L,
25L, 55L, 45L), flag = c(1L, 0L, 1L, 0L)), .Names = c("hashID",
"value", "flag"), class = "data.frame", row.names = c(NA, -4L
))
> newdata
  hashID value flag
1 98fafd    35    1
2 fh56w2    25    0
3 ggjeas    55    1
4 adfh5d    45    0
# Subset the data, keeping only the rows where flag == 1
newdata1 <- subset(newdata,flag==1)
# Look at the summary of the data
> summary(newdata1)
    hashID      value         flag
 98fafd:1   Min.   :35   Min.   :1
 adfh5d:0   1st Qu.:40   1st Qu.:1
 fh56w2:0   Median :45   Median :1
 ggjeas:1   Mean   :45   Mean   :1
            3rd Qu.:50   3rd Qu.:1
            Max.   :55   Max.   :1
# Only look at the median
median(newdata1$value)
[1] 45
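For anyone who wants to try the subset approach without a file on disk, here is a minimal self-contained sketch. It assumes the sample rows from the question are pasted inline; the names `csv_text`, `flagged`, and `median_flagged` are made up for illustration and are not part of the original answer.

```r
# Inline copy of the question's sample data (assumption: stands in for the CSV file)
csv_text <- "hashID,value,flag
98fafd, 35, 1
fh56w2, 25, 0
ggjeas, 55, 1
adfh5d, 45, 0"

# read.csv() accepts a text= argument in place of a file path
newdata <- read.csv(text = csv_text)

# Keep only the rows where flag equals 1
flagged <- subset(newdata, flag == 1)

# Median of the value column over the remaining rows
median_flagged <- median(flagged$value)
print(median_flagged)
```

With the four sample rows, only the values 35 and 55 remain after subsetting, so the median is their midpoint.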
Answer 2:
You can also do this in a quick one-liner, using a logical vector as an index into the column:
# read the data from a csv file
newdata <- read.csv("file.csv")
# this will give you a logical (TRUE/FALSE) vector of length nrow(newdata)
newdata$flag==1
# and this line uses the above vector to retrieve only those elements of
# newdata$value for which the row contains a flag value of 1
median(newdata$value[newdata$flag==1])
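One thing to watch with logical indexing is missing data, which the sample in the question does not contain. The sketch below uses made-up rows (the `zzzzzz` row and the `NA` entries are assumptions for illustration) to show how an `NA` in either column affects the one-liner, and one possible way to guard against it with `which()` and `na.rm = TRUE`.

```r
# Hypothetical data with a missing flag and a missing value (not from the question)
newdata <- data.frame(
  hashID = c("98fafd", "fh56w2", "ggjeas", "adfh5d", "zzzzzz"),
  value  = c(35, 25, 55, 45, NA),
  flag   = c(1, 0, 1, NA, 1)
)

# An NA in the logical index selects an NA element, so the plain call returns NA
median_plain <- median(newdata$value[newdata$flag == 1])

# which() keeps only positions that are TRUE (dropping NA flags);
# na.rm = TRUE then drops missing values before taking the median
median_safe <- median(newdata$value[which(newdata$flag == 1)], na.rm = TRUE)

print(median_plain)
print(median_safe)
```

If your data has no missing entries, the original one-liner is enough and this extra care is not needed.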
Source: https://stackoverflow.com/questions/17435810/getting-median-of-a-column-where-value-of-another-column-is-1-in-r