Selecting specific rows based on values in 2 columns in R

Question

I have a large data set of GPS collar locations that have a varying number of locations each day. I want to separate out only the days that have a single location collected and make a new data frame containing all their information.

month    day    easting    northing    time    ID
  6       1     #######    ########    0:00    ##
  6       2     #######    ########    6:00    ##
  6       2     #######    ########    0:00    ##
  6       3     #######    ########    18:00   ##
  6       3     #######    ########    12:00   ##
  6       4     #######    ########    0:00    ##
  6       5     #######    ########    6:00    ##

Currently I have hacked something together, but can't quite get to the next step.

library(plyr)
dog<-count(data1,vars=c("MONTH","day"))
datasub1<-subset(dog,freq==1)

This gives me a readout that looks like

    MONTH day freq
1       6  29    1
7       7   5    1
8       7   6    1
10      7   8    1
12      7  10    1

I am trying to use the values of the Month and day to pull out the rows that contain them from the main dataset so that I can make a data frame containing only the points with a frequency of 1 but that contains all the associated data. I've got to this point:

sis<-c(datasub1$MONTH)
bro<-c(datasub1$day)
datasub2<-subset(data1,MONTH==sis&day==bro)

... but that doesn't return anything. As an R beginner, it seems intuitive to me that it should subset out the rows that contain both the values of bro and sis.

Any help would be greatly appreciated.

Answer 1:

Revised:

datasub2<-subset(data1, paste(month,day,sep=".") %in% paste(datasub1$month, datasub1$day,sep=".") )

It's not very likely (and quite possibly impossible) that any particular MONTH item will exactly equal that subset: == compares the two vectors element by element, recycling the shorter one, rather than testing membership. You are presumably more interested in whether a "Month.Day" combination appears among the "Month.Day" combinations in datasub1. Note also that you have mixed up the capitalization: if the headers were as you illustrated (lowercase month), count() returns a lowercase month column, which is what the line above and the output below assume.

> dog
  month day freq
1     6   1    1
2     6   2    2
3     6   3    2
4     6   4    1
5     6   5    1
> datasub1
  month day freq
1     6   1    1
4     6   4    1
5     6   5    1
> datasub2
  month day easting northing time ID
1     6   1 ####### ######## 0:00 ##
6     6   4 ####### ######## 0:00 ##
7     6   5 ####### ######## 6:00 ##
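
To make the difference between == and %in% concrete, here is a minimal sketch. It assumes a hypothetical 7-row data1 modeled on the question's sample (coordinates and ID left out), and it uses base aggregate() as a stand-in for plyr's count() so that it runs without extra packages; both are assumptions of this sketch, not part of the original code.

```r
# Hypothetical stand-in for the question's sample data (coordinates and ID omitted)
data1 <- data.frame(
  month = c(6, 6, 6, 6, 6, 6, 6),
  day   = c(1, 2, 2, 3, 3, 4, 5),
  time  = c("0:00", "6:00", "0:00", "18:00", "12:00", "0:00", "6:00")
)

# Base-R stand-in for count(): number of rows per month/day combination
dog <- aggregate(freq ~ month + day, data = transform(data1, freq = 1), FUN = sum)
datasub1 <- subset(dog, freq == 1)

# == pairs the vectors up position by position, recycling the shorter one,
# so row i of data1 is only compared with one (recycled) element of datasub1$day.
# R warns here because 7 is not a multiple of 3; the warning is silenced for the demo.
by_position <- suppressWarnings(data1$day == datasub1$day)

# %in% asks whether each month.day key occurs anywhere in the set of single-fix keys
key_all    <- paste(data1$month, data1$day, sep = ".")
key_single <- paste(datasub1$month, datasub1$day, sep = ".")
datasub2   <- data1[key_all %in% key_single, ]
```

On this sample, by_position is TRUE only for the first row (a coincidence of position), while datasub2 holds rows 1, 6 and 7, matching the output shown above.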

Answer 2:

After this:

library(plyr)
dog<-count(data1,vars=c("MONTH","day"))

try this:

indx = which(dog$freq==1)
data1[indx,]
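
One thing worth checking here: indx holds row positions within dog, which has one row per MONTH/day combination, and those positions line up with rows of data1 only if no earlier day has more than one fix. A minimal sketch on a hypothetical 7-row data1 modeled on the question's sample compares row-number indexing with a merge() on the key columns. As an assumption to keep it package-free, base aggregate() stands in for plyr's count().

```r
# Hypothetical stand-in for the question's sample data (coordinates and ID omitted)
data1 <- data.frame(
  MONTH = c(6, 6, 6, 6, 6, 6, 6),
  day   = c(1, 2, 2, 3, 3, 4, 5),
  time  = c("0:00", "6:00", "0:00", "18:00", "12:00", "0:00", "6:00")
)

# Base-R stand-in for count(): one row per MONTH/day combination
dog <- aggregate(freq ~ MONTH + day, data = transform(data1, freq = 1), FUN = sum)

# Positions of the single-fix days within dog (not within data1)
indx <- which(dog$freq == 1)

# Using those positions on data1 picks rows 1, 4, 5, i.e. days 1, 3, 3
by_row_number <- data1[indx, ]

# Matching on the shared MONTH and day columns picks days 1, 4, 5
by_key <- merge(data1, dog[indx, c("MONTH", "day")])
```

merge() joins on the columns the two data frames have in common (MONTH and day here), so every column of data1 is carried along for the matching rows.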

Answer 3:

data1[rownames(datasub1), ]

This extends the OP's original thinking, though it may not be what they're after. It is really just what Wesley suggested, carrying the OP's original steps one step further (minus the bro/sis part, which confused me a bit, for the same reason DWin gave :)). You're after the row names, not really the values in those columns; you've already got that information. The row names carry that information back to the original data set.

n <- 100
data1 <- data.frame(
    Accuracy = round(runif(n, 0, 5), 1),
    MONTH    = sample(1:5, n, replace=TRUE),
    day      = sample(1:28, n, replace=TRUE),
    Easting  = rnorm(n),
    Northing = rnorm(n),
    Etc      = rnorm(n)
)


library(plyr)
dog<-count(data1,vars=c("MONTH","day"))
datasub1<-subset(dog,freq==1)

data1[rownames(datasub1), ]
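
Because rownames(datasub1) are row numbers of dog, it is worth verifying the returned rows against the raw data. A sketch of one such check counts fixes per MONTH/day directly on data1 with base ave(); this is a swapped-in technique, not part of the code above, and the set.seed() call and the trimmed column set are assumptions added for reproducibility and brevity.

```r
set.seed(42)  # assumption: fixed seed so the simulated data is reproducible
n <- 100
data1 <- data.frame(
  MONTH   = sample(1:5, n, replace = TRUE),
  day     = sample(1:28, n, replace = TRUE),
  Easting = rnorm(n)
)

# For every row, the number of rows in data1 sharing its MONTH/day combination
n_fixes <- ave(seq_len(n), data1$MONTH, data1$day, FUN = length)

# Keep only rows whose day has exactly one fix; original row names are preserved
singles <- data1[n_fixes == 1, ]
```

Since n_fixes is aligned row by row with data1, no lookup back through a separate count table is needed, and singles can be compared directly with whatever another approach returns.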

Source: https://stackoverflow.com/questions/8932781/selecting-specific-rows-based-on-values-in-2-columns-in-r

Tags

database

count

plyr