I have a data frame with about 40 columns. The second column, data[2], contains the name of the company that the rest of the row data describes. However, the names of the compani
First of all (as Jonathan did in his comment): to reference the second column you should use either data[[2]] or data[,2].
But if you are using subset you can use the column name: subset(data, CompanyName == ...).
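To make the difference between the indexing forms concrete, here is a minimal sketch on made-up data. The column names value and CompanyName and the company strings are assumptions for illustration, not taken from the question.

```r
# Hypothetical mock-up data; the real data frame has about 40 columns.
data <- data.frame(
  value = 1:3,
  CompanyName = c("Company Name 09", "Company Name", "Other Co"),
  stringsAsFactors = FALSE
)

class(data[2])     # "data.frame": single brackets keep a one-column data frame
class(data[[2]])   # "character": the column itself, as a vector
class(data[, 2])   # "character": same result, since drop = TRUE is the default

# Inside subset() the column can be referred to by its name.
by_name <- subset(data, CompanyName == "Company Name")
```

Comparing data[2] against a string therefore works on a data frame rather than on a vector, which is usually not what a row filter needs.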
As for your question, I would do one of:
subset(data, data[[2]] %in% c("Company Name 09", "Company Name"), drop = TRUE)
subset(data, grepl("^Company Name", data[[2]]), drop = TRUE)
In the second I use grepl (introduced in R version 2.9), which returns a logical vector with TRUE for each match.
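The two approaches are not equivalent, which a small sketch can show. The data below is invented for illustration (the column names and the "Company Name Ltd" row are assumptions): %in% matches only the listed names exactly, while the anchored pattern matches anything that starts with "Company Name".

```r
# Hypothetical mock-up data, not the asker's real data.
data <- data.frame(
  value = 1:4,
  CompanyName = c("Company Name 09", "Company Name",
                  "Other Co", "Company Name Ltd"),
  stringsAsFactors = FALSE
)

# Exact membership test against the names we want.
exact <- data[[2]] %in% c("Company Name 09", "Company Name")

# Prefix match: TRUE for every value beginning with "Company Name".
prefix <- grepl("^Company Name", data[[2]])

exact    # TRUE TRUE FALSE FALSE
prefix   # TRUE TRUE FALSE TRUE

subset(data, exact)    # rows 1 and 2
subset(data, prefix)   # rows 1, 2 and 4
```

If other companies share the same prefix, the %in% form (or a stricter pattern) is the safer choice.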
A couple of things:
1) Mock-up data is useful as we don't know exactly what you're faced with. Please supply data if possible. Maybe I misunderstood in what follows?
2) Don't use [[2]] to index your data.frame; I think [,"colname"] is much clearer.
3) If the only difference is a trailing ' 09' in the name, then simply regexp that out:
R> x1 <- c("foo 09", "bar", "bar 09", "foo")
R> x2 <- gsub(" 09$", "", x1)
R> x2
[1] "foo" "bar" "bar" "foo"
R>
Now you should be able to do your subset on the on-the-fly transformed data:
R> data <- data.frame(value=1:4, name=x1)
R> subset(data, gsub(" 09$", "", name)=="foo")
  value   name
1     1 foo 09
4     4    foo
R>
You could also have replaced the name column with the regexp'ed value.
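A minimal sketch of that alternative, reusing the toy vector from above (stringsAsFactors = FALSE is added here as an assumption so the column stays character). Note that overwriting the column discards the original names, so keep a copy if you still need them.

```r
x1 <- c("foo 09", "bar", "bar 09", "foo")
data <- data.frame(value = 1:4, name = x1, stringsAsFactors = FALSE)

# Overwrite the name column with the cleaned-up names.
data$name <- gsub(" 09$", "", data$name)

# A plain equality test now picks up both variants of the name.
foo_rows <- subset(data, name == "foo")
foo_rows$value   # 1 4
```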