I'm trying to write some code to open all the data files in a folder and apply a function (or set of functions) to extract my data of interest. So far, so good. The problem is t
The pattern here is a date, an optional E\digit or Expt\digit prefix that you don't want, a word that you do want, then an optional SDM that you don't want, followed by 'data copy.txt'...
Here's my test data:
> names
[1] "2012-05-31 CTN1 data copy.txt"
[2] "2012-05-21 E7 PMA1 data copy.txt"
[3] "2011-11-29 TDH3 SDM data copy.txt"
[4] "2012-01-04 POX1 data copy.txt"
[5] "2011-11-29 ECHO data copy.txt"
[6] "2011-11-29 E8 ECHO data copy.txt"
[7] "2011-11-29 ECHO SDM data copy.txt"
[8] "2011-11-29 Expt2 ECHO SDM data copy.txt"
and here's my sub:
> sub(pattern="^....-..-.. (E\\d+ |Expt\\d+ )*(\\w+) (SDM )*data copy.txt","\\2",names)
[1] "CTN1" "PMA1" "TDH3" "POX1" "ECHO" "ECHO" "ECHO" "ECHO"
Because the prefix is matched with \\d+, this will also work if your E-prefixes have more than one digit. I've added some entries to my test set whose wanted word starts with E, to make sure they get treated properly, as well as the case of an E-prefix combined with an SDM.
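
If you want something a bit stricter, here is a minimal sketch of a variant. It is not the pattern above, and everything new in it is my own assumption: the date is matched with digits only, the optional parts use ? instead of *, the dot in ".txt" is escaped, and the pattern is anchored at the end. The helper name extract_name is hypothetical. Unlike a bare sub(), which returns non-matching names unchanged, it returns NA for them so that stray files are easy to spot.

```r
# Test file names copied from the example above.
file_names <- c(
  "2012-05-31 CTN1 data copy.txt",
  "2012-05-21 E7 PMA1 data copy.txt",
  "2011-11-29 TDH3 SDM data copy.txt",
  "2012-01-04 POX1 data copy.txt",
  "2011-11-29 ECHO data copy.txt",
  "2011-11-29 E8 ECHO data copy.txt",
  "2011-11-29 ECHO SDM data copy.txt",
  "2011-11-29 Expt2 ECHO SDM data copy.txt"
)

# Stricter variant of the pattern (an assumption, not the original):
#   \\d{4}-\\d{2}-\\d{2}   date made of digits only
#   (E\\d+ |Expt\\d+ )?    optional experiment prefix, at most once
#   (\\w+)                 the word to keep (capture group 2)
#   (SDM )?                optional SDM marker, at most once
#   copy\\.txt$            literal dot, anchored at the end of the name
pattern <- "^\\d{4}-\\d{2}-\\d{2} (E\\d+ |Expt\\d+ )?(\\w+) (SDM )?data copy\\.txt$"

# Hypothetical helper: returns the wanted word, or NA when a name
# does not follow the expected layout.
extract_name <- function(x) {
  ifelse(grepl(pattern, x), sub(pattern, "\\2", x), NA_character_)
}

extract_name(file_names)
```

On the real folder you could pass the result of list.files() to extract_name() in place of the test vector, then check for NA values before reading anything in.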