extract part of a file name in R

后端未结


I'm trying to write some code to open all the data files in a folder and apply a function (or set of functions) to extract my data of interest. So far, so good. The problem is t

1 answer

感动是毒

2021-01-25 09:29

The pattern here is a date, an optional E\digit or Expt\digit that you don't want, a word that you do want, then an optional SDM that you don't want followed by 'data copy.txt'...

Here's my test data:

> names
[1] "2012-05-31 CTN1 data copy.txt"          
[2] "2012-05-21 E7 PMA1 data copy.txt"       
[3] "2011-11-29 TDH3 SDM data copy.txt"      
[4] "2012-01-04 POX1 data copy.txt"          
[5] "2011-11-29 ECHO data copy.txt"          
[6] "2011-11-29 E8 ECHO data copy.txt"       
[7] "2011-11-29 ECHO SDM data copy.txt"      
[8] "2011-11-29 Expt2 ECHO SDM data copy.txt"

and here's my sub:

> sub(pattern="^....-..-.. (E\\d+ |Expt\\d+ )*(\\w+) (SDM )*data copy\\.txt","\\2",names)
[1] "CTN1" "PMA1" "TDH3" "POX1" "ECHO" "ECHO" "ECHO" "ECHO"

This also works if your E-prefixes have more than one digit, since \\d+ matches one or more digits. I added some names starting with E (such as ECHO) to my test set to make sure they are not mistaken for a prefix, as well as the case of an E-prefix together with an SDM.
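
One thing to watch with sub: a file name that does not match the pattern comes back unchanged, so a malformed name can slip through unnoticed. Below is a minimal sketch, assuming you would rather get NA in that case. The helper name extract_label and the vector files are hypothetical, and the pattern is a stricter variant of the one above (digits only in the date, the dot escaped, and an end anchor), not the original answer's exact regex.

```r
# Hypothetical helper: return the wanted word from each file name,
# or NA when the name does not fit the expected layout.
extract_label <- function(x) {
  pattern <- paste0(
    "^[0-9]{4}-[0-9]{2}-[0-9]{2} ",  # date, e.g. 2012-05-31
    "(E[0-9]+ |Expt[0-9]+ )?",       # optional E<digits> or Expt<digits> prefix
    "([[:alnum:]_]+) ",              # the word we want (capture group 2)
    "(SDM )?",                       # optional SDM marker
    "data copy\\.txt$"               # fixed suffix, with a literal dot
  )
  m <- regmatches(x, regexec(pattern, x))
  # Each element of m is character(0) on no match; otherwise
  # c(full match, group 1, group 2, group 3), so group 2 is element 3.
  vapply(m, function(g) if (length(g) == 0) NA_character_ else g[3],
         character(1))
}

files <- c("2012-05-31 CTN1 data copy.txt",
           "2011-11-29 Expt2 ECHO SDM data copy.txt",
           "notes.txt")
extract_label(files)
```

In practice the input vector could come from list.files() on the data folder; any NA in the result then points at a file whose name needs a closer look.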
