问题
I have a FASTA formatted file, which is essentially a special text file, containing many entries, one of which looks like below, which I have assigned by the name "FASTA" in R. The original file was red and formated as seen below using seqinr package in R.
FASTA<- structure(list(`tr|A1Z6G9|A1Z6G9_DROME` = structure("MSISASHPCGLNADGTATQYKESTATIQTSGLQSSPRSFLPEREDTLEYFIKFPKPSSKNEFVLAKDHDGEDSHVPIVMLLGWAGCQDRYLMKYSKIYEERGLITVRYTAPVDSLFWKRSEMIPIGEKILKLIQDMNFDAHPLIFHIFSNGGAYLYQHINLAVIKHKSPLQVRGVIFDSAPGERRIISLYRAITAIYGREKRCNCLAALVITITLSIMWFVEESISALKSLFVPSSPVRPSPFCDLKNEANRYPQLFLYSKGDIVIPYRDVEKFIRLRRDQGIQVSSVCFEDAEHVKIYTKYPKQYVQCVCNFIRNCMTIPPLKEAVNSEPSESVSRVNLKYD", name = "tr|A1Z6G9|A1Z6G9_DROME", Annot = ">tr|A1Z6G9|A1Z6G9_DROME CG8245 OS=Drosophila melanogaster GN=CG8245-RA PE=2 SV=1", class = "SeqFastaAA")))
Now although this format allows me to get the name indices of the entry/entries, when I search for it using grep, as seen below
grep("A1Z6G9_DROME", names(FASTA))
or isolate its name using
as.vector(sapply(names(attributes(FASTA)), function(x) attr(FASTA, x)))
However I can not either grep/regexpr any of the text/information in the attributes sections or isolate any of the attributes, such as the text following name= or Annot= section. Can anyone help me with this?
As far as I could gather, when googling read.fasta in R, the manual relating to the seqinr package states something along the lines of annotations/attributes being ignored (I think) but these attribute sections hold important information regarding the identity of the entry, which I desperately need! I have tried the unlist or collapse with the paste function but they remove all the attributes that I need!
回答1:
There are a lot of get*
functions in the seqinr package (see http://www.rdocumentation.org/packages/seqinr). These functions are designed to access different attributes, e.g.:
getAnnot(FASTA)
#[[1]]
#[1] ">tr|A1Z6G9|A1Z6G9_DROME CG8245 OS=Drosophila melanogaster GN=CG8245-RA PE=2 SV=1"
getSequence(FASTA)
#[[1]]
# [1] "M" "S" "I" "S" "A" "S" "H" "P" "C" "G" "L" "N" "A" "D" "G" "T" "A" "T" "Q" "Y" "K" "E" "S" "T" "A" "T" "I" "Q" "T" "S" "G" "L" "Q" "S" "S" "P" "R" "S" "F" "L" "P" "E" "R" "E" "D" "T" "L" "E" "Y" "F" "I" "K" "F" "P" "K" "P" "S" "S" "K"
# [60] "N" "E" "F" "V" "L" "A" "K" "D" "H" "D" "G" "E" "D" "S" "H" "V" "P" "I" "V" "M" "L" "L" "G" "W" "A" "G" "C" "Q" "D" "R" "Y" "L" "M" "K" "Y" "S" "K" "I" "Y" "E" "E" "R" "G" "L" "I" "T" "V" "R" "Y" "T" "A" "P" "V" "D" "S" "L" "F" "W" "K"
#[119] "R" "S" "E" "M" "I" "P" "I" "G" "E" "K" "I" "L" "K" "L" "I" "Q" "D" "M" "N" "F" "D" "A" "H" "P" "L" "I" "F" "H" "I" "F" "S" "N" "G" "G" "A" "Y" "L" "Y" "Q" "H" "I" "N" "L" "A" "V" "I" "K" "H" "K" "S" "P" "L" "Q" "V" "R" "G" "V" "I" "F"
#[178] "D" "S" "A" "P" "G" "E" "R" "R" "I" "I" "S" "L" "Y" "R" "A" "I" "T" "A" "I" "Y" "G" "R" "E" "K" "R" "C" "N" "C" "L" "A" "A" "L" "V" "I" "T" "I" "T" "L" "S" "I" "M" "W" "F" "V" "E" "E" "S" "I" "S" "A" "L" "K" "S" "L" "F" "V" "P" "S" "S"
#[237] "P" "V" "R" "P" "S" "P" "F" "C" "D" "L" "K" "N" "E" "A" "N" "R" "Y" "P" "Q" "L" "F" "L" "Y" "S" "K" "G" "D" "I" "V" "I" "P" "Y" "R" "D" "V" "E" "K" "F" "I" "R" "L" "R" "R" "D" "Q" "G" "I" "Q" "V" "S" "S" "V" "C" "F" "E" "D" "A" "E" "H"
#[296] "V" "K" "I" "Y" "T" "K" "Y" "P" "K" "Q" "Y" "V" "Q" "C" "V" "C" "N" "F" "I" "R" "N" "C" "M" "T" "I" "P" "P" "L" "K" "E" "A" "V" "N" "S" "E" "P" "S" "E" "S" "V" "S" "R" "V" "N" "L" "K" "Y" "D"
来源:https://stackoverflow.com/questions/18518485/how-to-search-and-isolate-attributes-of-fasta-formatted-text-in-r