Parsing a GenBank file
Question: Basically, a GenBank file consists of gene entries (each announced by 'gene'), each followed by its corresponding 'CDS' entry (only one per gene), like the two shown below. I would like to get locus_tag vs. product in a tab-delimited, two-column file. 'gene' and 'CDS' are always preceded and followed by spaces. If this task can be easily performed with an existing tool, please let me know.

Input file:

gene complement(8972..9094)
/locus_tag="HAPS_0004"
/db_xref="GeneID:7278619"
CDS complement
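One way the extraction described in the question might be sketched in plain Python is shown here. The function names `locus_products` and `write_tsv` are made up for illustration. The sketch assumes that each 'CDS' entry carries a `/product="..."` qualifier (the sample above is cut off before that point) and that a quoted value may wrap onto the following lines. Genes with no CDS product are skipped.

```python
import re
import sys

# Feature line: leading spaces, a key such as gene or CDS, spaces, a location.
FEATURE = re.compile(r"^\s+(\S+)\s+\S")
# Qualifier line (after stripping), e.g. /locus_tag="HAPS_0004"
QUALIFIER = re.compile(r"^/(\w+)=(.*)$")


def locus_products(lines):
    """Yield (locus_tag, product) for each gene/CDS pair found in the lines."""
    key = None          # feature currently being read
    tag = None          # locus_tag seen in the current gene or CDS
    product = None      # product seen in the current CDS
    open_name = None    # qualifier whose quoted value is still open
    parts = []          # pieces of that wrapped value

    for line in lines:
        text = line.strip()

        if open_name is not None:
            # Continuation of a value that wrapped onto this line.
            parts.append(text.rstrip('"'))
            if not text.endswith('"'):
                continue
            name, value = open_name, " ".join(parts)
            open_name = None
        else:
            q = QUALIFIER.match(text)
            if q:
                name, value = q.group(1), q.group(2)
                if value.startswith('"') and (len(value) == 1 or not value.endswith('"')):
                    # Opening quote without a closing one: value continues.
                    open_name, parts = name, [value[1:]]
                    continue
                value = value.strip('"')
            else:
                f = FEATURE.match(line)
                if f:
                    new_key = f.group(1)
                    # Anything other than the CDS that follows a gene
                    # closes the pair collected so far.
                    if new_key != "CDS" or product is not None:
                        if tag is not None and product is not None:
                            yield tag, product
                        tag = product = None
                    key = new_key
                continue

        if name == "locus_tag" and key in ("gene", "CDS"):
            tag = value
        elif name == "product" and key == "CDS":
            product = value

    # Emit the last pair in the file.
    if tag is not None and product is not None:
        yield tag, product


def write_tsv(genbank_path, out=sys.stdout):
    """Write one 'locus_tag<TAB>product' line per pair."""
    with open(genbank_path) as handle:
        for tag, product in locus_products(handle):
            out.write(f"{tag}\t{product}\n")
```

Calling `write_tsv("input.gbk")` (the file name is a placeholder) would print the two-column table to standard output. If installing a library is an option, Biopython's `Bio.SeqIO` GenBank parser exposes the same qualifiers per feature and is probably the more robust route for real files.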