Automatically rename fasta files with the ID of the first sequence in each file

问题

I have multiple fasta files with single sequence in the same directory. I want to rename each fasta file with the header of the single sequence present in the fasta file. When i run my code , i obtain "Substitution pattern not terminated at (user-supplied code)"

my code:

#!/bin/bash

for i in /home/maryem/files/;
do 
  if [ ! -f $i ]; then 
     echo "skipping $i"; 
  else 
     newname=`head -1 $i | sed 's/^\s*\([a-zA-Z0-9]\+\).*$/\1/'`; 
     [ -n "$newname" ] ; 
      mv -i $i $newname.fasta || echo "error at: $i"; 
  fi; 
done | rename s/ // *.fasta

fasta file:

>NC_013361.1 Escherichia coli O26:H11 str. 11368 DNA, complete genome
AGCTTTTCATTCTGACTGCAATGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
AGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTATCACCACCATCACCATTACCACAGGT

I'm not sure if there is another way to rename each file with the ID in the header ??

回答1:

Given that the ID is the first "word" of the file, you can run the following in the directory containing the fasta files.

for f in *.fasta; do d="$(head -1 "$f" | awk '{print $1}').fasta"; if [ ! -f "$d" ]; then mv "$f" "$d"; else echo "File '$d' already exists! Skiped '$f'"; fi; done

Credit: https://unix.stackexchange.com/a/13161

来源：https://stackoverflow.com/questions/54078687/automatically-rename-fasta-files-with-the-id-of-the-first-sequence-in-each-file

标签

bash

sed

rename

file-rename

fasta

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!