Question
I have multiple FASTA files in the same directory, each containing a single sequence. I want to rename each file after the header of the sequence it contains. When I run my code, I get the error "Substitution pattern not terminated at (user-supplied code)".
My code:
#!/bin/bash
for i in /home/maryem/files/;
do
if [ ! -f $i ]; then
echo "skipping $i";
else
newname=`head -1 $i | sed 's/^\s*\([a-zA-Z0-9]\+\).*$/\1/'`;
[ -n "$newname" ] ;
mv -i $i $newname.fasta || echo "error at: $i";
fi;
done | rename s/ // *.fasta
FASTA file:
>NC_013361.1 Escherichia coli O26:H11 str. 11368 DNA, complete genome
AGCTTTTCATTCTGACTGCAATGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
AGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTATCACCACCATCACCATTACCACAGGT
Is there another way to rename each file with the ID from its header?
Answer 1:
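Given that the ID is the first "word" of the header line (the first line of the file), you can run the following in the directory containing the FASTA files. The leading ">" is stripped so that it does not end up in the file name.
for f in *.fasta; do
  # first word of the header line, without the leading ">"
  d="$(head -n 1 "$f" | awk '{sub(/^>/, ""); print $1}').fasta"
  if [ ! -f "$d" ]; then
    mv "$f" "$d"
  else
    echo "File '$d' already exists! Skipped '$f'"
  fi
done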
Credit: https://unix.stackexchange.com/a/13161
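As for the error itself: "Substitution pattern not terminated" most likely comes from the trailing `rename s/ // *.fasta` in the question. Because the expression is unquoted, the shell splits it into `s/` and `//`, so rename receives the incomplete expression `s/`. Separately, `for i in /home/maryem/files/` iterates once over the directory itself rather than over the files in it. The sketch below is one possible way to wrap the renaming in a reusable function. The name `rename_fasta_by_id` is hypothetical, and the code assumes each file has a header of the form `>ID description` with no "/" in the ID.

```shell
#!/bin/bash
# Sketch: rename each single-sequence FASTA file after the ID in its header.
# rename_fasta_by_id is a hypothetical helper name, not part of any tool.
# Assumption: the header looks like ">ID description" and the ID has no "/".
rename_fasta_by_id() {
    local dir="${1:-.}" f id target
    for f in "$dir"/*.fasta; do
        # Skip non-files, including the literal pattern when nothing matches.
        [ -f "$f" ] || continue
        # Take the first line, drop the leading ">", keep the first word.
        id=$(head -n 1 "$f" | sed -e 's/^>//' -e 's/[[:space:]].*$//')
        if [ -z "$id" ]; then
            echo "no ID found, skipped: $f" >&2
            continue
        fi
        target="$dir/$id.fasta"
        # Nothing to do if the file is already named after its ID.
        [ "$f" = "$target" ] && continue
        # Never overwrite an existing file.
        if [ -e "$target" ]; then
            echo "target exists, skipped: $f" >&2
            continue
        fi
        mv -- "$f" "$target"
    done
}
```

It could be called as `rename_fasta_by_id /home/maryem/files`, or with no argument to work on the current directory. Files whose target name is already taken are left untouched and reported on standard error.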
Source: https://stackoverflow.com/questions/54078687/automatically-rename-fasta-files-with-the-id-of-the-first-sequence-in-each-file