Question
I'm writing a Python script (version 2.7) that converts every input file (.nexus format) in a specified directory to .fasta format. Biopython's SeqIO.convert function handles the conversion perfectly for individually specified files, but when I try to automate the process over a directory using os.walk, I'm unable to correctly pass the pathname of each input file to SeqIO.convert. Where am I going wrong? Do I need to use join() from the os.path module and pass the full pathnames on to SeqIO.convert?
#Import modules
import sys
import re
import os
import fileinput
from Bio import SeqIO
#Specify directory of interest
PSGDirectory = "/Users/InputDirectory"
#Create a function that will run the SeqIO.convert function repeatedly
def process(filename):
    count = SeqIO.convert("files", "nexus", "files.fa", "fasta", alphabet= IUPAC.ambiguous_dna)
#Make sure os.walk works correctly
for path, dirs, files in os.walk(PSGDirectory):
    print path
    print dirs
    print files
#Now recursively do the count command on each file inside PSGDirectory
for files in os.walk(PSGDirectory):
    print("Converted %i records" % count)
    process(files)
When I run the script I get this error message:
Traceback (most recent call last):
  File "nexus_to_fasta.psg", line 45, in <module>
    print("Converted %i records" % count)
NameError: name 'count' is not defined
This conversation was very helpful, but I don't know where to insert the join() function statements. Here is an example of one of my nexus files.
Thanks for your help!
Answer 1:
There are a few things going on.
First, your process function isn't returning 'count'. You probably want:
def process(filename):
    # assuming SeqIO.convert actually returns the number you want
    return SeqIO.convert("files", "nexus", "files.fa", "fasta", alphabet=IUPAC.ambiguous_dna)
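To illustrate why the NameError shows up: a name assigned inside a function is local to that function, so the caller only sees the value if the function returns it. Below is a minimal sketch with made-up function names; `len(filename)` is just a stand-in for the record count, and no Biopython is involved.

```python
def process_without_return(filename):
    # 'count' exists only inside this function; the implicit return value is None.
    count = len(filename)


def process_with_return(filename):
    # Returning the value hands it back to the caller.
    return len(filename)


result_none = process_without_return("example.nexus")
result_count = process_with_return("example.nexus")

print("Without return: %r" % (result_none,))
print("Converted %i records" % result_count)
```

The second form is the pattern to follow: call the function, capture its return value, and only then use that value in the print statement.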
Also, when you write for files in os.walk(PSGDirectory), you're operating on the 3-tuples (directory path, subdirectory names, file names) that os.walk yields, not on individual files. You want to do something like this (note the use of os.path.join):
for root, dirs, files in os.walk(PSGDirectory):
    for filename in files:
        fullpath = os.path.join(root, filename)
        print process(fullpath)
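If the directory also contains files that are not NEXUS input, it may help to filter by extension while walking. The following is a rough sketch using only the standard library; the helper name `find_files` and the ".nexus" default are assumptions for illustration, not something from the question.

```python
import os


def find_files(directory, extension=".nexus"):
    """Return sorted full paths of files under `directory` ending in `extension`."""
    matches = []
    # os.walk yields one (root, dirs, files) tuple per directory it visits.
    for root, dirs, files in os.walk(directory):
        for filename in files:
            if filename.endswith(extension):
                # Join the directory path and the bare file name into a usable path.
                matches.append(os.path.join(root, filename))
    return sorted(matches)
```

Something like `for fullpath in find_files(PSGDirectory): print(fullpath)` would then list exactly the paths that are safe to hand to the conversion function.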
Update:
So I looked at the documentation for SeqIO.convert and it expects to be called with:
- in_file - an input handle or filename
- in_format - input file format, lower case string
- out_file - an output handle or filename
- out_format - output file format, lower case string
- alphabet - optional alphabet to assume
in_file is the name of the file to convert, and originally you were just calling SeqIO.convert with the literal string "files".
So your process function should probably be something like this:
from Bio.Alphabet import IUPAC  # Bio.Alphabet exists only in Biopython < 1.78

def process(filename):
    return SeqIO.convert(filename, "nexus", filename + '.fa', "fasta", alphabet=IUPAC.ambiguous_dna)
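Putting the pieces together, one possible shape for the whole script is sketched below. The names `nexus_to_fasta` and `convert_tree` are hypothetical, the ".nexus" and ".fasta" extensions are assumptions, and the alphabet argument is left out on the assumption that FASTA output does not need it (Biopython 1.78 and later removed Bio.Alphabet altogether). The converter is passed in as a parameter so the walking logic can be tried without Biopython installed.

```python
import os


def nexus_to_fasta(in_path, out_path):
    """Convert one NEXUS file to FASTA and return the number of records."""
    # Imported here so the directory-walking code below does not itself need Biopython.
    from Bio import SeqIO
    return SeqIO.convert(in_path, "nexus", out_path, "fasta")


def convert_tree(directory, convert=nexus_to_fasta, in_ext=".nexus", out_ext=".fasta"):
    """Run `convert` on every `in_ext` file under `directory`.

    Returns a list of (input_path, output_path, record_count) tuples.
    """
    results = []
    for root, dirs, files in os.walk(directory):
        for filename in sorted(files):
            base, ext = os.path.splitext(filename)
            if ext != in_ext:
                continue  # skip anything that does not have the input extension
            in_path = os.path.join(root, filename)
            # Swap the extension instead of appending, e.g. x.nexus -> x.fasta.
            out_path = os.path.join(root, base + out_ext)
            results.append((in_path, out_path, convert(in_path, out_path)))
    return results
```

A call such as `for in_path, out_path, count in convert_tree(PSGDirectory): print("Converted %i records from %s" % (count, in_path))` would then report one line per converted file. Using os.path.splitext keeps the output name from ending in ".nexus.fa", which is a matter of taste rather than a requirement.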
Source: https://stackoverflow.com/questions/21743438/how-do-i-pass-biopython-seqio-convert-over-multiple-files-in-a-directory