Question
I was recently tasked with writing a Python program that finds the atoms within 2 angstroms of every metal in a protein from a .pdb (Protein Data Bank) file. This is the script I wrote for it:
from Bio.PDB import *

parser = PDBParser(PERMISSIVE=True)

def print_coordinates(list):
    neighborList = list
    for y in neighborList:
        print " ", y.get_coord()

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)
atomList = Selection.unfold_entities(structure, 'A')
ns = NeighborSearch(atomList)

for x in structure.get_atoms():
    if x.name == 'ZN' or x.name == 'FE' or x.name == 'CU' or x.name == 'MG' or x.name == 'CA' or x.name == 'MN':
        center = x.get_coord()
        neighbors = ns.search(center,2.0)
        neighborList = Selection.unfold_entities(neighbors, 'A')
        print x.get_id(), ': ', neighborList
        print_coordinates(neighborList)
    else:
        continue
This only handles a single .pdb file, though; I would like to process an entire directory of them. Since I have only used Java until now, I am not sure how to do this in Python 2.7. My idea was to wrap the script in a try/catch statement containing a while loop and throw an exception when it reaches the end, but that is how I would have done it in Java, and I am not sure how to do it in Python. I would love to hear any ideas or sample code.
Answer 1:
Your code has some redundancies. For instance, this does the same thing:
from Bio.PDB import *

parser = PDBParser(PERMISSIVE=True)

def print_coordinates(neighborList):
    # Print the coordinates of each neighboring atom, indented by two spaces
    for y in neighborList:
        print("  {0}".format(y.get_coord()))

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)

metals = ['ZN', 'FE', 'CU', 'MG', 'CA', 'MN']
atomList = [atom for atom in structure.get_atoms() if atom.name in metals]
ns = NeighborSearch(Selection.unfold_entities(structure, 'A'))

for atom in atomList:
    # ns.search returns atoms by default, so no further unfolding is needed
    neighbors = ns.search(atom.coord, 2)
    print("{0}: {1}".format(atom.name, neighbors))
    print_coordinates(neighbors)
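One caveat about filtering on atom.name: in PDB files the name CA is also used for every alpha carbon, so a name-based filter may pick up C-alpha atoms along with calcium ions. Below is a minimal sketch of an element-based filter instead. It assumes each atom exposes an `element` attribute holding the element symbol, as Bio.PDB atoms do. The names `is_metal`, `select_metals` and the `FakeAtom` stand-in are hypothetical and only there so the example runs without a structure file.

```python
from collections import namedtuple

# Element symbols of the metals listed in the question
METAL_ELEMENTS = {"ZN", "FE", "CU", "MG", "CA", "MN"}

def is_metal(atom):
    """Return True if the atom's element symbol is one of the metals of interest."""
    # Assumption: atom.element is a string such as "ZN" or "C" (or None/empty)
    return (atom.element or "").strip().upper() in METAL_ELEMENTS

def select_metals(atoms):
    """Keep only the atoms whose element is a metal of interest."""
    return [atom for atom in atoms if is_metal(atom)]

# Hypothetical stand-in for Bio.PDB atoms, used only for this demonstration
FakeAtom = namedtuple("FakeAtom", ["name", "element"])
demo_atoms = [
    FakeAtom("CA", "C"),    # alpha carbon: name CA, element carbon
    FakeAtom("CA", "CA"),   # calcium ion: name CA, element calcium
    FakeAtom("ZN", "ZN"),   # zinc ion
    FakeAtom("N", "N"),     # backbone nitrogen
]
demo_metals = select_metals(demo_atoms)
```

With real data, `select_metals(structure.get_atoms())` could stand in for the list comprehension that builds `atomList`, provided the element column is present or can be inferred when the file is parsed.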
To answer your question, you can get a list of all your pdb files using the glob module and nest your code in a for loop iterating over all files. Supposing your pdb files are at /home/pdb_files/:
from Bio.PDB import *
from glob import glob

parser = PDBParser(PERMISSIVE=True)

# Match only .pdb files so that [:-4] below strips exactly the extension
pdb_files = glob('/home/pdb_files/*.pdb')

def print_coordinates(neighborList):
    for y in neighborList:
        print("  {0}".format(y.get_coord()))

for fileName in pdb_files:
    # '/home/pdb_files/5m6n.pdb' -> '5m6n'
    structure_id = fileName.rsplit('/', 1)[1][:-4]
    structure = parser.get_structure(structure_id, fileName)
    # The rest of your code
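Regarding the try/catch idea from the question: a Python for loop simply stops when the list of files is exhausted, so no exception is needed to end it. The loop above could also be packaged as in the following sketch, which uses os.path instead of string slicing to derive the structure id. The helper names `find_pdb_files` and `process_directory`, and the `handle_structure` callback, are hypothetical, and the Bio.PDB import is deferred so that the file discovery can be tried on its own.

```python
import glob
import os

def find_pdb_files(directory):
    """Return sorted (structure_id, path) pairs for every *.pdb file in directory."""
    paths = sorted(glob.glob(os.path.join(directory, "*.pdb")))
    # '/some/dir/5m6n.pdb' -> ('5m6n', '/some/dir/5m6n.pdb')
    return [(os.path.splitext(os.path.basename(p))[0], p) for p in paths]

def process_directory(directory, handle_structure):
    """Parse each .pdb file in directory and pass the structure to handle_structure."""
    # Imported here so find_pdb_files works even without Biopython installed
    from Bio.PDB import PDBParser

    parser = PDBParser(PERMISSIVE=True)
    for structure_id, path in find_pdb_files(directory):
        structure = parser.get_structure(structure_id, path)
        # handle_structure is where "the rest of your code" would go
        handle_structure(structure_id, structure)
```

A call such as `process_directory('/home/pdb_files/', my_handler)` would then run `my_handler(structure_id, structure)` once per file, where `my_handler` is whatever function holds the metal neighbor search.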
Source: https://stackoverflow.com/questions/44669318/reading-an-entire-directory-of-pdb-files-using-biopython