Remove heteroatoms from PDB

问题

The heteroatoms from pdb file has to be removed. Here is the code but it did not work with my test PDB 1C4R.

for model in structure:
    for chain in model:
        for reisdue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)

Any suggestion?

回答1:

The heteroatoms shouldn't be part of the chain. But you can know if a residue is a heteroatom with:

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")

for residue in pdb.get_residues():
    tags = residue.get_full_id()

    # tags contains a tuple with (Structure ID, Model ID, Chain ID, (Residue ID))
    # Residue ID is a tuple with (*Hetero Field*, Residue ID, Insertion Code)

    # Thus you're interested in the Hetero Field, that is empty if the residue
    # is not a hetero atom or have some flag if it is (W for waters, H, etc.)

    if tags[3][0] != " ":
        # The residue is a heteroatom
    else:
        # It is not

You can also get the id of the residue (without the three first fields) with:

tags = residue.id

# or het_flag,_ ,_ = residue.id

if tags[0] != " ":
    # The residue is a heteroatom
else:
    # It is not

I'm adding a link to the relevant documentation: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

The subject is in the page 8, "What is a residue id?". Quoting:

This is a bit more complicated, due to the clumsy PDB format. A residue id is a tuple with three elements:

The hetero-flag: this is ’H_’ plus the name of the hetero-residue (eg. ’H_GLC’ in the case of a glucose molecule), or ’W’ in the case of a water molecule.

To add comments in and resume:

from Bio.PDB import PDBParser, PDBIO, Select

class NonHetSelect(Select):
    def accept_residue(self, residue):
        return 1 if residue.id[0] == " " else 0

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")
io = PDBIO()
io.set_structure(pdb)
io.save("non_het.pdb", NonHetSelect())

回答2:

I once used the code "Removing residues" from http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython

It will miss some heteroatoms. I guess it may because that the chain changes every time the detach_child is called.

for model in structure:
    for chain in model:
        for reisdue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)

After modified as below (just avoid dynamically modifying the iterable), it worked fine for me. (I only used structure[0] here.)

model = structure[0]
residue_to_remove = []
chain_to_remove = []
for chain in model:
    for residue in chain:
        if residue.id[0] != ' ':
            residue_to_remove.append((chain.id, residue.id))
    if len(chain) == 0:
        chain_to_remove.append(chain.id)

for residue in residue_to_remove:
    model[residue[0]].detach_child(residue[1])

for chain in chain_to_remove:
    model.detach_child(chain)

来源：https://stackoverflow.com/questions/25718201/remove-heteroatoms-from-pdb

标签

python

biopython

protein-database