Keeping PDB file format after editing

Posted by 99封情书 on 2019-12-11 17:00:48

Question


I have xxx.pdb files as:

 ATOM   1910  CB  SER   128      45.806  50.621  39.840  1.00  9.36
 ATOM   1913  OG  SER   128      44.538  51.195  39.571  1.00  9.36
 ATOM   1915  C   SER   128      45.325  48.172  40.360  1.00  9.36
 ATOM   1916  O   SER   128      45.368  47.955  39.155  1.00  9.36
 ATOM   1917  N   SER   129      44.953  47.236  41.238  1.00 11.24
 ATOM   1919  CA  SER   129      44.395  45.938  40.826  1.00 11.24
 ATOM   1921  CB  SER   129      44.091  45.053  42.031  1.00 11.24
 ATOM   1924  OG  SER   129      43.483  45.786  43.085  1.00 11.24

When I tried this code: awk '{if($10<11){$9="1.50"};print $0}' xxx.pdb

This happened:

ATOM 1910 CB SER 128 45.806 50.621 39.840 1.50 9.36
ATOM 1913 OG SER 128 44.538 51.195 39.571 1.50 9.36
ATOM 1915 C SER 128 45.325 48.172 40.360 1.50 9.36
ATOM 1916 O SER 128 45.368 47.955 39.155 1.50 9.36
ATOM   1917  N   SER   129      44.953  47.236  41.238  1.00 11.24
ATOM   1919  CA  SER   129      44.395  45.938  40.826  1.00 11.24
ATOM   1921  CB  SER   129      44.091  45.053  42.031  1.00 11.24
ATOM   1924  OG  SER   129      43.483  45.786  43.085  1.00 11.24

Any idea on how to preserve the column formatting?

Thanks.


Answer 1:


awk 'BEGIN{FS=OFS="\t";}{if($10<11){$9="1.50"};print $0}' xxx.pdb

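Use tab as both the input and output delimiter. Note that this only preserves the layout if the fields in xxx.pdb are actually separated by tabs; if the columns are padded with spaces, awk reads each line as a single field and the edit will not land in the intended column.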




Answer 2:


With GNU awk for gensub():

$ awk '$NF<11{$0=gensub(/\S+(\s+\S+)$/,"1.50\\1",1)}1' file
 ATOM   1910  CB  SER   128      45.806  50.621  39.840  1.50  9.36
 ATOM   1913  OG  SER   128      44.538  51.195  39.571  1.50  9.36
 ATOM   1915  C   SER   128      45.325  48.172  40.360  1.50  9.36
 ATOM   1916  O   SER   128      45.368  47.955  39.155  1.50  9.36
 ATOM   1917  N   SER   129      44.953  47.236  41.238  1.00 11.24
 ATOM   1919  CA  SER   129      44.395  45.938  40.826  1.00 11.24
 ATOM   1921  CB  SER   129      44.091  45.053  42.031  1.00 11.24
 ATOM   1924  OG  SER   129      43.483  45.786  43.085  1.00 11.24

The above will work no matter what the white space is between fields (tabs, blanks, whatever...).
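gensub(), \s and \S are GNU awk extensions. Where gawk is not available, a rough equivalent can be sketched with the standard match() and substr() functions. This is only a sketch under stated assumptions: the two sample rows are copied from the question, the variable names are made up for illustration, and the replacement has the same width as the old value (1.00 becomes 1.50), so the columns stay aligned.

```shell
# Two sample rows copied from the question.
sample=' ATOM   1910  CB  SER   128      45.806  50.621  39.840  1.00  9.36
 ATOM   1917  N   SER   129      44.953  47.236  41.238  1.00 11.24'

result=$(printf '%s\n' "$sample" | awk '
$NF < 11 {
    # Find where the second-to-last field starts: the regex spans the last
    # two fields plus any trailing blanks, so RSTART points at that field.
    if (match($0, /[^ \t]+[ \t]+[^ \t]+[ \t]*$/)) {
        old = $(NF - 1)
        # Overwrite the old value in place; all other characters are kept.
        $0 = substr($0, 1, RSTART - 1) "1.50" substr($0, RSTART + length(old))
    }
}
{ print }')

printf '%s\n' "$result"
```

Because the line is patched by character position instead of being rebuilt from its fields, the original spacing survives. If the new value were wider or narrower than the old one, the later columns would shift.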




Answer 3:


If Perl is okay:

$ perl -ape 's/\S+(?=\s+\S+$)/1.50/ if $F[-1]<11' xxx.pdb 
 ATOM   1910  CB  SER   128      45.806  50.621  39.840  1.50  9.36
 ATOM   1913  OG  SER   128      44.538  51.195  39.571  1.50  9.36
 ATOM   1915  C   SER   128      45.325  48.172  40.360  1.50  9.36
 ATOM   1916  O   SER   128      45.368  47.955  39.155  1.50  9.36
 ATOM   1917  N   SER   129      44.953  47.236  41.238  1.00 11.24
 ATOM   1919  CA  SER   129      44.395  45.938  40.826  1.00 11.24
 ATOM   1921  CB  SER   129      44.091  45.053  42.031  1.00 11.24
 ATOM   1924  OG  SER   129      43.483  45.786  43.085  1.00 11.24
  • \S+(?=\s+\S+$) uses a positive lookahead to match the second-to-last field
    • use \S+(?=\s+\S+\s*$) if there can be whitespace at the end of the line
  • the condition $F[-1]<11 checks whether the last field is less than 11
  • See http://perldoc.perl.org/perlrun.html#Command-Switches for details on the -ape options. The -a option auto-splits each input line on whitespace and saves the fields to the @F array; -p loops over the input and prints each line; -e supplies the code to run



Answer 4:


I'm not sure what you're trying to accomplish, but in general, to read in a xxx.pdb file, and then output a new.pdb file with proper format, this is what I do:

    awk '{printf "%4s%7.0f%3s%6s%2s%4.0f%12.3f%8.3f%8.3f%6.2f%7.2f\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11}' < xxx.pdb >> new.pdb

(As written, this code does nothing except make a reformatted copy. Note that >> appends, so new.pdb will grow if it already exists.)

If I wanted to use a variable to change one of the fields (like the second field), it would look like:

    VARIABLE=3

    awk -v x=$VARIABLE '{printf "%4s%7.0f%3s%6s%2s%4.0f%12.3f%8.3f%8.3f%6.2f%7.2f\n", $1, ($2 + x), $3, $4, $5, $6, $7, $8, $9, $10, $11}' < xxx.pdb >> new.pdb

This would add 3 to every value in the second column of the PDB file.
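The format string above expects 11 fields, while the rows shown in the question have only 10. As a hedged sketch of the same printf idea applied to the question's actual edit (set field 9 to 1.50 when field 10 is below 11), the following uses a format string fitted by hand to the question's sample rows. That format is an assumption made for illustration, not the official PDB column layout, so it should be checked against a real file before use.

```shell
# Two sample rows copied from the question (10 whitespace-separated fields).
sample=' ATOM   1910  CB  SER   128      45.806  50.621  39.840  1.00  9.36
 ATOM   1917  N   SER   129      44.953  47.236  41.238  1.00 11.24'

result=$(printf '%s\n' "$sample" | awk '{
    # Field 9 becomes 1.50 when field 10 is below 11; otherwise it is kept.
    occ = ($10 < 11) ? 1.50 : $9
    # Hand-fitted widths (an assumption): they reproduce the sample layout only.
    printf " %-4s%7d  %-4s%3s%6d%12.3f%8.3f%8.3f%6.2f%6.2f\n",
           $1, $2, $3, $4, $5, $6, $7, $8, occ, $10
}')

printf '%s\n' "$result"
```

Unlike an in-place substitution, this rebuilds every line from its fields, so it also works when the new value has a different width, at the cost of depending on a format string that matches the file.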



Source: https://stackoverflow.com/questions/45145039/keeping-pdb-file-format-after-editing
