Question
I have xxx.pdb files that look like this:
ATOM 1910 CB SER 128 45.806 50.621 39.840 1.00 9.36
ATOM 1913 OG SER 128 44.538 51.195 39.571 1.00 9.36
ATOM 1915 C SER 128 45.325 48.172 40.360 1.00 9.36
ATOM 1916 O SER 128 45.368 47.955 39.155 1.00 9.36
ATOM 1917 N SER 129 44.953 47.236 41.238 1.00 11.24
ATOM 1919 CA SER 129 44.395 45.938 40.826 1.00 11.24
ATOM 1921 CB SER 129 44.091 45.053 42.031 1.00 11.24
ATOM 1924 OG SER 129 43.483 45.786 43.085 1.00 11.24
When I tried this code:
awk '{if($10<11){$9="1.50"};print $0}' xxx.pdb
this happened:
ATOM 1910 CB SER 128 45.806 50.621 39.840 1.50 9.36
ATOM 1913 OG SER 128 44.538 51.195 39.571 1.50 9.36
ATOM 1915 C SER 128 45.325 48.172 40.360 1.50 9.36
ATOM 1916 O SER 128 45.368 47.955 39.155 1.50 9.36
ATOM 1917 N SER 129 44.953 47.236 41.238 1.00 11.24
ATOM 1919 CA SER 129 44.395 45.938 40.826 1.00 11.24
ATOM 1921 CB SER 129 44.091 45.053 42.031 1.00 11.24
ATOM 1924 OG SER 129 43.483 45.786 43.085 1.00 11.24
Any idea on how to preserve the column formatting?
Thanks.
Answer 1:
awk 'BEGIN{FS=OFS="\t";}{if($10<11){$9="1.50"};print $0}' xxx.pdb
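Use tab as both the input and output delimiter. Note that this only helps if the fields in the file are actually separated by tabs; with space-aligned columns, $10 will be empty and the line will not be edited as intended.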
Answer 2:
With GNU awk for gensub():
$ awk '$NF<11{$0=gensub(/\S+(\s+\S+)$/,"1.50\\1",1)}1' file
ATOM 1910 CB SER 128 45.806 50.621 39.840 1.50 9.36
ATOM 1913 OG SER 128 44.538 51.195 39.571 1.50 9.36
ATOM 1915 C SER 128 45.325 48.172 40.360 1.50 9.36
ATOM 1916 O SER 128 45.368 47.955 39.155 1.50 9.36
ATOM 1917 N SER 129 44.953 47.236 41.238 1.00 11.24
ATOM 1919 CA SER 129 44.395 45.938 40.826 1.00 11.24
ATOM 1921 CB SER 129 44.091 45.053 42.031 1.00 11.24
ATOM 1924 OG SER 129 43.483 45.786 43.085 1.00 11.24
The above will work no matter what the white space is between fields (tabs, blanks, whatever...).
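If GNU awk is not available, roughly the same idea can be sketched with POSIX match() and substr() instead of gensub(). The function name set_penultimate and its two arguments are made up for this illustration, and it assumes fields are separated by blanks or tabs. Because the rest of the line is copied verbatim, alignment is only preserved when the replacement has the same width as the old value (as "1.50" and "1.00" do).

```shell
# Hypothetical helper (not from the answers above): replace the second-to-last
# field when the last field is below a threshold, keeping all whitespace as-is.
# Usage: set_penultimate 11 1.50 < xxx.pdb > new.pdb
set_penultimate() {
    # $1 = threshold, $2 = replacement text, stdin = PDB-like lines
    awk -v limit="$1" -v repl="$2" '
    NF >= 2 && $NF < limit {
        # Find "field, blanks, last field, optional blanks" anchored at end of line.
        if (match($0, /[^ \t]+[ \t]+[^ \t]+[ \t]*$/)) {
            head = substr($0, 1, RSTART - 1)   # everything before the penultimate field
            tail = substr($0, RSTART)          # penultimate field onward
            match(tail, /^[^ \t]+/)            # RLENGTH = width of the old value
            $0 = head repl substr(tail, RLENGTH + 1)
        }
    }
    { print }'
}
```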
Answer 3:
If perl is okay:
$ perl -ape 's/\S+(?=\s+\S+$)/1.50/ if $F[-1]<11' xxx.pdb
ATOM 1910 CB SER 128 45.806 50.621 39.840 1.50 9.36
ATOM 1913 OG SER 128 44.538 51.195 39.571 1.50 9.36
ATOM 1915 C SER 128 45.325 48.172 40.360 1.50 9.36
ATOM 1916 O SER 128 45.368 47.955 39.155 1.50 9.36
ATOM 1917 N SER 129 44.953 47.236 41.238 1.00 11.24
ATOM 1919 CA SER 129 44.395 45.938 40.826 1.00 11.24
ATOM 1921 CB SER 129 44.091 45.053 42.031 1.00 11.24
ATOM 1924 OG SER 129 43.483 45.786 43.085 1.00 11.24
- \S+(?=\s+\S+$) uses a positive lookahead to match the last-but-one field
- use \S+(?=\s+\S+\s*$) if there can be whitespace at the end of the line
- the $F[-1]<11 condition checks whether the last field is less than 11
- See http://perldoc.perl.org/perlrun.html#Command-Switches for details on the -ape options. The -a option will auto-split each input line on whitespace and save the fields to the @F array
Answer 4:
I'm not sure what you're trying to accomplish, but in general, to read in an xxx.pdb file and then write out a new.pdb file with proper formatting, this is what I do:
awk '{printf "%4s%7.0f%3s%6s%2s%4.0f%12.3f%8.3f%8.3f%6.2f%7.2f\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11}' < xxx.pdb >> new.pdb
(This code doesn't change any values; it just writes a reformatted copy. Note that >> appends, so new.pdb should not already exist.)
If I wanted to use a variable to change one of the fields (like the second field), it would look like:
VARIABLE=3
awk -v x=$VARIABLE '{printf "%4s%7.0f%3s%6s%2s%4.0f%12.3f%8.3f%8.3f%6.2f%7.2f\n", $1, ($2 + x), $3, $4, $5, $6, $7, $8, $9, $10, $11}' < xxx.pdb >> new.pdb
This would add 3 to every value in the second column of the pdb file.
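The same printf approach can be applied to the edit asked about in the question. The sketch below assumes 10 whitespace-separated fields per line, as in the question's sample (no chain ID), with occupancy in field 9 and B-factor in field 10. The function name reformat_pdb and the field widths are illustrative assumptions, not the official PDB column layout, so adjust them to match your files.

```shell
# Hypothetical helper: rebuild each 10-field line with printf, setting the
# occupancy (field 9) to 1.50 when the B-factor (field 10) is below 11.
# The widths are illustrative assumptions, not the official PDB columns.
# Usage: reformat_pdb < xxx.pdb > new.pdb
reformat_pdb() {
    awk '
    NF == 10 {
        occ = ($10 < 11) ? 1.50 : $9
        printf "%-6s%5d  %-4s%-4s%5d    %8.3f%8.3f%8.3f%6.2f%6.2f\n",
               $1, $2, $3, $4, $5, $6, $7, $8, occ, $10
        next
    }
    { print }   # pass through anything that is not a 10-field record
    '
}
```

Unlike the in-place substitutions in the other answers, this discards the original spacing and imposes its own, so it only suits files whose layout matches the format string.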
Source: https://stackoverflow.com/questions/45145039/keeping-pdb-file-format-after-editing