Optimising Python dictionary access code

迷失自我 2021-01-30 05:11

Question:

I've profiled my Python program to death, and there is one function that is slowing everything down. It uses Python dictionaries heavily, so

5 answers
  •  无人及你
    2021-01-30 05:44

    I would have posted this as an update to my question, but Stack Overflow only allows 30000 characters in questions, so I'm posting this as an answer.

    Update: My best optimisations so far

    I've taken on board people's suggestions, and now my code runs about 21% faster than before, which is good - thanks everyone!

    This is the best I've managed so far. I've replaced all the == tests on nodes with is, disabled garbage collection, and rewritten the big if statement at line 388, in line with @Nas Banov's suggestions. I also added the well-known try/except trick for avoiding tests (line 390, which removes the test node_c in node_b_distances); this helped loads, since the exception is hardly ever raised. I tried swapping lines 391 and 392, and assigning node_b_distances[node_c] to a variable, but the version shown in the listing was the quickest.
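
    For readers who haven't met the try/except trick, here is a minimal sketch of the idea, written for Python 3. The table contents and the helper names distance_lbyl and distance_eafp are made up for illustration; only the (distance, next_node) tuple shape mirrors node_distances in the listing.

```python
def distance_lbyl(table, key):
    """Look before you leap: a membership test, then a second lookup."""
    if key in table:
        return table[key][0]
    return float('inf')


def distance_eafp(table, key):
    """Try/except: one lookup on the common path where the key exists."""
    try:
        return table[key][0]
    except KeyError:
        # Only reached on a miss, so the cost is paid rarely.
        return float('inf')


# Hypothetical distance table: key -> (distance, next_node)
table = {'c': (3.0, 'a'), 'd': (7.5, None)}

hit_lbyl = distance_lbyl(table, 'c')
hit_eafp = distance_eafp(table, 'c')
miss_eafp = distance_eafp(table, 'zzz')
```

    Both helpers return the same values; the try/except form just skips the extra hash lookup when the key is present. It tends to pay off only when misses are rare, as they are at line 401 of the listing, because raising and catching an exception costs more than a failed in test.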

    However, I still haven't tracked down the memory leak (see the graph in my question). I think it might be in a different part of my code that I haven't posted here. If I can fix the memory leak, then this program will run quickly enough for me to use :)
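
    One thing that may be worth ruling out, since garbage collection is now disabled: in CPython, objects caught in reference cycles are only freed by the cyclic collector, so with it switched off they pile up. The sketch below is not a diagnosis of this program. The Node class is a made-up stand-in, and all it shows is that cycles created while gc is disabled stay around until gc.collect() is called.

```python
import gc


class Node:
    """Hypothetical stand-in for an object that refers back to a neighbour."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b reference cycle, unreachable on return


gc.collect()   # start from a clean slate
gc.disable()   # as in the optimised version
try:
    for _ in range(1000):
        make_cycle()  # cycles accumulate; reference counting cannot free them
finally:
    gc.enable()

# gc.collect() returns the number of unreachable objects it found.
reclaimed = gc.collect()
```

    If memory growth tracks the time spent with the collector off, calling gc.collect() at a quiet point between batches of work is one possible compromise. Whether that applies here is an open question.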

    Timer unit: 3.33366e-10 s
    File: routing_distances.py
    Function: propagate_distances_node at line 328
    Total time: 760.74 s
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    328                                               @profile
    329                                               def propagate_distances_node(self, node_a, cutoff_distance=200):
    330                                                       
    331                                                   # a makes sure its immediate neighbours are correctly in its distance table
    332                                                   # because its immediate neighbours may change as binds/folding change
    333    791349   4158169713   5254.5      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
    334    550522   2331886050   4235.8      0.1              use_neighbour_link = False
    335                                                       
    336    550522   2935995237   5333.1      0.1              if(node_b not in self.node_distances[node_a]): # a doesn't know distance to b
    337     15931     68829156   4320.5      0.0                  use_neighbour_link = True
    338                                                       else: # a does know distance to b
    339    534591   2728134153   5103.2      0.1                  (node_distance_b_a, next_node) = self.node_distances[node_a][node_b]
    340    534591   2376374859   4445.2      0.1                  if(node_distance_b_a > neighbour_distance_b_a): # neighbour distance is shorter
    341        78       347355   4453.3      0.0                      use_neighbour_link = True
    342    534513   3145889079   5885.5      0.1                  elif((None is next_node) and (float('+inf') == neighbour_distance_b_a)): # direct route that has just broken
    343        74       327600   4427.0      0.0                      use_neighbour_link = True
    344                                                               
    345    550522   2414669022   4386.1      0.1              if(use_neighbour_link):
    346     16083     81850626   5089.3      0.0                  self.node_distances[node_a][node_b] = (neighbour_distance_b_a, None)
    347     16083     87064200   5413.4      0.0                  self.nodes_changed.add(node_a)
    348                                                           
    349                                                           ## Affinity distances update
    350     16083     86580603   5383.4      0.0                  if((node_a.type == Atom.BINDING_SITE) and (node_b.type == Atom.BINDING_SITE)):
    351       234      6656868  28448.2      0.0                      self.add_affinityDistance(node_a, node_b, self.chemistry.affinity(node_a.data, node_b.data))     
    352                                                   
    353                                                   # a sends its table to all its immediate neighbours
    354    791349   4034651958   5098.4      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
    355    550522   2392248546   4345.4      0.1              node_b_changed = False
    356                                               
    357                                                       # b integrates a's distance table with its own
    358    550522   2520330696   4578.1      0.1              node_b_chemical = node_b.chemical
    359    550522   2734341975   4966.8      0.1              node_b_distances = node_b_chemical.node_distances[node_b]
    360                                                       
    361                                                       # For all b's routes (to c) that go to a first, update their distances
    362  46679347 222161837193   4759.3      9.7              for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok)
    363  46128825 211963639122   4595.0      9.3                  if(node_after_b is node_a):
    364                                                               
    365  18677439  79225517916   4241.8      3.5                      try:
    366  18677439 101527287264   5435.8      4.4                          distance_b_a_c = neighbour_distance_b_a + self.node_distances[node_a][node_c][0]
    367    181510    985441680   5429.1      0.0                      except KeyError:
    368    181510   1166118921   6424.5      0.1                          distance_b_a_c = float('+inf')
    369                                                                   
    370  18677439  89626381965   4798.6      3.9                      if(distance_b_c != distance_b_a_c): # a's distance to c has changed
    371    692131   3352970709   4844.4      0.1                          node_b_distances[node_c] = (distance_b_a_c, node_a)
    372    692131   3066946866   4431.2      0.1                          node_b_changed = True
    373                                                                   
    374                                                                   ## Affinity distances update
    375    692131   3808548270   5502.6      0.2                          if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
    376     96794   1655818011  17106.6      0.1                              node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
    377                                                                   
    378                                                               # If distance got longer, then ask b's neighbours to update
    379                                                               ## TODO: document this!
    380  18677439  88838493705   4756.5      3.9                      if(distance_b_a_c > distance_b_c):
    381                                                                   #for (node, neighbour_distance) in node_b_chemical.neighbours[node_b].iteritems():
    382   1656796   7949850642   4798.3      0.3                          for node in node_b_chemical.neighbours[node_b]:
    383   1172486   6307264854   5379.4      0.3                              node.chemical.nodes_changed.add(node)
    384                                                       
    385                                                       # Look for routes from a to c that are quicker than ones b knows already
    386  46999631 227198060532   4834.0     10.0              for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems():
    387                                                           
    388  46449109 218024862372   4693.8      9.6                  if((node_after_a is not node_b) and # not a-b-a-b path
    389  28049321 126269403795   4501.7      5.5                     (node_c is not node_b)):         # not a-b path
    390  27768341 121588366824   4378.7      5.3                      try: # Assume node_c in node_b_distances ('try' block will raise KeyError if not)
    391  27768341 159413637753   5740.8      7.0                          if((node_b_distances[node_c][1] is not node_a) and # b doesn't already go to a first
    392   8462467  51890478453   6131.8      2.3                             ((neighbour_distance_b_a + distance_a_c) < node_b_distances[node_c][0])):
    393                                                               
    394                                                                       # Found a route
    395    224593   1168129548   5201.1      0.1                              node_b_distances[node_c] = (neighbour_distance_b_a + distance_a_c, node_a)
    396                                                                       ## Affinity distances update
    397    224593   1274631354   5675.3      0.1                              if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
    398     32108    551523249  17177.1      0.0                                  node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
    399    224593   1165878108   5191.1      0.1                              node_b_changed = True
    400                                                                       
    401    809945   4449080808   5493.1      0.2                      except KeyError:
    402                                                                   # b can't already get to c (node_c not in node_b_distances)
    403    809945   4208032422   5195.5      0.2                          if((neighbour_distance_b_a + distance_a_c) < cutoff_distance): # not too far to go
    404                                                                       
    405                                                                       # These lines of code copied, for efficiency 
    406                                                                       #  (most of the time, the 'try' block succeeds, so don't bother testing for (node_c in node_b_distances))
    407                                                                       # Found a route
    408    587726   3162939543   5381.7      0.1                              node_b_distances[node_c] = (neighbour_distance_b_a + distance_a_c, node_a)
    409                                                                       ## Affinity distances update
    410    587726   3363869061   5723.5      0.1                              if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
    411     71659   1258910784  17568.1      0.1                                  node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
    412    587726   2706161481   4604.5      0.1                              node_b_changed = True
    413                                                                   
    414                                                               
    415                                                       
    416                                                       # If any of node b's rows have exceeded the cutoff distance, then remove them
    417  47267073 239847142446   5074.3     10.5              for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary
    418  46716551 242694352980   5195.0     10.6                  if(distance_b_c > cutoff_distance):
    419    200755    967443975   4819.0      0.0                      del node_b_distances[node_c]
    420    200755    930470616   4634.9      0.0                      node_b_changed = True
    421                                                               
    422                                                               ## Affinity distances update
    423    200755   4717125063  23496.9      0.2                      node_b_chemical.del_affinityDistance(node_b, node_c)
    424                                                       
    425                                                       # If we've modified node_b's distance table, tell its chemical to update accordingly
    426    550522   2684634615   4876.5      0.1              if(node_b_changed):
    427    235034   1383213780   5885.2      0.1                  node_b_chemical.nodes_changed.add(node_b)
    428                                                   
    429                                                   # Remove any neighbours that have infinite distance (have just unbound)
    430                                                   ## TODO: not sure what difference it makes to do this here rather than above (after updating self.node_distances for neighbours)
    431                                                   ##       but doing it above seems to break the walker's movement
    432    791349   4367879451   5519.5      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].items(): # Can't use iteritems() here, as deleting from the dictionary
    433    550522   2968919613   5392.9      0.1              if(neighbour_distance_b_a > cutoff_distance):
    434       148       775638   5240.8      0.0                  del self.neighbours[node_a][node_b]
    435                                                           
    436                                                           ## Affinity distances update
    437       148      2096343  14164.5      0.0                  self.del_affinityDistance(node_a, node_b)
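
    The cutoff pruning loop at lines 417 to 418 accounts for about 21% of the time, and .items() there builds a full copy of the table on every call, even though only a small fraction of rows get deleted. One untested alternative is to collect just the keys to delete and then remove them. This is a sketch in Python 3, and prune_beyond_cutoff is a hypothetical helper, not part of the program above.

```python
def prune_beyond_cutoff(distances, cutoff_distance):
    """Delete entries whose distance exceeds the cutoff; return removed keys."""
    # First pass only reads, so iterating over the live dict is safe.
    too_far = [key for key, (distance, _next_node) in distances.items()
               if distance > cutoff_distance]
    # Second pass mutates the dict, but iterates over the short key list.
    for key in too_far:
        del distances[key]
    return too_far


# Hypothetical table in the same shape as node_b_distances.
node_b_distances = {
    'c': (12.0, 'a'),
    'd': (250.0, 'a'),
    'e': (float('inf'), None),
}
removed = prune_beyond_cutoff(node_b_distances, cutoff_distance=200)
```

    In Python 3 this two-pass shape (or an explicit list(d.items())) is required anyway, because deleting from a dict while iterating over its items() view raises RuntimeError. The loop in the listing would still need to call del_affinityDistance for each removed key and set node_b_changed.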
    
