I\'m trying to use the Python profiler to speed up my code. I\'ve been able to identify the specific function where nearly all of the time is spent, but I can\'t figure out
I'll support Fragsworth by saying that you'll want to split up your function into smaller ones.
Having said that, you are reading the output correctly: the tottime is the one to watch.
Now for where your slowdown is likely to be:
Since there seem to be 100000 calls to appendBallot, and there aren't any obvious loops, I'd suggest it is in your assert. Because you are executing:
assert(ballotID not in self.ballotIDs)
This will actually act as a loop. Thus, the first time you call this function, it will iterate through a (probably empty) array, and then assert if the value was found. The 100000th time it will iterate through the entire array.
And there is actually a possible bug here: if a ballot is deleted, then the next ballot added would have the same id as the last added one (unless that were the one deleted). I think you would be better off using a simple counter. That way you can just increment it each time you add a ballot. Alternatively, you could use a UUID to get unique ids.
Alternatively, if you are looking at some level of persistence, use an ORM, and get it to do the ID generation, and unique checking for you.
I've had a look at your code, and it looks like you make a lot of function calls and attribute lookups as part of your 'checking' or looking ahead before leaping. You also have a lot of code dedicated to track the same condition, i.e many bits of code looking at creating 'unique' IDs.
instead of trying to assign some kind of unique string to each ballot, couldn't you just use the ballotID (an integer number?)
now you could have a dictionary (uniqueBallotIDs) mapping ballotID and the actual ballot object.
the process might be something like this:
def appendBallot(self, ballot, ballotID=None):
if ballotID is None:
ballotID = self._getuniqueid() # maybe just has a counter? up to you.
# check to see if we have seen this ballot before.
if not self._isunique(ballotID):
# code for non-unique ballot ids.
else:
# code for unique ballot ids.
self.ballotOrder.append(i)
You might be able to handle some of your worries about the dictionary missing a given key by using a defaultdict (from the collections module). collection docs
Edit for completeness I will include a sample usage of the defaultdict:
>>> from collections import defaultdict
>>> ballotIDmap = defaultdict(list)
>>> ballotID, ballot = 1, object() # some nominal ballotID and object.
>>> # I will now try to save my ballotID.
>>> ballotIDmap[ballotID].append(ballot)
>>> ballotIDmap.items()
[(1, [<object object at 0x009BB950>])]
Profilers can be like that. The method I use is this. It gets right to the heart of the problem in no time.
Yeah I came across that same problem as well.
The only way I know to work around this is to wrap your large function into several smaller function calls. This will allow the profiler to take into account each of the smaller function calls.
Interesting enough, the process of doing this (for me, anyway) made it obvious where the inefficiencies were, so I didn't even have to run the profiler.
I have used this decorator in my code, and it helped me with my pyparsing tuning work.
You have two problems in this little slice of code:
# Assign a ballot ID if one has not been given
if ballotID is None:
ballotID = len(self.ballotIDs)
assert(ballotID not in self.ballotIDs)
self.ballotIDs.append(ballotID)
Firstly it appears that self.ballotIDs is a list, so the assert statement will cause quadratic behaviour. As you didn't give any documentation at all for your data structures, it's not possible to be prescriptive, but if the order of appearance doesn't matter, you could use a set instead of a list.
Secondly, the logic (in the absence of documentation on what a ballotID is all about, and what a not-None ballotID arg means) seems seriously bugged:
obj.appendBallot(ballota, 2) # self.ballotIDs -> [2]
obj.appendBallot(ballotb) # self.ballotIDs -> [2, 1]
obj.appendBallot(ballotc) # wants to add 2 but triggers assertion
Other comments:
Instead of adict.has_key(key)
, use key in adict
-- it's faster and looks better.
You may like to consider reviewing your data structures ... they appear to be slightly baroque; there may be a fair bit of CPU time involved in building them.