I was wondering about my code for an MIT online edX class practice problem compared with the provided answer code.
The assignment in question from the class was
Their answer code is more efficient because it does not repeatedly iterate through subsequences. Given the subsequence ABCDE, your code separately processes BCDE, CDE, and DE in successive iterations even though they cannot be longest.
Therefore, the worst-case runtime of your answer is O(N^2) vs. O(N) for theirs. Yes, this is related to having a nested for loop which is not present in their answer.
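To make the O(N^2) vs. O(N) difference concrete, here is a rough sketch that counts character comparisons for both strategies. Neither function is the asker's code or the MIT answer; `longest_run_nested` and `longest_run_single_pass` are hypothetical reconstructions of the two approaches, assuming "alphabetical order" means each character is >= the one before it.

```python
def longest_run_nested(s):
    """Restart the scan at every index (hypothetical reconstruction)."""
    best, comparisons = "", 0
    for start in range(len(s)):
        end = start + 1
        while end < len(s):
            comparisons += 1
            if s[end] < s[end - 1]:
                break  # order dropped, this run is over
            end += 1
        if end - start > len(best):
            best = s[start:end]
    return best, comparisons


def longest_run_single_pass(s):
    """Compare each adjacent pair exactly once (hypothetical reconstruction)."""
    best_start, best_end, run_start, comparisons = 0, 0, 0, 0
    for i in range(1, len(s) + 1):
        if i < len(s):
            comparisons += 1
        # A run ends at the end of the string or where the order drops.
        if i == len(s) or s[i] < s[i - 1]:
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = i
    return s[best_start:best_end], comparisons


worst_case = "a" * 100  # one long run is the worst case for the nested scan
print(longest_run_nested(worst_case)[1])       # 4950
print(longest_run_single_pass(worst_case)[1])  # 99
```

On a 100-character string that is already in order, the nested version makes 99 + 98 + ... + 1 = 4950 comparisons, while the single pass makes 99.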
As a rough rule, adding more loops slows things down. But that's just a rough rule: things that don't look like loops can, in the actual implementation, be loops and thus slow things down. For example, innocent-looking code like curString += s[i] can actually be quite slow. That's because, assuming curString is a Python string, you can't just add one more letter to it; what Python ends up doing is creating a new string that's 1 character longer than the old one, copying all the old characters into the new string, appending the one new character, and then assigning this new string to curString. Neither implementation is terribly efficient, as they both do things like this (using range instead of xrange in Python 2, copying slices of strings, etc.). However, assuming the strings are relatively short, this is unlikely to matter.
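As an illustration of the concatenation point, here is a small sketch of two ways to build the same string. The function names are made up for this example. Note that CPython can sometimes perform += on a string in place, so the actual cost of the first version varies by interpreter and situation; collecting pieces in a list and joining once is a common way to avoid relying on that.

```python
def build_by_concatenation(chars):
    """Grow a string one character at a time."""
    result = ""
    for ch in chars:
        # Strings are immutable, so this may allocate a new string
        # and copy everything accumulated so far.
        result += ch
    return result


def build_by_join(chars):
    """Collect the pieces in a list and copy them once at the end."""
    parts = []
    for ch in chars:
        parts.append(ch)  # amortized constant-time append
    return "".join(parts)


print(build_by_concatenation("wxyabcd") == build_by_join("wxyabcd"))  # True
```

For this particular problem you can also avoid building strings at all by tracking start and stop indices and slicing once at the end.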
In any event, both implementations, yours and theirs, could be fixed so that each operation they perform is efficient. In that case, it does come back to the loops, and their implementation is indeed faster than yours. To see why, consider a string like "wxyabcd". When considering the first three characters (the "w", "x", and "y"), both algorithms do pretty much the same thing. But consider what happens next. In your code you'll encounter the "a", note that this isn't in alphabetical order, and end your inner loop. Your outer loop will then have b = 1, and you'll consider all the strings that start with "x". However, these can never give you a longer string than the one that started with "w", so this is wasted effort. Still, you'll end up checking "x", "xy", and "y" before moving on to the strings that start with "a", while the MIT code will jump right to the strings that start with "a". To be more concrete, here's the set of strings your code will consider:
w
wx
wxy
x
xy
y
a
ab
abc
abcd
b
bc
bcd
c
cd
d
And here's what the MIT code will consider:
w
wx
wxy
a
ab
abc
abcd
As you can see, their code does a lot less work. One way to look at it is that they "look at" any given character in the string only once while you will look at some characters multiple times.
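If you want to reproduce the two lists above yourself, here is a minimal sketch. These are not the actual implementations from the question or the course; `candidates_nested` and `candidates_single_pass` are hypothetical stand-ins that record every candidate string each strategy would look at.

```python
def candidates_nested(s):
    """Restart at every index, as a nested-loop version would."""
    seen = []
    for start in range(len(s)):
        end = start + 1
        seen.append(s[start:end])
        while end < len(s) and s[end] >= s[end - 1]:
            end += 1
            seen.append(s[start:end])
    return seen


def candidates_single_pass(s):
    """Jump past each finished run instead of restarting inside it."""
    seen = []
    start = 0
    while start < len(s):
        end = start + 1
        seen.append(s[start:end])
        while end < len(s) and s[end] >= s[end - 1]:
            end += 1
            seen.append(s[start:end])
        start = end  # skip the whole run that was just scanned
    return seen


print(len(candidates_nested("wxyabcd")))       # 16
print(len(candidates_single_pass("wxyabcd")))  # 7
```

The only difference between the two functions is where the next scan starts: at start + 1 in the first, and at the end of the run just scanned in the second.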
or, in one pass and without the overhead of string concatenation:
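# s is the input string from the assignment
length, start, stop, i = len(s), 0, 0, 0
while i < length:
    # extend j to the end of the non-decreasing run that starts at i
    j = i + 1
    while j < length and s[j] >= s[j-1]:
        j += 1
    # keep the first run that is strictly longer than the best so far
    if j - i > stop - start:
        start, stop = i, j
    i = j  # resume after the run, so no character is revisited
print(s[start:stop])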