I was recently asked to create a program to find the best matches in a text fragment. I have successfully written this program, but I do have a question about its time complexity.
The time that my program takes: O(m + n + p)

First off, I totally don't believe that is the time your program takes.
You are asked to parse a query and find its words in a document. This is a complex cross-referencing problem, because the characters of each query word have to match, in exact character sequence, the same sequence placed somewhere in the document. Most students make a hash of this and create an N-squared process: they take the first query word, scan the document for its occurrences, and then do the same thing with the next word, and the next, and the next. You need to develop an effective means of cross-referencing the contents of the document against the query words, or you will create an N^2 process. Offhand: create a dictionary of the words in the query, parse the document into words, and match them against the dictionary of words to find, as sketched below. That would be O(m log n), where:
m = number of words in the document
n = number of words in the dictionary, which you create in an O(n log n) process
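A minimal sketch of that dictionary approach, in Python (the names here are illustrative, not taken from the original program; a hash set makes each lookup O(1) on average, whereas the O(log n) figure above assumes a sorted dictionary):

```python
def find_query_words(document: str, query: str) -> dict[str, int]:
    """Count how many times each query word appears in the document.

    The dictionary of query words is built once; the document is then
    scanned in a single pass, so no query word triggers a rescan and
    the process never degenerates into N^2.
    """
    words_to_find = set(query.split())       # dictionary of n query words
    counts = {word: 0 for word in words_to_find}
    for word in document.split():            # single pass over m document words
        if word in words_to_find:
            counts[word] += 1
    return counts

# e.g. find_query_words("the cat sat on the mat", "cat mat the")
#      -> {'cat': 1, 'mat': 1, 'the': 2}
```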
You may be interested in an article I wrote, because it solves a similar but much more complex word-matching problem:
http://www.codeproject.com/Tips/882998/Performance-Solving-WonderWord-Puzzle
Your first respondent was correct, though he made an assumption that I didn't: that you had to find the characters without using breaks. But I believe his O notation is wrong, because the terms are multiplied, not added together, and p is irrelevant.
No, you can't. According to Big-O notation, your function m is an upper bound on the actual time your algorithm takes to run if there is a constant M such that the real time will always be less than or equal to M*m. Take a case where the document has size zero (an empty document) but someone queries it with a positive number of characters. The upper bound in this case would be 0 (plus a constant), but the actual time the program takes to run might be greater than that. So your program cannot be said to be O(m).
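For reference, this is the standard definition being invoked, written out in full (some textbooks also allow a threshold below which the inequality need not hold):

$$ T(m) = O(f(m)) \iff \exists\, M > 0 : T(m) \le M \cdot f(m) \text{ for every input of size } m $$

The empty-document case violates this: the real running time still grows with the query length, while M*m is zero when m = 0 (at best a constant, as noted above).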
In other words, "most cases" isn't enough: you must prove that your algorithm will perform within that upper bound in all cases.
Update: The same can be said for p: common sense says p is always smaller than m, but that's only true if the search terms don't overlap. Take for instance the document aaaaaa (m = 6) and the search terms a, aa and aaa (n = 3). In this case there are 6 occurrences of a, 5 of aa and 4 of aaa, so p = 15. Even though it's a very unlikely scenario (same for the empty document), it's still required that you take p into account in your complexity analysis. So your program must really be described as O(m + n + p), as you originally stated.
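If you want to verify that count, here is a throwaway sketch in Python (my own illustration, not the program under review) that counts overlapping matches, which is exactly what lets p exceed m:

```python
def count_overlapping(document: str, term: str) -> int:
    """Count occurrences of term in document, allowing overlaps."""
    count, start = 0, 0
    while (start := document.find(term, start)) != -1:
        count += 1
        start += 1   # advance one character so overlapping matches count
    return count

document = "aaaaaa"                        # m = 6
terms = ["a", "aa", "aaa"]                 # n = 3
p = sum(count_overlapping(document, t) for t in terms)
print(p)                                   # 6 + 5 + 4 = 15
```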