Time complexity of a program which involves multiple variables

Submitted by 假如想象 on 2019-12-02 04:20:46

Question


I was recently asked to create a program to find the best matches in a text fragment. I have successfully written this program, but I do have a question about its time complexity.

The problem is defined as follows:

Given a query, find occurrences of the query words in a document and highlight the best tokens.

The time that my program takes is

O(m + n + p)

where

m = length of the document in characters

n = length of the query in characters

p = number of total matches in the document

In this case the biggest term is always going to be "m", because in most cases documents are going to be larger than the query itself.

Can I safely deduce that the time complexity of my program is O(m)?


Answer 1:


No, you can't. In Big-O notation, saying your program is O(m) means that m is an upper bound on its actual running time up to a constant factor: there must be a constant M such that the real time is always less than or equal to M*m. Take a case where the document has size zero (an empty document) but someone queries it with a positive number of characters. The upper bound in this case would be 0 (plus a constant), but the actual time the program takes to run might be greater than that, since it still has to process the query. So your program cannot be said to be O(m).

In other words, "most cases" isn't enough: you must prove that your algorithm will perform within that upper bound in all cases.
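As a rough sketch of why the simplification needs a guarantee rather than a tendency, the argument can be written out as follows. The function T and the constants C, c_1 and c_2 are notation introduced here for illustration; they do not appear in the question.

```latex
% Assumed model: T(m, n, p) is the worst-case running time of the program.
% The bound claimed in the question:
T(m, n, p) \le C\,(m + n + p) \quad \text{for some constant } C > 0 .

% To drop n and p, one would additionally need a guarantee of the form
n \le c_1\, m \quad \text{and} \quad p \le c_2\, m \quad \text{for all inputs},

% in which case
T(m, n, p) \le C\,(m + c_1 m + c_2 m) = C\,(1 + c_1 + c_2)\, m = O(m).

% Without such a guarantee (for example m = 0 and n > 0),
% no constant multiple of m bounds the running time.
```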

Update: The same can be said for p. Common sense says p is always smaller than m, but that's only true if the matches don't overlap. Take for instance the document aaaaaa (m = 6) and the search terms a, aa and aaa (three terms, n = 6 characters excluding separators). In this case there are 6 occurrences of a, 5 of aa and 4 of aaa, so p = 15. Even though it's a very unlikely scenario (same for the empty document), you are still required to take p into account in your complexity analysis. So your program must really be described as O(m + n + p), as you originally stated.
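The overlap example can be checked with a few lines of Python. This is only a sketch for counting matches; the helper name count_overlapping is made up here and is not taken from the asker's program.

```python
def count_overlapping(text: str, term: str) -> int:
    """Count occurrences of term in text, allowing overlapping matches."""
    count = 0
    start = text.find(term)
    while start != -1:
        count += 1
        # Advance by one character (not by len(term)) so that
        # overlapping occurrences are also counted.
        start = text.find(term, start + 1)
    return count


document = "aaaaaa"          # m = 6 characters
terms = ["a", "aa", "aaa"]   # the three search terms from the example

matches = {term: count_overlapping(document, term) for term in terms}
p = sum(matches.values())

print(matches)  # {'a': 6, 'aa': 5, 'aaa': 4}
print(p)        # 15, which is larger than m = 6
```

Because p can exceed m on inputs like this, the p term cannot be absorbed into m in the general case.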




Answer 2:


"The time that my program takes: O(m + n + p)"

First off, I totally don't believe that is the time your program takes.

You are asked to parse a query and find its words in a document. This is a cross-referencing problem: the characters of each query word have to match, in exact sequence, the same sequence wherever it happens to occur in the document. Most students make a mess of this and create an N-squared process: they take the first word, scan the whole document for its occurrences, then do the same thing with the next word, and the next. You need an effective means of cross-referencing the contents of the document against the query words, or you will end up with an N^2 process. Offhand: create a dictionary of the words in the query, parse the document into words, and look each one up in the dictionary. That would be O(m log n), where:

m = number of words in the document
n = number of words in the dictionary, which you build in an O(n log n) process.
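A minimal sketch of that dictionary approach is shown below. It assumes the "dictionary" is a sorted list searched by binary search, which is one way to get the O(n log n) build and O(log n) lookup described above; the function name find_query_words, the \w+ tokenization and the case-insensitive comparison are all assumptions made for illustration, not details from the answer.

```python
import bisect
import re


def find_query_words(document: str, query: str) -> list[tuple[int, str]]:
    """Return (offset, word) for every document word that appears in the query."""
    # Build the dictionary: sorting the n distinct query words is O(n log n).
    dictionary = sorted(set(query.lower().split()))

    hits = []
    # Parse the document into words (assumed tokenization: runs of \w characters).
    for match in re.finditer(r"\w+", document):
        word = match.group().lower()
        # Binary search: O(log n) word comparisons per document word.
        i = bisect.bisect_left(dictionary, word)
        if i < len(dictionary) and dictionary[i] == word:
            hits.append((match.start(), match.group()))
    return hits


print(find_query_words("The quick brown fox jumps over the lazy dog", "the fox"))
# [(0, 'The'), (16, 'fox'), (31, 'the')]
```

The O(m log n) figure counts word comparisons, and each comparison itself takes time proportional to the word length. Replacing the sorted list with a hash-based set would make each lookup expected constant time instead.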

You were mentioned in an article I wrote, because it solves a similar but much more complex word-matching problem:

http://www.codeproject.com/Tips/882998/Performance-Solving-WonderWord-Puzzle

Your first respondent was correct, given an assumption that I didn't make: that you had to find the character sequences without relying on word breaks. However, I believe his O notation is wrong, because the terms are multiplied, not added together, and p is irrelevant.



Source: https://stackoverflow.com/questions/10923764/time-complexity-of-a-program-which-involves-multiple-variables
