Question:
For example:

    'hello'.count('e')

Is this O(n)? I'm guessing the way it works is that it scans 'hello' and increments a counter each time the letter 'e' is seen. How can I know this without guessing? I tried reading the source code here, but got stuck upon finding this:
    def count(s, *args):
        """count(s, sub[, start[,end]]) -> int

        Return the number of occurrences of substring sub in string
        s[start:end].  Optional arguments start and end are
        interpreted as in slice notation.
        """
        return s.count(*args)
Where can I read about what's executed in s.count(*args)?

Edit: I understand what *args does in the context of Python functions.
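One way to confirm, without guessing, that the real work happens outside Python: the inspect module cannot retrieve Python source for a method implemented in C. A small sketch (the behavior shown is standard CPython, but the printed messages are just illustrative):

```python
import inspect

# str.count is a C-level method descriptor, not a Python function:
print(type(str.count))  # <class 'method_descriptor'>

# inspect.getsource() only works on objects with Python source,
# so it raises TypeError for builtins implemented in C:
try:
    inspect.getsource(str.count)
except TypeError:
    print("no Python source available; implemented in C")
```

That TypeError is the hint that the answer has to be found in CPython's C source rather than in any .py file.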
Answer 1:
str.count is implemented in native code, in the stringobject.c file. It delegates to either stringlib_count, or to PyUnicode_Count, which itself delegates to stringlib_count again. stringlib_count ultimately uses fastsearch to search for occurrences of the substring in the string and count them.

For one-character search strings (e.g. your 'e'), it is short-circuited to the following code path:
    for (i = 0; i < n; i++)
        if (s[i] == p[0]) {
            count++;
            if (count == maxcount)
                return maxcount;
        }
    return count;
So yes, this is, exactly as you assumed, a simple iteration over the string, counting the occurrences of the substring.
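That C fast path translates almost line for line into Python. A sketch (the function name and maxcount default are my own, not CPython API):

```python
def count_char(s, ch, maxcount=None):
    """Count occurrences of the single character ch in s,
    mirroring CPython's one-character fast path (sketch)."""
    if maxcount is None:
        maxcount = len(s) + 1  # effectively unlimited
    count = 0
    for c in s:                # one pass over the string: O(n)
        if c == ch:
            count += 1
            if count == maxcount:
                return maxcount
    return count

print(count_char('hello', 'e'))  # → 1
```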
For search strings longer than a single character it gets a bit more complicated, due to handling overlaps etc., and the logic is buried deeper in the fastsearch implementation. But it's essentially the same: a linear search through the string.

So yes, str.count runs in linear time, O(n). And if you think about it, that makes a lot of sense: in order to know how often a substring appears in a string, you need to look at every possible substring of the same length. So for a substring of length 1, you have to look at every character in the string, giving you linear complexity.
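You can also check the linear behavior empirically with a rough timing experiment (results are machine-dependent; the sizes and repeat counts here are arbitrary choices for illustration):

```python
import timeit

small = 'hello' * 20_000    # 100,000 characters
big = 'hello' * 160_000     # 800,000 characters (8x longer)

t_small = timeit.timeit(lambda: small.count('e'), number=1000)
t_big = timeit.timeit(lambda: big.count('e'), number=1000)

# If count is O(n), t_big should be roughly 8x t_small.
print(f"100k chars: {t_small:.4f}s   800k chars: {t_big:.4f}s")
```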
By the way, for more information about the underlying fastsearch algorithm, see this article on effbot.org.
For Python 3, which only has a single Unicode string type, the links to the implementations are: unicode_count which uses stringlib_count which uses fastsearch.
Answer 2:
Much of python's library code is written in C. The code you are looking for is here:
http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup
    static PyMethodDef string_methods[] = {
        // ...
        {"count", (PyCFunction)string_count, METH_VARARGS, count__doc__},
        // ...
        {NULL, NULL}  /* sentinel */
    };

    static PyObject *
    string_count(PyStringObject *self, PyObject *args)
    {
        ...
    }
Answer 3:
If you pursue @AJNeufeld's answer a little further, you will eventually come upon this link, which explains how the (then-)new find logic works. It combines several string-searching approaches, with the intent of benefiting from some of their logic while avoiding the up-front table-setup costs: http://effbot.org/zone/stringlib.htm
Boyer-Moore is a famous string searching algorithm. BM-Horspool and BM-Sunday are variants that improve on the original in certain ways. Google will find you more than you ever wanted to know about these.
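To make the Horspool idea concrete, here is an illustrative pure-Python sketch (my own code, not CPython's): a shift table lets the search skip ahead by up to the pattern length on a mismatch, and it counts non-overlapping matches the way str.count does.

```python
def horspool_count(text, pat):
    """Count non-overlapping occurrences of pat in text using the
    Boyer-Moore-Horspool shift rule (illustrative sketch)."""
    m, n = len(pat), len(text)
    if m == 0 or m > n:
        return 0
    # For each character in pat (except the last), record its distance
    # from the end of the pattern; unseen characters shift by m.
    shift = {}
    for i in range(m - 1):
        shift[pat[i]] = m - 1 - i
    count = 0
    i = 0
    while i <= n - m:
        if text[i:i + m] == pat:
            count += 1
            i += m  # skip past the match: non-overlapping, like str.count
        else:
            # Shift based on the last character of the current window.
            i += shift.get(text[i + m - 1], m)
    return count

print(horspool_count('hello', 'l'))  # → 2
```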
Source: https://stackoverflow.com/questions/35855748/whats-the-computational-cost-of-count-operation-on-strings-python