What is this cProfile result telling me I need to fix?

刺人心 2021-02-01 07:59

I would like to improve the performance of a Python script and have been using cProfile to generate a performance report:

python -m cProfile -o chrX         


        
3 answers
  • 2021-02-01 08:31

    This output is going to be more useful if your code is more modular as Lie Ryan has stated. However, a couple of things you can pick up from the output and just looking at the source code:

    You're doing a lot of comparisons that aren't actually necessary in Python. For example, instead of:

    if len(entryText) > 0:

    You can just write:

    if entryText:

    An empty list evaluates to False in Python. Same is true for an empty string, which you also test for in your code, and changing it would also make the code a bit shorter and more readable, so instead of this:

    for line in metadataLines:
        if line == '':
            break
        else:
            metadataList.append(line)
    

    You can just do:

    for line in metadataLines:
        if not line:
            break
        metadataList.append(line)
    

    There are several other issues with this code in terms of both organization and performance. You assign variables multiple times to the same thing instead of just creating an object instance once and doing all accesses on the object, for example. Doing this would reduce the number of assignments, and also the number of global variables. I don't want to sound overly critical, but this code doesn't appear to be written with performance in mind.

  • 2021-02-01 08:35

    ncalls is relevant only to the extent that comparing the numbers against other counts, such as the number of chars/fields/lines in a file, may highlight anomalies; tottime and cumtime are what really matter. cumtime is the time spent in the function/method including the time spent in the functions/methods that it calls; tottime is the time spent in the function/method excluding the time spent in the functions/methods that it calls.

    I find it helpful to sort the stats on tottime and again on cumtime, not on name.
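
    For what it's worth, here is a minimal sketch of sorting the same stats both ways with the standard pstats module. The work function is a made-up stand-in for the real script, and the sketch is written for Python 3; a file saved with -o can be loaded with pstats.Stats(path) instead of passing a profiler object.

```python
import cProfile
import io
import pstats


def work():
    # Stand-in workload; replace with the code being profiled.
    return sum(len(str(i).split("0")) for i in range(20000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Print the top entries twice: by own time, then by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("tottime").print_stats(5)
stats.sort_stats("cumtime").print_stats(5)
print(report.getvalue())
```

    The tottime listing shows where the time is actually burned; the cumtime listing shows which callers are responsible for it.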

    bgchr definitely refers to the execution of the script and is not irrelevant as it takes up 8.9 seconds out of 13.5; that 8.9 seconds does NOT include time in the functions/methods that it calls! Read carefully what @Lie Ryan says about modularising your script into functions, and implement his advice. Likewise what @jonesy says.

    string is mentioned because you import string and use it in only one place: string.find(elements[0], 'p'). On another line in the output you'll notice that string.find was called only once, so it's not a performance problem in this run of this script. HOWEVER: You use str methods everywhere else. string functions are deprecated nowadays and are implemented by calling the corresponding str method. You would be better writing elements[0].find('p') == 0 for an exact but faster equivalent, and might like to use elements[0].startswith('p') which would save readers wondering whether that == 0 should actually be == -1.
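
    A quick check, on made-up sample strings, that the two spellings agree:

```python
samples = ["p100", "100\tp", "", "apple"]

for text in samples:
    via_find = text.find("p") == 0         # True only if the first 'p' is at index 0
    via_startswith = text.startswith("p")  # same meaning, stated directly
    assert via_find == via_startswith

results = [text.startswith("p") for text in samples]
print(results)
```

    startswith can also stop after looking at the first character, whereas find may scan the whole string before reporting that 'p' is somewhere else or absent.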

    The four methods mentioned by @Bernd Petersohn take up only 3.7 seconds out of a total execution time of 13.541 seconds. Before worrying too much about those, modularise your script into functions, run cProfile again, and sort the stats by tottime.

    Update after question revised with changed script:

    "Question: What can I do about join, split and write operations to reduce the apparent impact they have on the performance of this script?"

    Huh? Those 3 together take 2.6 seconds out of the total of 13.8. Your parseJarchLine function is taking 8.5 seconds (which doesn't include time taken by functions/methods that it calls). assert(8.5 > 2.6)

    Bernd has already pointed you at what you might consider doing with those. You are needlessly splitting the line completely only to join it up again when writing it out. You need to inspect only the first element. Instead of elements = line.split('\t') do elements = line.split('\t', 1) and replace '\t'.join(elements[1:]) by elements[1].
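
    To illustrate on a made-up line (the real field contents will differ), splitting once gives the same first field and the same remainder as splitting fully and re-joining:

```python
line = "1500\tgeneA\t0.25\t+"

# Full split followed by a re-join of everything after the first field.
elements_full = line.split("\t")
rest_joined = "\t".join(elements_full[1:])

# Split once: the remainder stays as a single untouched string.
elements_once = line.split("\t", 1)
rest_direct = elements_once[1]

assert elements_full[0] == elements_once[0]
assert rest_joined == rest_direct

# A line with no tab yields a one-element list, so guard before indexing [1].
no_tab = "p25".split("\t", 1)
print(elements_once, no_tab)
```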

    Now let's dive into the body of parseJarchLine. The number of uses in the source and manner of the uses of the long built-in function are astonishing. Also astonishing is the fact that long is not mentioned in the cProfile output.

    Why do you need long at all? Files over 2 GB? OK, then you need to consider that since Python 2.2, int overflow causes promotion to long instead of raising an exception. You can take advantage of faster execution of int arithmetic. You also need to consider that doing long(x) when x is already demonstrably a long is a waste of resources.

    Here is the parseJarchLine function with removing-waste changes marked [1] and changing-to-int changes marked [2]. Good idea: make changes in small steps, re-test, re-profile.

    def parseJarchLine(chromosome, line):
        global pLength
        global lastEnd
        elements = line.split('\t')
        if len(elements) > 1:
            if lastEnd != "":
                start = long(lastEnd) + long(elements[0])
                # [1] start = lastEnd + long(elements[0])
                # [2] start = lastEnd + int(elements[0])
                lastEnd = long(start + pLength)
                # [1] lastEnd = start + pLength
                sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, start, lastEnd, '\t'.join(elements[1:])))
            else:
                lastEnd = long(elements[0]) + long(pLength)
                # [1] lastEnd = long(elements[0]) + pLength
                # [2] lastEnd = int(elements[0]) + pLength
                sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, long(elements[0]), lastEnd, '\t'.join(elements[1:])))
        else:
            if elements[0].startswith('p'):
                pLength = long(elements[0][1:])
                # [2] pLength = int(elements[0][1:])
            else:
                start = long(long(lastEnd) + long(elements[0]))
                # [1] start = lastEnd + long(elements[0])
                # [2] start = lastEnd + int(elements[0])
                lastEnd = long(start + pLength)
                # [1] lastEnd = start + pLength
                sys.stdout.write("%s\t%ld\t%ld\n" % (chromosome, start, lastEnd))               
        return
    

    Update after question about sys.stdout.write

    If the statement that you commented out was anything like the original one:

    sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, start, lastEnd, '\t'.join(elements[1:])))
    

    Then your question is ... interesting. Try this:

    payload = "%s\t%ld\t%ld\t%s\n" % (chromosome, start, lastEnd, '\t'.join(elements[1:]))
    sys.stdout.write(payload)
    

    Now comment out the sys.stdout.write statement ...

    By the way, someone mentioned in a comment about breaking this into more than one write ... have you considered this? How many bytes on average in elements[1:] ? In chromosome?

    === change of topic: It worries me that you initialise lastEnd to "" rather than to zero, and that nobody has commented on it. Anyway, you should fix this, which allows a rather drastic simplification plus adding in others' suggestions:

    def parseJarchLine(chromosome, line):
        global pLength
        global lastEnd  # must be initialised to 0, not ""
        elements = line.split('\t', 1)
        if elements[0][0] == 'p':
            pLength = int(elements[0][1:])
            return
        start = lastEnd + int(elements[0])
        lastEnd = start + pLength
        sys.stdout.write("%s\t%ld\t%ld" % (chromosome, start, lastEnd))
        if elements[1:]:
            # split('\t', 1) consumed the separating tab, so put it back
            sys.stdout.write('\t' + elements[1])
        sys.stdout.write('\n')
    

    Now I'm similarly worried about the two global variables lastEnd and pLength -- the parseJarchLine function is now so small that it can be folded back into the body of its sole caller, extractData, which saves two global variables, and a gazillion function calls. You could also save a gazillion lookups of sys.stdout.write by putting write = sys.stdout.write once up the front of extractData and using that instead.
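
    Roughly what I have in mind, as a sketch only: I'm guessing at the signature of extractData (the parameters chromosome, lines and out are my assumptions), I skip blank lines, which the real script may handle differently, and this is written for Python 3.

```python
import io
import sys


def extract_data(chromosome, lines, out=sys.stdout):
    # Hypothetical signature; the real extractData in the script may differ.
    write = out.write  # look the bound method up once, not once per line
    p_length = 0       # formerly the global pLength
    last_end = 0       # formerly the global lastEnd, an int from the start
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue   # assumption: blank lines carry no data
        elements = line.split("\t", 1)
        if elements[0].startswith("p"):
            p_length = int(elements[0][1:])
            continue
        start = last_end + int(elements[0])
        last_end = start + p_length
        write("%s\t%d\t%d" % (chromosome, start, last_end))
        if len(elements) > 1:
            write("\t" + elements[1])
        write("\n")


buffer = io.StringIO()
extract_data("chrX", ["p10", "100\tfoo\tbar", "5"], buffer)
print(buffer.getvalue(), end="")
```

    Local variables are also cheaper to read and write than globals, so the loop gains on that front as well as on the saved calls.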

    BTW, the script tests for Python 2.5 or better; have you tried profiling on 2.5 and 2.6?

  • 2021-02-01 08:49

    The entries relevant for possible optimization are those with high values for ncalls and tottime. bgchr:4(<module>) and <string>:1(<module>) probably refer to the execution of your module body and are not relevant here.

    Obviously, your performance problem comes from string processing. This should perhaps be reduced. The hot spots are split, join and sys.stdout.write. bz2.decompress also seems to be costly.

    I suggest you try the following:

    • Your main data seems to consist of tab-separated values. Try whether the csv reader performs better.
    • sys.stdout is line buffered when it is attached to a terminal, so it is flushed each time a newline is written. Consider writing to a file with a larger buffer size.
    • Instead of joining elements before writing them out, write them sequentially to the output file. You may also consider using CSV writer.
    • Instead of decompressing the data at once into a single string, use a BZ2File object and pass that to the CSV reader.
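
    A minimal sketch of the csv reader/writer suggestion, using in-memory stand-ins for the real input and output and made-up rows. Whether it actually beats split and join on your data is something to measure, not assume. Written for Python 3.

```python
import csv
import io

# Made-up tab-separated input standing in for the real data.
source = io.StringIO("100\tfoo\tbar\n5\tbaz\n")
target = io.StringIO()

reader = csv.reader(source, delimiter="\t")
writer = csv.writer(target, delimiter="\t", lineterminator="\n")

for row in reader:
    # row is a list of fields; write them back without an explicit join.
    writer.writerow(["chrX"] + row)

print(target.getvalue(), end="")
```

    Note that csv applies its own quoting rules, so fields containing double quotes may not round-trip byte for byte with the default settings.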

    It seems that the loop body that actually uncompresses data is only invoked once. Perhaps you can find a way to avoid the call dataHandle.read(size), which produces a huge string that is then decompressed, and work with the file object directly.

    Addendum: BZ2File is probably not applicable in your case, because it requires a filename argument. What you need is something like a file object view with integrated read limit, comparable to ZipExtFile but using BZ2Decompressor for decompression.

    My main point here is that your code should be changed to perform a more iterative processing of your data instead of slurping it in as a whole and splitting it again afterwards.
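
    One way this could look, as a sketch under assumptions: the helper name iter_bz2_lines, its parameters and the chunk size are all mine, the data is assumed to be ASCII, and the code targets Python 3. It reads at most size compressed bytes from an already open file object, feeds them to a BZ2Decompressor in small chunks, and yields lines as they become complete.

```python
import bz2
import io


def iter_bz2_lines(handle, size, chunk_size=64 * 1024):
    """Yield decoded lines from `size` compressed bytes read from `handle`."""
    decompressor = bz2.BZ2Decompressor()
    remaining = size
    pending = b""
    while remaining > 0 and not decompressor.eof:
        chunk = handle.read(min(chunk_size, remaining))
        if not chunk:
            break
        remaining -= len(chunk)
        pending += decompressor.decompress(chunk)
        # Everything before the last newline is complete; keep the tail for later.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line.decode("ascii")
    if pending:
        yield pending.decode("ascii")


payload = bz2.compress(b"p10\n100\tfoo\n5\n")
archive = io.BytesIO(payload + b"trailing bytes of the next section")
lines = list(iter_bz2_lines(archive, len(payload), chunk_size=16))
print(lines)
```

    Only one chunk plus one partial line is held in memory at a time, and the bytes after the read limit are left untouched in the file object for whatever reads the next section.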
