Smartest way to join two lists into a formatted string

忘了有多久 2020-12-03 02:27

Let's say I have two lists of the same length:

a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']

and I want to produce the following str

4 answers
  • 2020-12-03 02:31
    a = ['a1', 'a2', 'a3']
    b = ['b1', 'b2', 'b3']
    
    pat = '%s=%%s, %s=%%s, %s=%%s'
    
    print pat % tuple(a) % tuple(b)
    

    gives a1=b1, a2=b2, a3=b3.
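
    The two-pass trick above can be unpacked step by step. What follows is a minimal sketch in Python 3 syntax (the answer's own code is Python 2). The helper name two_pass_format is hypothetical, and the sketch assumes the keys contain no literal % characters, since those would be reinterpreted as format specifiers on the second pass.

```python
def two_pass_format(keys, values):
    """Return (intermediate, final) strings from two rounds of % formatting."""
    # '%%s' survives the first pass as a literal '%s' placeholder.
    pat = ', '.join(['%s=%%s'] * len(keys))
    intermediate = pat % tuple(keys)        # fills in the keys only
    final = intermediate % tuple(values)    # fills in the values
    return intermediate, final


a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
intermediate, final = two_pass_format(a, b)
print(intermediate)  # a1=%s, a2=%s, a3=%s
print(final)         # a1=b1, a2=b2, a3=b3
```

    Building the pattern with join rather than a hard-coded string lets the same idea work for lists of any length.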

    Then, timing it against the alternatives (Python 2):

    from timeit import Timer
    from itertools import izip
    
    n = 300
    
    a = [str(f) for f in range(n)]
    b = [str(f) for f in range(n)]
    
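    def func1():
        # Nested loops plus list.index lookups: cubic in n.
        return ', '.join([aa+'='+bb for aa in a for bb in b if a.index(aa) == b.index(bb)])
    
    def func2():
        # Index-based loop, appending each formatted pair.
        parts = []
        for i in range(len(a)):
            parts.append('%s=%s' % (a[i], b[i]))
        return ', '.join(parts)
    
    def func3():
        # zip builds the full list of pairs first (Python 2).
        return ', '.join('%s=%s' % t for t in zip(a, b))
    
    def func4():
        # izip yields the pairs lazily.
        return ', '.join('%s=%s' % t for t in izip(a, b))
    
    def func5():
        # Two-pass % formatting; note that the result keeps a trailing ', '.
        pat = n * '%s=%%s, '
        return pat % tuple(a) % tuple(b)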
    
    d = dict(zip((1,2,3,4,5),('heavy','append','zip','izip','% formatting')))
    for i in xrange(1,6):
        t = Timer(setup='from __main__ import func%d'%i, stmt='func%d()'%i)
        print 'func%d = %s  %s' % (i,t.timeit(10),d[i])
    

    Result:

    func1 = 16.2272833558  heavy
    func2 = 0.00410247671143  append
    func3 = 0.00349569568199  zip
    func4 = 0.00301686387516  izip
    func5 = 0.00157338432678  % formatting
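
    The benchmark above relies on Python 2 features (izip, xrange, the print statement). A rough Python 3 port might look like the sketch below. The function names and the smaller n are assumptions made for illustration, the izip variant is dropped because Python 3's zip is already lazy, and the timings it prints will differ from the figures above.

```python
from timeit import timeit

n = 50  # assumed value, kept small so the cubic variant finishes quickly
a = [str(i) for i in range(n)]
b = [str(i) for i in range(n)]

def nested_index():
    # Nested loops plus list.index lookups (the "heavy" variant).
    return ', '.join([aa + '=' + bb for aa in a for bb in b
                      if a.index(aa) == b.index(bb)])

def append_loop():
    parts = []
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)

def zip_percent():
    return ', '.join('%s=%s' % t for t in zip(a, b))

def two_pass_percent():
    # Pattern built with join, so there is no trailing separator.
    pat = ', '.join(['%s=%%s'] * n)
    return pat % tuple(a) % tuple(b)

candidates = [nested_index, append_loop, zip_percent, two_pass_percent]
for func in candidates:
    seconds = timeit(func, number=10)  # timeit accepts a callable directly
    print('%-18s %.6f s' % (func.__name__, seconds))
```

    Passing the callables straight to timeit avoids the setup-string import used in the original loop.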
    
  • 2020-12-03 02:35

    This implementation is, on my system, faster than either of your two functions and still more compact.

    c = ', '.join('%s=%s' % t for t in zip(a, b))
    

    Thanks to @JBernardo for the suggested improvement.

    In more recent syntax, str.format is more appropriate:

    c = ', '.join('{}={}'.format(*t) for t in zip(a, b))
    

    This produces largely the same output, and it accepts any object with a __str__ method, so two lists of integers would also work here.
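
    To make the point about non-string items concrete, here is a small Python 3 sketch using two lists of integers. The f-string variant is an addition that does not appear in the answer, and the variable names are illustrative only. For comparison, %s formatting handles integers as well, whereas plain + concatenation does not.

```python
a = [1, 2, 3]
b = [10, 20, 30]

with_format = ', '.join('{}={}'.format(*t) for t in zip(a, b))
with_fstring = ', '.join(f'{x}={y}' for x, y in zip(a, b))  # Python 3.6+
with_percent = ', '.join('%s=%s' % t for t in zip(a, b))
print(with_format)  # 1=10, 2=20, 3=30

# String concatenation needs str operands, so integers raise TypeError.
concat_error = None
try:
    ', '.join(x + '=' + y for x, y in zip(a, b))
except TypeError as exc:
    concat_error = exc
    print('concatenation failed:', exc)
```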

  • 2020-12-03 02:40
    >>> ', '.join(i + '=' + j for i,j in zip(a,b))
    'a1=b1, a2=b2, a3=b3'
    
  • 2020-12-03 02:43

    Those two solutions do very different things. The first loops in a nested way, then computes indexes with list.index, effectively making this a triply-nested for loop and requiring what you could think of as 125,000,000 operations. The second iterates in lockstep, making 500 pairs rather than examining all 250,000 combinations. No wonder they're so different!

    Are you familiar with Big O notation for describing the complexity of algorithms? If so, the first solution is cubic and the second solution is linear. The cost of choosing the first one over the second one is going to grow at an alarming rate as a and b get longer, so no one would use an algorithm like that.
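
    One way to see the cubic growth is to count comparisons rather than measure time. The sketch below (Python 3) is only an illustration: counting_index is a hypothetical stand-in for list.index that tallies each comparison it makes, and the list sizes are arbitrary.

```python
def counting_index(seq, item, counter):
    """Like list.index, but tallies each comparison in counter[0]."""
    for pos, candidate in enumerate(seq):
        counter[0] += 1
        if candidate == item:
            return pos
    raise ValueError(item)

def nested_steps(a, b):
    """Nested loops with index lookups; returns (comparisons, pairs)."""
    counter = [0]
    pairs = [(aa, bb) for aa in a for bb in b
             if counting_index(a, aa, counter) == counting_index(b, bb, counter)]
    return counter[0], pairs

def lockstep_steps(a, b):
    """Lockstep iteration: one step per pair."""
    pairs = list(zip(a, b))
    return len(pairs), pairs

for n in (10, 20, 40):
    a = ['a%d' % i for i in range(n)]
    b = ['b%d' % i for i in range(n)]
    slow, slow_pairs = nested_steps(a, b)
    fast, fast_pairs = lockstep_steps(a, b)
    assert slow_pairs == fast_pairs  # both approaches pair the same items
    print(n, slow, fast)
```

    For lists of unique items the nested count works out to n * n * (n + 1), so each doubling of n multiplies it by nearly eight, while the lockstep count merely doubles.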


    Personally, I would almost certainly use code like

    ', '.join('%s=%s' % pair for pair in itertools.izip(a, b))
    

    or, if I wasn't too worried about the size of a and b and was just writing something quick, I would use zip instead of itertools.izip. This code has several advantages:

    • It's linear. Although premature optimization is a huge problem, it's best not to cavalierly use an algorithm with unnecessarily bad asymptotic performance.

    • It's simple and idiomatic. I see other people write code like this frequently.

    • It's memory efficient. By using a generator expression instead of a list comprehension (and itertools.izip rather than zip), I don't build unnecessary lists of pairs in memory, which turns what could be an O(n) (linear)-memory step into an O(1) (constant)-memory one. The joined string itself is, of course, still O(n).
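
    The laziness behind the memory point can be illustrated in Python 3, where the built-in zip behaves like Python 2's itertools.izip. This is a sketch under that assumption; the unbounded inputs are contrived purely to show that pairs are produced on demand rather than collected up front.

```python
import itertools

# Two unbounded iterables: pairing them only works because nothing is
# materialized in advance.
keys = ('k%d' % i for i in itertools.count())
values = itertools.count(100)

pairs = zip(keys, values)                 # lazy in Python 3, like izip
first_three = itertools.islice(pairs, 3)  # take just three pairs
result = ', '.join('%s=%s' % pair for pair in first_three)
print(result)  # k0=100, k1=101, k2=102
```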


    As for timing to find the fastest solution, this would almost certainly be an example of premature optimization. To write performant programs, we use theory and experience to write high-quality, maintainable, good code. Experience shows it's at best futile and at worst counterproductive to stop at random operations, ask "What is the best way to do this particular operation?", and try to determine the answer by guessing or even testing.

    In reality, the programs with the best performance are the ones that are written with code of the highest quality and very selective optimizations. High-quality code that values readability and simplicity over microbenchmarks ends up being easier to test, less buggy, and nicer to refactor--these factors are key for effectively optimizing your program. The time you spend fixing unnecessary bugs, understanding complicated code, and fighting with refactoring can be spent optimizing instead.

    When it comes time to optimize a program -- after it's tested and probably documented -- this is not done on random snippets, but on ones determined by actual use cases and/or performance tests, with measurements collected by profiling. If a particular piece of code is only taking 0.1% of the time in the program, no amount of speeding up that piece is going to do any real good.
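
    As a rough illustration of collecting measurements by profiling, the sketch below uses the standard-library cProfile and pstats modules (Python 3). The workload function build_report and its parameters are hypothetical; in a real program you would profile an actual use case instead.

```python
import cProfile
import io
import pstats

def join_pairs(a, b):
    return ', '.join('%s=%s' % pair for pair in zip(a, b))

def build_report(rows=200, width=50):
    """Hypothetical workload: format many small key/value lines."""
    a = ['key%d' % i for i in range(width)]
    b = ['val%d' % i for i in range(width)]
    return [join_pairs(a, b) for _ in range(rows)]

# Profile only the code of interest.
profiler = cProfile.Profile()
profiler.enable()
lines = build_report()
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats('cumulative')
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

    The cumulative-time column shows how much of the run each function accounts for, which is the figure that tells you whether a snippet is worth optimizing at all.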
