Most pythonic way to interleave two strings

暗喜 2020-11-27 14:34

What's the most Pythonic way to mesh two strings together?

For example:

Input:

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrst


        
14 Answers
  • 2020-11-27 15:23

    Potentially faster and shorter than the current leading solution:

    from itertools import chain
    
    u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    l = 'abcdefghijklmnopqrstuvwxyz'
    
    res = "".join(chain(*zip(u, l)))
    

    The strategy, speed-wise, is to do as much as possible at the C level. The same zip_longest() fix applies to strings of unequal length, and since it comes from the same module as chain(), it can't cost me too many points!
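
    As a minimal sketch of the zip_longest() variant mentioned above (Python 3 naming; the helper name interleave is only illustrative, and chain.from_iterable is swapped in for chain(*...) to avoid unpacking the pairs into arguments):

```python
from itertools import chain, zip_longest


def interleave(u, l):
    """Interleave two strings; leftovers of the longer one are appended."""
    # fillvalue="" makes the missing positions contribute nothing to the join.
    return "".join(chain.from_iterable(zip_longest(u, l, fillvalue="")))


print(interleave("ABCDE", "abc"))  # AaBbCcDE
```

    With plain zip(), the trailing "DE" would be dropped silently instead.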

    Other solutions I came up with along the way:

    res = "".join(u[x] + l[x] for x in range(len(u)))
    
    res = "".join(k + l[i] for i, k in enumerate(u))
    
  • 2020-11-27 15:26

    On Python 2, by far the fastest way to do this, at ~3x the speed of list slicing for small strings and ~30x for long ones, is

    res = bytearray(len(u) * 2)
    res[::2] = u
    res[1::2] = l
    str(res)
    

    This wouldn't work on Python 3, though, because a str is no longer a byte string and can't be assigned into a bytearray slice. You could implement something like

    res = bytearray(len(u) * 2)
    res[::2] = u.encode("ascii")
    res[1::2] = l.encode("ascii")
    res.decode("ascii")
    

    but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.

    FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:

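    # Each character takes 4 bytes in UTF-32, so every output pair needs 8 bytes.
    res = bytearray(len(u) * 4 * 2)
    
    # Bytes 0-3 of every 8-byte pair come from the matching character of u.
    u_utf32 = u.encode("utf_32_be")
    res[0::8] = u_utf32[0::4]
    res[1::8] = u_utf32[1::4]
    res[2::8] = u_utf32[2::4]
    res[3::8] = u_utf32[3::4]
    
    # Bytes 4-7 of every 8-byte pair come from the matching character of l.
    l_utf32 = l.encode("utf_32_be")
    res[4::8] = l_utf32[0::4]
    res[5::8] = l_utf32[1::4]
    res[6::8] = l_utf32[2::4]
    res[7::8] = l_utf32[3::4]
    
    res.decode("utf_32_be")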
    

    Special-casing the common case of narrower character types (strings that fit in fewer bytes per character) will help too. FWIW, the UTF-32 version is only 3x the speed of list slicing for long strings, and a factor of 4 to 5 slower for small strings.
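
    One way the special-casing could look, as a rough sketch rather than a tuned implementation: try a 1-byte codec first and fall back to UTF-32 only when needed. The function name interleave_fixed_width and the choice of latin-1 as the narrow codec are my assumptions, and it expects equal-length strings like the snippets above.

```python
def interleave_fixed_width(u, l):
    """Interleave two equal-length strings via strided bytearray writes."""
    if len(u) != len(l):
        raise ValueError("strings must have the same length")
    # Try the narrowest fixed-width codec first; fall back to 4-byte UTF-32.
    for codec, width in (("latin-1", 1), ("utf_32_be", 4)):
        try:
            ub, lb = u.encode(codec), l.encode(codec)
        except UnicodeEncodeError:
            continue  # some character does not fit this codec
        res = bytearray(len(ub) + len(lb))
        step = 2 * width  # bytes occupied by one (u, l) character pair
        for k in range(width):
            res[k::step] = ub[k::width]          # k-th byte of each u char
            res[width + k::step] = lb[k::width]  # k-th byte of each l char
        return res.decode(codec)
    raise ValueError("strings cannot be encoded with a fixed-width codec")


print(interleave_fixed_width("ABC", "abc"))  # AaBbCc
```

    Whether the extra encode attempt pays off depends on the input, so it is worth timing on representative data.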

    Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.
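
    If you want to reproduce the comparison on your own machine, a small timeit harness along these lines should do. The input size, the run count and the function names are arbitrary choices of mine, and the numbers will vary by Python version and hardware.

```python
import timeit
from itertools import chain

# Long ASCII inputs; the size is an arbitrary choice for illustration.
u = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 1000
l = "abcdefghijklmnopqrstuvwxyz" * 1000


def join_chain():
    return "".join(chain(*zip(u, l)))


def join_index():
    return "".join(u[x] + l[x] for x in range(len(u)))


def bytearray_ascii():
    res = bytearray(len(u) * 2)
    res[::2] = u.encode("ascii")
    res[1::2] = l.encode("ascii")
    return res.decode("ascii")


candidates = [join_chain, join_index, bytearray_ascii]

# Make sure every candidate produces the same string before timing anything.
expected = join_chain()
assert all(f() == expected for f in candidates)

for f in candidates:
    seconds = timeit.timeit(f, number=20)
    print(f"{f.__name__}: {seconds:.4f} s for 20 runs")
```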
