Fastest way to generate delimited string from 1d numpy array

春和景丽 2020-12-15 16:53

I have a program which needs to turn many large one-dimensional numpy arrays of floats into delimited strings. I am finding this operation quite slow relative to the mathema

7 Answers
  • 2020-12-15 17:13

    Convert the numpy array into a list first. The map operation seems to run faster on a list than on a numpy array.

    e.g.

    import numpy as np
    x = np.random.randn(100000).tolist()
    for i in range(100):
        ",".join(map(str, x))
    

    In timing tests I found a consistent 15% speedup for this example.

    I'll leave others to explain why this might be faster as I have no idea!
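One likely contributor (an assumption, not something measured in this thread) is that iterating over a NumPy array creates a `numpy.float64` scalar object for each element, while `tolist()` produces plain Python floats once, up front. The sketch below shows the type difference and times both variants; the array size and repeat count are arbitrary, and timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

x = np.random.randn(10_000)
x_list = x.tolist()

# Iterating the array yields NumPy scalars; the list holds plain Python floats.
print(type(x[0]).__name__, type(x_list[0]).__name__)

# Time the join over the array and over the pre-converted list.
t_array = timeit.timeit(lambda: ",".join(map(str, x)), number=20)
t_list = timeit.timeit(lambda: ",".join(map(str, x_list)), number=20)
print(f"array: {t_array:.3f}s  list: {t_list:.3f}s")
```

Note that the `tolist()` call itself has a cost, so it should be included in the measurement if the conversion happens once per array.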

  • 2020-12-15 17:20

    numpy.savetxt is even slower than str.join, and ndarray.tofile() doesn't seem to work with StringIO.

    But I did find a faster method (at least for the OP's example on Python 2.5 with an older version of numpy):

    import numpy as np
    x = np.random.randn(100000)
    for i in range(100):
        (",%f"*100000)[1:] % tuple(x)
    

    It looks like string formatting is faster than string join if you have a well-defined format, as in this particular case. But I wonder why the OP needs such a long string of floating-point numbers in memory.

    Newer versions of numpy show no speed improvement.
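For reference, the single-format-string trick above can be written without hard-coding the array length. This is only a sketch: `join_with_format` is a hypothetical helper name, and it assumes the separator contains no `%` character.

```python
import numpy as np

def join_with_format(x, fmt="%f", sep=","):
    """Format every element of x with fmt using a single % operation.

    Hypothetical helper for illustration; sep must not contain '%'.
    """
    values = tuple(x.tolist())                 # plain Python floats
    template = sep.join([fmt] * len(values))   # e.g. "%f,%f,%f"
    return template % values

x = np.random.randn(100000)
x_str = join_with_format(x)
```

Whether this beats `",".join(...)` depends on the Python and numpy versions, as noted above, so it is worth timing on the target setup.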

  • 2020-12-15 17:23

    I think you could experiment with numpy.savetxt, passing a cStringIO.StringIO object (io.StringIO on Python 3) as a fake file...

    Or maybe use str(x) and replace the whitespace with commas (edit: this won't work well, because str abbreviates large arrays with an ellipsis :-s).
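The savetxt idea might look like the sketch below. It assumes Python 3, where `io.StringIO` takes the place of `cStringIO`, and reshapes the array into a single row so that the delimiter ends up between values rather than a newline after each one. Whether it is faster than `join` should be measured; another answer in this thread reports that savetxt is slower.

```python
import io
import numpy as np

x = np.random.randn(1000)

buf = io.StringIO()
# One row of 1000 columns: values separated by "," and no trailing newline.
np.savetxt(buf, x.reshape(1, -1), fmt="%f", delimiter=",", newline="")
x_str = buf.getvalue()
```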

    As the purpose of this was to send the array over the network, maybe there are better alternatives (more efficient in both CPU and bandwidth). The one I pointed out in a comment on another answer is to encode the binary representation of the array as a Base64 text block. The main obstacle to this being optimal is that the client reading the chunk of data must be able to do nasty things like reinterpret a byte array as a float array, which usually isn't allowed in type-safe languages; but it can be done quickly with a C library call (and most languages provide a means to do this).

    In case you cannot mess with bits, there's always the possibility of processing the numbers one by one to convert the decoded bytes to floats.

    Oh, and watch out for the endianness of the machines when sending data over the network: convert to network order -> base64 encode -> send | receive -> base64 decode -> convert to host order.
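A minimal sketch of that pipeline in Python, assuming 64-bit floats and big-endian as the "network order". The variable names are illustrative, and the actual send/receive step is left out.

```python
import base64
import struct
import numpy as np

x = np.random.randn(1000)

# Sender: big-endian ("network order") float64 bytes, then Base64 text.
payload = base64.b64encode(x.astype(">f8").tobytes())

# Receiver with NumPy: decode, reinterpret the bytes as big-endian doubles,
# then convert to the host's native byte order.
raw = base64.b64decode(payload)
decoded = np.frombuffer(raw, dtype=">f8").astype(np.float64)

# Receiver without reinterpreting memory: unpack the doubles explicitly.
count = len(raw) // 8
unpacked = struct.unpack(">%dd" % count, raw)
```

Base64 uses 4 characters per 3 bytes, so each 8-byte double costs about 11 characters, and the values round-trip exactly instead of being truncated to a fixed number of decimals.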

  • 2020-12-15 17:24
    ','.join(x.astype(str))
    

    is about 10% slower than

    x_arrstr = np.char.mod('%f', x)
    x_str = ",".join(x_arrstr)
    

    but is more readable.

  • 2020-12-15 17:33

    A little late, but this is faster for me:

    import numpy as np
    x = np.random.randn(100000)
    # format each element as a string
    x_arrstr = np.char.mod('%f', x)
    # join the strings into one delimited string
    x_str = ",".join(x_arrstr)
    

    The speedup on my machine is about 1.5x.
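One caveat when comparing these approaches: they do not produce the same text. `'%f'` always writes six decimal places, while `str` on a Python float writes the shortest representation that round-trips. A small sketch with hand-picked values (the sample array is an assumption chosen to show the difference):

```python
import numpy as np

x = np.array([1.5, -0.25, 1e-7])

via_mod = ",".join(np.char.mod("%f", x))   # fixed six decimals
via_str = ",".join(map(str, x.tolist()))   # shortest round-trip repr

print(via_mod)  # 1.500000,-0.250000,0.000000
print(via_str)  # 1.5,-0.25,1e-07
```

So the `'%f'` variants can lose precision for small values, which matters if the receiver needs the exact numbers back.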

  • 2020-12-15 17:33

    Using imap from itertools (Python 2) instead of map in the OP's code gives me about a 2-3% improvement, which isn't much, but it might combine with other ideas to give more improvement.

    Personally, I think that if you want much better than this, you will have to use something like Cython.
