Efficient way to convert delimiter separated string to numpy array

独厮守ぢ 2020-12-31 05:29

I have a string as follows:

1|234|4456|789

I have to convert it into a NumPy array. I would like to know the most efficient way. Since I will

3 answers
  • 2020-12-31 05:55

    Try this:

    import numpy as np
    s = '1|234|4456|789'
    array = np.array([int(x) for x in s.split('|')])
    

    ... assuming that the numbers are all ints. If not, replace int with float in the snippet above.

    EDIT 1:

    Alternatively, you can do the following, which creates only one intermediate list (the one generated by split()):

    array = np.array(s.split('|'), dtype=int)
    

    EDIT 2:

    And yet another way, possibly faster (thanks for all the comments, guys!):

    array = np.fromiter(s.split("|"), dtype=int)
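
As a quick sanity check, the three variants above can be compared side by side. This is only a minimal sketch: the variable names are made up for illustration, and the fromiter call here converts each piece with int() explicitly (and passes count) rather than relying on NumPy to parse the strings itself.

```python
import numpy as np

s = '1|234|4456|789'
parts = s.split('|')

# Variant 1: list comprehension with an explicit int() cast.
from_listcomp = np.array([int(x) for x in parts])

# Variant 2: let NumPy cast the list of strings via dtype.
from_dtype = np.array(parts, dtype=int)

# Variant 3: fromiter over a generator; count lets NumPy preallocate.
from_iter = np.fromiter((int(x) for x in parts), dtype=int, count=len(parts))

# All three should hold the same values.
print(np.array_equal(from_listcomp, from_dtype))
print(np.array_equal(from_listcomp, from_iter))
```

Which variant is fastest depends on the input size and NumPy version, so it is worth timing them on your own data.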
    
  • 2020-12-31 06:01

    @jterrace wins one (1) internet.

    In the measurements below, the example code has been shortened so that each test fits on one line without scrolling where possible.

    For those not familiar with timeit, the -s flag specifies setup code that runs outside the timed loop rather than on every iteration.


    The fastest and least-cluttered way is to use numpy.fromstring as jterrace suggested:

    python -mtimeit -s"import numpy;s='1|2'" "numpy.fromstring(s,dtype=int,sep='|')"
    100000 loops, best of 3: 1.85 usec per loop
    

    The following three examples use string.split in combination with another tool.

    string.split with numpy.fromiter

    python -mtimeit -s"import numpy;s='1|2'" "numpy.fromiter(s.split('|'),dtype=int)"
    100000 loops, best of 3: 2.24 usec per loop
    

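    string.split with int() cast via generator expression

    Note that numpy.array does not iterate over a generator; it wraps the generator object in a 0-d array of dtype object, so the timing below does not measure an actual conversion to integers. A list comprehension ([int(x) for x in s.split('|')]) is needed for that.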

    python -mtimeit -s"import numpy;s='1|2'" "numpy.array(int(x) for x in s.split('|'))"
    100000 loops, best of 3: 3.12 usec per loop
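
To see what numpy.array builds when it is handed a generator expression like the one in this command, a small check helps. The variable names below are illustrative only.

```python
import numpy as np

s = '1|2'

# Passing a generator: NumPy stores the generator object itself
# in a 0-d array instead of consuming it.
wrapped = np.array(int(x) for x in s.split('|'))
print(wrapped.shape, wrapped.dtype)

# Passing a list: NumPy iterates over it and builds an integer array.
converted = np.array([int(x) for x in s.split('|')])
print(converted.shape, converted.tolist())
```

The first result has shape () and dtype object, so no integers were parsed; only the list version yields a usable array.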
    

    string.split with NumPy array of type int

    python -mtimeit -s"import numpy;s='1|2'" "numpy.array(s.split('|'),dtype=int)"
    100000 loops, best of 3: 9.22 usec per loop
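
The one-liners above use a two-element string, where call overhead dominates. A rough way to repeat the comparison on a longer input from within Python is sketched below. The 10,000-element test string, the repeat counts, and the `candidates` dictionary are assumptions chosen for illustration, and the generator-based variants here cast with int() explicitly.

```python
import timeit

import numpy as np

# Hypothetical test input: 10,000 integers joined with '|'.
s = '|'.join(str(i) for i in range(10_000))

candidates = {
    'fromstring': lambda: np.fromstring(s, dtype=int, sep='|'),
    'fromiter': lambda: np.fromiter((int(x) for x in s.split('|')), dtype=int),
    'list comprehension': lambda: np.array([int(x) for x in s.split('|')]),
    'array with dtype': lambda: np.array(s.split('|'), dtype=int),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    results[name] = best
    print(f'{name:20s} {best * 1e6:10.1f} usec per call')
```

The absolute numbers will differ from the ones quoted above, and the ranking may too, depending on input length, Python version, and NumPy version.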
    
  • 2020-12-31 06:05

    The fastest way is to use the numpy.fromstring method:

    >>> import numpy
    >>> data = "1|234|4456|789"
    >>> numpy.fromstring(data, dtype=int, sep="|")
    array([   1,  234, 4456,  789])
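
If the input might be malformed, one option is to wrap the call in a small helper that checks how many values came back. This is a sketch under assumptions: `parse_delimited` is a hypothetical name, not part of NumPy, and the check simply compares the array size with the number of separators plus one.

```python
import numpy as np


def parse_delimited(text, sep='|', dtype=int):
    """Hypothetical helper: parse a delimiter-separated string of numbers."""
    arr = np.fromstring(text, dtype=dtype, sep=sep)
    # A well-formed string with n separators should yield n + 1 values.
    expected = text.count(sep) + 1
    if arr.size != expected:
        raise ValueError(f'expected {expected} values, parsed {arr.size}')
    return arr


ints = parse_delimited('1|234|4456|789')
floats = parse_delimited('1.5|2|3.25', dtype=float)
print(ints, floats)
```

The same call handles floats by switching dtype, which mirrors the int/float remark in the first answer.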
    