Limit how much data is read with numpy.genfromtxt for matplotlib

隐瞒了意图╮ 2020-12-11 02:22

I am creating a graph in Python, using a text file as the source data and matplotlib to plot it. The simple logic below works well.

But is there a way to get

2 Answers
  • 2020-12-11 02:37

    No idea about numpy, but one possible solution would be to use the io.StringIO class.

    That lets you load only the data you actually need into a string with normal file I/O (io.BytesIO is the bytes equivalent), create a file-like object from that string, and pass it to numpy.
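
    A minimal sketch of that idea, assuming Python 3 and whitespace-separated numeric data. The helper name load_first_rows and the throwaway five-line demo file are made up for illustration; they are not from the question.

```python
import io
import os
import tempfile

import numpy as np


def load_first_rows(path, nrows):
    """Read at most `nrows` lines with plain file I/O, then let numpy parse them."""
    with open(path) as f:
        # readline() returns "" at end of file, so shorter files are handled too.
        text = "".join(f.readline() for _ in range(nrows))
    # StringIO wraps the string in a file-like object that genfromtxt can read.
    return np.genfromtxt(io.StringIO(text))


# Demo on a hypothetical five-line file holding the numbers 1..25.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    np.savetxt(path, np.arange(1, 26).reshape(5, 5), fmt="%d")
    first_three = load_first_rows(path, 3)

print(first_three.shape)
```

    Only the selected lines ever reach numpy, so the cost of parsing scales with the rows you keep rather than with the size of the file.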

  • 2020-12-11 02:38

    numpy.genfromtxt accepts iterators as well as files. That means it will accept the output of itertools.islice. Here, test.txt is a five-line file:

    >>> import itertools, numpy
    >>> with open('test.txt') as t_in:
    ...     numpy.genfromtxt(itertools.islice(t_in, 3))
    ... 
    array([[  1.,   2.,   3.,   4.,   5.],
           [  6.,   7.,   8.,   9.,  10.],
           [ 11.,  12.,  13.,  14.,  15.]])
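
    The same pattern also works with a start offset, since islice takes (start, stop) as well as a plain count. The sketch below is an illustration only: read_rows and the generated five-line file are assumed names and data, not part of the session above.

```python
import itertools
import os
import tempfile

import numpy as np


def read_rows(path, start, stop):
    """Parse only lines start..stop-1 (0-based) of a whitespace-delimited file."""
    with open(path) as f:
        # islice lazily yields the selected lines; genfromtxt never sees the rest.
        return np.genfromtxt(itertools.islice(f, start, stop))


# Demo on a hypothetical five-line file holding the numbers 1..25.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    np.savetxt(path, np.arange(1, 26).reshape(5, 5), fmt="%d")
    window = read_rows(path, 1, 3)  # second and third lines only

print(window)
```

    Note that islice still has to read past the skipped leading lines; it just never hands them to the parser.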
    

    One might think this would be slower than letting numpy handle the file IO, but a quick test suggests otherwise. genfromtxt provides a skip_footer keyword argument that you can use if you know how long the file is...

    >>> numpy.genfromtxt('test.txt', skip_footer=2)
    array([[  1.,   2.,   3.,   4.,   5.],
           [  6.,   7.,   8.,   9.,  10.],
           [ 11.,  12.,  13.,  14.,  15.]])
    

    ...but a few informal tests on a 1000-line file suggest that using islice is faster even if you skip only a few lines:

    >>> def get(nlines, islice=itertools.islice):
    ...     with open('test.txt') as t_in:
    ...         numpy.genfromtxt(islice(t_in, nlines))
    ...         
    >>> %timeit get(3)
    1000 loops, best of 3: 338 us per loop
    >>> %timeit numpy.genfromtxt('test.txt', skip_footer=997)
    100 loops, best of 3: 4.92 ms per loop
    >>> %timeit get(300)
    100 loops, best of 3: 5.04 ms per loop
    >>> %timeit numpy.genfromtxt('test.txt', skip_footer=700)
    100 loops, best of 3: 8.48 ms per loop
    >>> %timeit get(999)
    100 loops, best of 3: 16.2 ms per loop
    >>> %timeit numpy.genfromtxt('test.txt', skip_footer=1)
    100 loops, best of 3: 16.7 ms per loop
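
    To re-run this comparison outside IPython, something like the following script could be used with the standard timeit module. It is a sketch under assumptions: the file bench.txt and its contents are generated here, the function names are invented, and the third variant uses max_rows, a genfromtxt keyword that the answer does not mention and that needs NumPy 1.10 or later. Timings will differ from machine to machine.

```python
import itertools
import os
import tempfile
import timeit

import numpy as np

N_LINES = 1000  # same file length as the informal test above
KEEP = 300      # number of leading rows to load
REPEATS = 20    # calls per timing


def with_islice(path, nlines):
    with open(path) as f:
        return np.genfromtxt(itertools.islice(f, nlines))


def with_skip_footer(path, nlines):
    # Requires knowing the total number of lines in advance.
    return np.genfromtxt(path, skip_footer=N_LINES - nlines)


def with_max_rows(path, nlines):
    # Not part of the original answer; assumes NumPy >= 1.10.
    return np.genfromtxt(path, max_rows=nlines)


results = {}
timings = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bench.txt")
    np.savetxt(path, np.arange(N_LINES * 5).reshape(N_LINES, 5), fmt="%d")

    for func in (with_islice, with_skip_footer, with_max_rows):
        results[func.__name__] = func(path, KEEP)
        timings[func.__name__] = timeit.timeit(
            lambda: func(path, KEEP), number=REPEATS
        )

for name, seconds in timings.items():
    print(f"{name}: {seconds / REPEATS * 1e3:.2f} ms per call")
```

    All three variants should return the same array, so the comparison is purely about how much of the file each one has to process.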
    