I have a very long file and I only need parts of it, a slice. New data is coming in, so the file will potentially get longer.
To load the data from the CSV I u
Following this example, you should be able to use itertools.islice, without needing imap, map or csv.reader:
import numpy as np
import itertools

with open('sample.txt') as f:
    # skip the first 100 lines, then read the next 50
    # (cols is a sequence of column indices, e.g. (0, 2))
    d = np.genfromtxt(itertools.islice(f, 100, 150), delimiter=',', usecols=cols)
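A minimal self-contained sketch of this approach, using an in-memory StringIO stand-in for sample.txt (the data values and column selection are made up for illustration):

```python
import io
import itertools
import numpy as np

# Stand-in for sample.txt: 200 rows, 3 comma-separated columns.
lines = ["%d,%d,%d\n" % (i, i * 2, i * 3) for i in range(200)]
f = io.StringIO("".join(lines))

# Skip the first 100 lines, read the next 50, keeping columns 0 and 2.
d = np.genfromtxt(itertools.islice(f, 100, 150), delimiter=',', usecols=(0, 2))
print(d.shape)  # 50 rows, 2 columns
print(d[0])     # the first kept row is line 100
```

genfromtxt accepts any iterable of lines, which is why the islice object can be passed directly.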
Starting with NumPy 1.10, np.genfromtxt takes an optional parameter max_rows which limits the number of lines to read. Combined with the other optional parameter skip_header, you can select a slice of your file (for instance lines 100 to 150):
import numpy as np
np.genfromtxt('file.txt', skip_header=100, max_rows=50)
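A runnable sketch of this, with an in-memory StringIO standing in for file.txt (the data is made up; whitespace-separated here, so no delimiter is needed):

```python
import io
import numpy as np

# Stand-in for file.txt: 200 rows of two whitespace-separated columns.
f = io.StringIO("".join("%d %d\n" % (i, i * 10) for i in range(200)))

# Skip the first 100 lines, then stop after reading 50 rows.
d = np.genfromtxt(f, skip_header=100, max_rows=50)
print(d.shape)  # (50, 2)
```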
You could get the slice using itertools, taking the column using itemgetter:
import numpy as np
from operator import itemgetter
import csv
from itertools import islice, imap  # Python 2; in Python 3 use the builtin map

with open(filename) as f:
    r = csv.reader(f)
    np.genfromtxt(imap(itemgetter(1), islice(r, start, end + 1)))
For Python 3, you can use fromiter with the code above, but you need to specify the dtype:
import numpy as np
from operator import itemgetter
import csv
from itertools import islice

with open("sample.txt") as f:
    r = csv.reader(f)
    print(np.fromiter(map(itemgetter(0), islice(r, start, end + 1)), dtype=float))
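To make that concrete, here is a self-contained sketch with an in-memory StringIO stand-in for sample.txt and made-up values for start and end:

```python
import csv
import io
import numpy as np
from itertools import islice
from operator import itemgetter

# Stand-in for sample.txt: 10 rows of two comma-separated columns.
f = io.StringIO("\n".join("%d,%d" % (i, i * 2) for i in range(10)))

r = csv.reader(f)
start, end = 2, 5
# Take column 0 of rows start..end; fromiter needs an explicit dtype.
out = np.fromiter(map(itemgetter(0), islice(r, start, end + 1)), dtype=float)
print(out)  # column 0 of rows 2 through 5
```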
As in the other answer, you can also pass the islice object directly to genfromtxt, but for Python 3 you will need to open the file in binary mode:
from itertools import islice

with open("sample.txt", "rb") as f:
    print(np.genfromtxt(islice(f, start, end + 1), delimiter=",", usecols=cols))
Interestingly, for multiple columns, using itertools.chain and reshaping is over twice as efficient if all your dtypes are the same:
from itertools import islice, chain

with open("sample.txt") as f:
    r = csv.reader(f)
    arr = np.fromiter(chain.from_iterable(map(itemgetter(0, 4, 10),
                                              islice(r, 4, 10))),
                      dtype=float).reshape(6, -1)
On your sample file:
In [27]: %%timeit
with open("sample.txt", "rb") as f:
    np.genfromtxt(islice(f, 4, 10), delimiter=",", usecols=(0, 4, 10), dtype=float)
   ....:
10000 loops, best of 3: 179 µs per loop

In [28]: %%timeit
with open("sample.txt") as f:
    r = csv.reader(f)
    np.fromiter(chain.from_iterable(map(itemgetter(0, 4, 10), islice(r, 4, 10))), dtype=float).reshape(6, -1)
   ....:
10000 loops, best of 3: 86 µs per loop
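The chain trick above can be verified with a self-contained sketch; an in-memory StringIO stands in for sample.txt, with made-up data wide enough to have columns 0, 4 and 10:

```python
import csv
import io
import numpy as np
from itertools import islice, chain
from operator import itemgetter

# Stand-in for sample.txt: 20 rows, 12 comma-separated columns.
rows = [",".join(str(r * 100 + c) for c in range(12)) for r in range(20)]
f = io.StringIO("\n".join(rows))

r = csv.reader(f)
# Rows 4..9, columns 0, 4 and 10, flattened by chain then reshaped back.
arr = np.fromiter(chain.from_iterable(map(itemgetter(0, 4, 10),
                                          islice(r, 4, 10))),
                  dtype=float).reshape(6, -1)
print(arr.shape)  # (6, 3): six rows, three selected columns
```

The reshape works because chain.from_iterable flattens the per-row tuples into a single stream of 18 values, and fromiter only builds 1-D arrays.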