I have a file.csv
with ~15k rows that looks like this
SAMPLE_TIME, POS, OFF, HISTOGRAM
2015-07-15 16:41:56, 0-0-0-0-3, 1,
You can create columns based on the length of the first actual row:
from tempfile import TemporaryFile
with open("out.txt") as f, TemporaryFile("w+") as t:
h, ln = next(f), len(next(f).split(","))
header = h.strip().split(",")
f.seek(0), next(f)
header += range(ln)
print(pd.read_csv(f, names=header))
Which will give you:
SAMPLE_TIME POS OFF HISTOGRAM 0 1 2 3 \
0 2015-07-15 16:41:56 0-0-0-0-3 1 2 0 5 59 0
1 2015-07-15 16:42:55 0-0-0-0-3 1 0 0 5 9 0
2 2015-07-15 16:43:55 0-0-0-0-3 1 0 0 5 5 0
3 2015-07-15 16:44:56 0-0-0-0-3 1 2 0 5 0 0
4 5 ... 13 14 15 16 17 18 19 20 21 22
0 0 0 ... 0 0 0 0 0 NaN NaN NaN NaN NaN
1 0 0 ... 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 0 0 ... 4 0 0 0 NaN NaN NaN NaN NaN NaN
3 0 0 ... 0 0 0 0 NaN NaN NaN NaN NaN NaN
[4 rows x 27 columns]
Or you could clean the file before passing to pandas:
import pandas as pd
from tempfile import TemporaryFile
with open("in.csv") as f, TemporaryFile("w+") as t:
for line in f:
t.write(line.replace(" ", ""))
t.seek(0)
ln = len(line.strip().split(","))
header = t.readline().strip().split(",")
header += range(ln)
print(pd.read_csv(t,names=header))
Which gives you:
SAMPLE_TIME POS OFF HISTOGRAM 0 1 2 3 4 5 ... 11 \
0 2015-07-1516:41:56 0-0-0-0-3 1 2 0 5 59 0 0 0 ... 0
1 2015-07-1516:42:55 0-0-0-0-3 1 0 0 5 9 0 0 0 ... 0
2 2015-07-1516:43:55 0-0-0-0-3 1 0 0 5 5 0 0 0 ... 0
3 2015-07-1516:44:56 0-0-0-0-3 1 2 0 5 0 0 0 0 ... 0
12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 NaN NaN NaN
1 50 0 NaN NaN NaN NaN NaN NaN NaN
2 0 4 0 0 0 NaN NaN NaN NaN
3 6 0 0 0 0 NaN NaN NaN NaN
[4 rows x 25 columns]
or to drop the columns will all nana:
print(pd.read_csv(f, names=header).dropna(axis=1,how="all"))
Gives you:
SAMPLE_TIME POS OFF HISTOGRAM 0 1 2 3 \
0 2015-07-15 16:41:56 0-0-0-0-3 1 2 0 5 59 0
1 2015-07-15 16:42:55 0-0-0-0-3 1 0 0 5 9 0
2 2015-07-15 16:43:55 0-0-0-0-3 1 0 0 5 5 0
3 2015-07-15 16:44:56 0-0-0-0-3 1 2 0 5 0 0
4 5 ... 8 9 10 11 12 13 14 15 16 17
0 0 0 ... 2 0 0 0 0 0 0 0 0 0
1 0 0 ... 2 0 0 0 50 0 NaN NaN NaN NaN
2 0 0 ... 2 0 0 0 0 4 0 0 0 NaN
3 0 0 ... 2 0 0 0 6 0 0 0 0 NaN
[4 rows x 22 columns]