问题
I am trying to get an index or row number for the row that holds the headers in my CSV file. The issue is, the header row can move up and down depending on the output of the report from our system (I have no control to change this)
code:
ht = pd.read_csv(file.csv)
test = ht.get_loc('Code') #Code being header im using to locate the header row
csv1 = read_csv(file.csv, header=test)
df1 = df1.append(csv1) #Appending as have many files
If I was to print test, I would expect a number around 4 or 5, and that's what I am feeding into the second read "read_csv"
The error I'm getting is that it's expecting 1 header column, but I have 26 columns. I am just trying to use the first header string to get the row number
Thanks :-)
Edit:
CSV format
This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18
as you will see the "The deadlines" rows are the same, this can be 3 or 5 based on the code ids, thus the header row can change up or down.
I also did not write out all 26 column headers, not sure that matters.
Wanted DF format
index | code | type | arrived_date | est_del_date
1 | a/wrwgwr12/001 | kids | 12-dec-18 | 17-dec-18
2 | aa/gjghgj35/030 | Pet | 15-dec-18 | 18-dec-18
Hope this makes sense..
Thanks,
回答1:
You can use the csv
module to find the first row which contains a delimiter, then feed the index of this row as the skiprows
parameter to pd.read_csv:
from io import StringIO
import csv
import pandas as pd
x = """This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
reader = csv.reader(fin)
idx = next(idx for idx, row in enumerate(reader) if len(row) > 1) # 4
# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), skiprows=idx)
print(df)
code type arrived_date est_del_date
0 a/wrwgwr12/001 kids 12-dec-18 17-dec-18
1 aa/gjghgj35/030 pet 15-dec-18 18-dec-18
来源:https://stackoverflow.com/questions/53835607/finding-the-row-number-for-the-header-row-in-a-csv-file-pandas-dataframe