How can I read only the header column of a CSV file using Python?

生来不讨喜

I am looking for a way to read just the header row of a large number of large CSV files.

Using Pandas, I have this method available, for each csv file:

9 answers

我在风中等你

2020-12-14 10:19

One way is to collect the headers into a set and adjust as necessary. I've used iglob here as an example of how to find the .csv files:

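```
import csv
from glob import iglob

unique_headers = set()
for filename in iglob('*.csv'):
    # Python 3: open in text mode with newline='' (the original 'rb' mode is Python 2 style)
    with open(filename, newline='') as fin:
        csvin = csv.reader(fin)
        # next(csvin, []) returns the header row, or [] for an empty file
        unique_headers.update(next(csvin, []))
```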

挽巷

2020-12-14 10:26
It depends on what the headers will be used for. If you only need them for comparison purposes (my case), this code is simple and very fast: it reads the whole header line as one string. You can then transform the collected strings according to your needs:
```
import glob
import os

for filename in glob.glob(os.path.join(files_path, "*.csv")):
    with open(filename) as f:
        first_line = f.readline()  # the whole header as one raw string
```
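If the raw first line later needs to be split into column names, one option is to pass it through csv.reader so that quoted commas are handled. This is only a minimal sketch: the helper name parse_header_line and the sample line are made up for illustration and are not part of the answer above.

```python
import csv


def parse_header_line(first_line):
    """Split a raw CSV header line into column names (hypothetical helper)."""
    # csv.reader accepts any iterable of strings, so wrap the line in a list.
    # next(..., []) falls back to an empty list if no row is produced.
    return next(csv.reader([first_line]), [])


# Made-up sample: the second column name contains a quoted comma.
sample_line = 'id,"last, first",score\n'
columns = parse_header_line(sample_line)
print(columns)
```

A plain first_line.split(',') would break the quoted name into two pieces, which is why the sketch goes through the csv module instead.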

滥情空心

2020-12-14 10:29
Expanding on the answer given by Jeff: it is now possible to use pandas without actually reading any rows.
```
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.DataFrame(np.random.randn(10, 4), columns=list('abcd')).to_csv('test.csv', mode='w')

In [4]: pd.read_csv('test.csv', index_col=0, nrows=0).columns.tolist()
Out[4]: ['a', 'b', 'c', 'd']
```
pandas can have the advantage that it deals more gracefully with CSV encodings.
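On the encoding point, a rough sketch: read_csv accepts an encoding argument, so a header with non-ASCII column names can be read without loading any rows. The Latin-1 sample file and its column names below are made up for illustration.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a small made-up CSV file encoded as Latin-1.
    path = os.path.join(tmpdir, "latin1.csv")
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write("naïve,café\n1,2\n")

    # nrows=0 parses the header only; encoding tells pandas how to decode it.
    columns = pd.read_csv(path, nrows=0, encoding="latin-1").columns.tolist()

print(columns)
```

The encoding still has to be known or guessed per file; pandas does not detect it automatically.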

慢半拍i

2020-12-14 10:30
I might be a little late to the party, but here's one way to do it using just the Python standard library. When dealing with text data, I prefer Python 3 because of its Unicode handling. This is very close to your original suggestion, except that I'm only reading in one row rather than the whole file.
```
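import csv

# newline='' is what the csv module docs recommend for files passed to a reader
with open(fpath, 'r', newline='') as infile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames  # accessing this reads only the first row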
```
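Since the question involves many files, the same idea could be wrapped in a small function and applied per file. This is a sketch under assumptions: read_fieldnames, the UTF-8 default, and the two temporary sample files are all hypothetical and only serve the demonstration.

```python
import csv
import os
import tempfile


def read_fieldnames(path, encoding="utf-8"):
    """Return the header row of one CSV file as a list (hypothetical helper)."""
    with open(path, newline="", encoding=encoding) as infile:
        # fieldnames is None for an empty file, hence the "or []" fallback.
        return list(csv.DictReader(infile).fieldnames or [])


with tempfile.TemporaryDirectory() as tmpdir:
    # Made-up sample files standing in for the real CSVs.
    samples = {"a.csv": "id,name\n1,x\n", "b.csv": "id,score\n2,9\n"}
    for name, text in samples.items():
        with open(os.path.join(tmpdir, name), "w", newline="", encoding="utf-8") as f:
            f.write(text)

    # Map each file name to its header row.
    headers = {
        name: read_fieldnames(os.path.join(tmpdir, name))
        for name in sorted(samples)
    }

print(headers)
```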
Hopefully that helps!

予麋鹿

2020-12-14 10:36
What about:
```
pandas.read_csv(PATH_TO_CSV, nrows=1).columns
```
That'll read only the header and the first data row, and return the columns found.
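For what it's worth, here is a small sketch comparing nrows=1 with nrows=0 on an in-memory CSV. The sample data and variable names are made up for illustration. Both calls should give the same column labels, but nrows=0 parses no data rows at all.

```python
import io

import pandas as pd

# Made-up sample data; a real call would pass a file path instead.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

with_one_row = pd.read_csv(io.StringIO(csv_text), nrows=1)
header_only = pd.read_csv(io.StringIO(csv_text), nrows=0)

print(with_one_row.columns.tolist())
print(header_only.columns.tolist())
print(len(with_one_row), len(header_only))
```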

既然无缘

2020-12-14 10:39
You missed the nrows=1 parameter to read_csv:
```
>>> df = pd.read_csv(PATH_TO_CSV, nrows=1)
>>> df.columns
```