Pandas seems to ignore first column name when reading tab-delimited data, gives KeyError


[愿得一人] 2021-01-04 17:37

I am using pandas 0.12.0 in ipython3 on Ubuntu 13.10, in order to wrangle large tab-delimited datasets in txt files. Using read_table to create a DataFrame from the txt app

4 Answers

囚心锁ツ (original poster)

2021-01-04 18:32
I also stumbled upon a similar problem. When I read the file with df = pandas.read_csv(csvfile, sep), the name of the first column came back in this strange format:
```
df.columns[0]
```
returned this result:
```
'\xef\xbb\xbfColName'
```
When I tried selecting this column, I got an error:
```
df.ColName
```
returned
```
AttributeError: 'DataFrame' object has no attribute 'ColName'
```
After reading this, I just used an external editor, Sublime, to change the encoding and save the file as a new file (Save with Encoding: UTF-8, not UTF-8 with BOM).
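
If you would rather not open an editor, the same re-save can probably be scripted. A minimal sketch, assuming the file really does start with the UTF-8 BOM bytes shown above; `strip_bom` and both paths are hypothetical names I made up for illustration:

```python
import codecs
import shutil


def strip_bom(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.

    Hypothetical helper, not part of pandas. Works on raw bytes and streams
    the copy, so large files are not loaded into memory at once.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # codecs.BOM_UTF8 is the byte sequence b'\xef\xbb\xbf'
        head = src.read(len(codecs.BOM_UTF8))
        if head != codecs.BOM_UTF8:
            # No BOM: keep the first bytes, they are real data.
            dst.write(head)
        # Copy everything after the first three bytes unchanged.
        shutil.copyfileobj(src, dst)
```

Calling `strip_bom("data.txt", "data_nobom.txt")` (placeholder file names) writes a copy that should be safe to pass to read_csv or read_table.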

Afterwards pandas reads the first column name correctly, and I am able to select it with df.ColName and get the correct values. Such a small thing, and it took 45 minutes to solve.

TL;DR: Save the file as UTF-8 without a BOM.
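
To see why the first column name picks up those extra bytes, here is a small sketch using only the standard library. The sample bytes and the `header` helper are made up for illustration; the point is that Python's "utf-8-sig" codec drops a leading BOM while plain "utf-8" keeps it as part of the text:

```python
import csv
import io

# Bytes of a tiny tab-delimited file that starts with a UTF-8 BOM
# (made-up sample data, not from the original question).
raw = b"\xef\xbb\xbfColName\tOther\n1\t2\n"


def header(data, encoding):
    """Return the header row of tab-delimited bytes decoded with `encoding`."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text, delimiter="\t"))


with_bom = header(raw, "utf-8")         # BOM stays glued to the first name
without_bom = header(raw, "utf-8-sig")  # BOM is dropped while decoding

print(repr(with_bom[0]))     # '\ufeffColName'
print(repr(without_bom[0]))  # 'ColName'
```

pandas' read_csv and read_table take an `encoding` argument, so passing `encoding="utf-8-sig"` may avoid the re-save altogether. I have not checked how pandas 0.12.0 behaves here, so treat that as something to try, not a guaranteed fix.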