Question
I have a data set that looks like this:
Id Economics English History Literature
0 56 1 1 2 1
1 11 1 0 0 1
2 6 0 1 1 0
3 43 2 0 1 1
4 14 0 1 1 0
I created this dataset by reading some CSV files, and I could easily access its columns with, for example, df['Economics']. Then I saved it to a file with:
df.to_csv(file_path, sep='\t')
But when I reopened the dataset in another function for other purposes and tried to access the columns in the same way, i.e.
df=pd.read_csv(file_path, sep='\t')
print(df['Economics'])
I got
KeyError: Economics
I tried multiple encodings while reading, and also verified that it is not a multi-index DataFrame; everything was fine with the encoding and the index. I found out that there is another method, df.get('Economics'), which in this case worked without error. But when I then tried to iterate over the column names looking for 'Economics', I got a KeyError again.
So my question is: why does this happen? Why can I sometimes access a column directly with df['column_name'], while at other times I need to use df.get('column_name')? And how should I deal with the column names when the first method doesn't work?
Answer 1:
It looks like there is some unwanted character in the column name. Maybe it is something like 'Economics ' (with a trailing space) or something similar. In that case df.get('Economics') would not raise a KeyError; instead it would just return None.
Try checking the output of df.columns and the length of the column name with len(df.columns[1]).
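The check suggested above can be sketched roughly as follows. The in-memory data and the trailing space in the 'Economics ' header are assumptions made for illustration, since the asker's actual file is not shown; the variable names raw, missing, and economics are likewise hypothetical.

```python
import io
import pandas as pd

# Hypothetical tab-separated data whose 'Economics' header carries a trailing space.
raw = "Id\tEconomics \tEnglish\n56\t1\t1\n11\t1\t0\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# repr() makes stray whitespace visible in each column name.
print([repr(c) for c in df.columns])

# df.get() returns None for a missing key instead of raising KeyError.
missing = df.get("Economics")
print(missing is None)

# Strip surrounding whitespace from every column name, then access normally.
df.columns = df.columns.str.strip()
economics = df["Economics"]
print(economics.tolist())
```

If the names turn out to contain something other than plain whitespace, printing them with repr() as above should still reveal it, though the cleanup step would need adjusting.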
Answer 2:
I guess you either have trailing spaces in some or all of your column names, or you have just one column, as in my test example below:
Test data:
Id Economics English History Literature
56 1 1 2 1
11 1 0 0 1
6 1 1 0 0
43 2 0 1 1
14 1 1 1 0
Test code:
import pandas as pd
df = pd.read_csv('test.csv', sep='\t')
print(df)
print(df.columns.tolist())
Output:
Id Economics English History Literature
0 56 1 1 2 1
1 11 1 0 0 1
2 6 1 1 0 0
3 43 2 0 1 1
4 14 1 1 1 0
['Id Economics English History Literature ']
The DataFrame has only one column: 'Id Economics English History Literature '
Let's change sep='\t' to sep='\s+' in pd.read_csv() and run our test code against the same data set:
Id Economics English History Literature
0 56 1 1 2 1
1 11 1 0 0 1
2 6 1 1 0 0
3 43 2 0 1 1
4 14 1 1 1 0
['Id', 'Economics', 'English', 'History', 'Literature']
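For anyone who wants to reproduce this without creating test.csv, here is a minimal sketch using an in-memory string. It assumes the file is space-separated rather than tab-separated, which is a guess consistent with the output above; the names raw, df_tab, and df_ws are hypothetical. The regex is written as the raw string r"\s+" so Python does not treat \s as a string escape.

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv: fields separated by spaces, not tabs.
raw = (
    "Id Economics English History Literature\n"
    "56 1 1 2 1\n"
    "11 1 0 0 1\n"
)

# With sep='\t' no tab is found, so each line becomes a single field.
df_tab = pd.read_csv(io.StringIO(raw), sep="\t")
print(df_tab.columns.tolist())

# With a whitespace regex the header splits into five separate names.
df_ws = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df_ws.columns.tolist())
print(df_ws["Economics"].tolist())
```

Checking len(df.columns) right after reading is a quick way to notice that the separator did not match the file.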
Source: https://stackoverflow.com/questions/35764172/accessing-the-column-in-pandas-in-different-way