SyntaxError when accessing column named “class” in pandas DataFrame

庸人自扰 2021-01-20 17:14

I have a pandas DataFrame named 'dataset', and it contains a column named 'class'.

When I execute the following line, I get SyntaxError: invalid syntax.



        
2 Answers
  •  孤街浪徒
    2021-01-20 18:01

    class is a keyword in Python, so the parser rejects it when it appears after the dot in attribute access. A rule of thumb: whenever a column name cannot be used as a valid Python variable name, you must use bracket notation to access it: dataset['class'].unique().
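To make the rule of thumb concrete, here is a minimal sketch. The DataFrame contents are made up for illustration; only the names dataset and 'class' come from the question. The failing expression is passed to compile() as a string so that the rest of the snippet still runs.

```python
import pandas as pd

# Hypothetical data; only the column name 'class' comes from the question.
dataset = pd.DataFrame({"class": ["a", "b", "a"], "value": [1, 2, 3]})

# Bracket notation works for any column label, keyword or not.
unique_classes = dataset["class"].unique()
print(list(unique_classes))  # ['a', 'b']

# Attribute access fails at parse time, before pandas is involved at all.
try:
    compile("dataset.class.unique()", "<example>", "eval")
    parse_error = None
except SyntaxError as exc:
    parse_error = exc

print(type(parse_error).__name__)  # SyntaxError
```

The error comes from Python's parser, not from pandas, which is why no pandas setting can make dataset.class work.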

    There are, of course, exceptions here, but they work against you. For example, min and max are valid variable names in Python (even though they shadow builtins). In pandas, however, you cannot refer to a column with such a name using attribute access notation, because the name already belongs to an existing method. There are more such exceptions; they are enumerated in the documentation.
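A small sketch of the min case, assuming a made-up DataFrame with a column literally named 'min' (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical data with a column whose name collides with DataFrame.min.
dataset = pd.DataFrame({"min": [3, 1, 2], "score": [7, 8, 9]})

# Attribute access silently returns the method, not the column.
attr = dataset.min
print(callable(attr))  # True

# Bracket notation returns the column as expected.
column = dataset["min"]
print(column.tolist())  # [3, 1, 2]

# A name that collides with nothing works either way.
print(dataset.score.tolist())  # [7, 8, 9]
```

Note that this case raises no error at all, so it is arguably harder to spot than the keyword case.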

    A good place to begin further reading is the documentation on Attribute Access, specifically the red Warning box, which I'm adding here for posterity:

    • You can use this access only if the index element is a valid Python identifier, e.g. s.1 is not allowed. See here for an explanation of valid identifiers.

    • The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed, but s['min'] is possible.

    • Similarly, the attribute will not be available if it conflicts with any of the following list: index, major_axis, minor_axis, items.

    • In any of these cases, standard indexing will still work, e.g. s['1'], s['min'], and s['index'] will access the corresponding element or column.
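The conditions in the warning can be approximated with a small check. This is only a rough sketch: attribute_access_is_safe is a hypothetical helper, not part of pandas, and it is deliberately conservative (it treats any existing attribute on the type as a conflict). The Series and its labels are made up for illustration.

```python
import keyword

import pandas as pd


def attribute_access_is_safe(obj, label):
    """Rough check (hypothetical helper, not a pandas API)."""
    return (
        isinstance(label, str)
        and label.isidentifier()            # rules out labels such as '1'
        and not keyword.iskeyword(label)    # rules out 'class', 'def', ...
        and not hasattr(type(obj), label)   # rules out 'min', 'index', 'items', ...
    )


# Hypothetical Series whose labels cover the cases from the warning.
s = pd.Series([10, 20, 30, 40], index=["1", "min", "index", "price"])

checks = {label: attribute_access_is_safe(s, label) for label in s.index}
print(checks)  # {'1': False, 'min': False, 'index': False, 'price': True}

# Standard indexing works for every label, safe or not.
values = [s["1"], s["min"], s["index"], s["price"]]
print([int(v) for v in values])  # [10, 20, 30, 40]
```

In practice it is simpler to use bracket notation everywhere than to run a check like this; the helper only illustrates which labels fall into which category.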
