I have a pandas DataFrame named 'dataset', and it contains a column named 'class'.
When I execute the following line, I get SyntaxError: invalid syntax.
class is a keyword in Python. A rule of thumb: whenever you're dealing with column names that cannot be used as valid variable names in Python, you must use bracket notation to access them: dataset['class'].unique().
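As a minimal sketch of that rule of thumb, the snippet below builds a small DataFrame and reads the column with bracket notation. Only the names dataset and class come from the question; the sample values and the second column are made up for illustration. The compile() call is just a way to show that the attribute form is rejected by the Python parser before pandas is involved at all.

```python
import pandas as pd

# Hypothetical data: only the names 'dataset' and 'class' come from the question.
dataset = pd.DataFrame({"class": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})

# Bracket notation works for any column name, including Python keywords.
unique_classes = dataset["class"].unique()
print(list(unique_classes))  # ['a', 'b', 'c']

# The attribute form never reaches pandas: the parser rejects it outright.
try:
    compile("dataset.class.unique()", "<example>", "eval")
    parsed = True
except SyntaxError:
    parsed = False
print(parsed)  # False
```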
There are, of course, exceptions here, but they work against you. For example, min and max are valid variable names in Python (even though they shadow builtins). In pandas, however, you cannot refer to a column with such a name using attribute access notation, because the name conflicts with an existing method. There are more such exceptions; they're enumerated in the documentation.
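To make the min/max case concrete, here is a small sketch with a hypothetical DataFrame whose columns are named min and max. The data is invented; the point is only that attribute access hands back the method, while bracket notation hands back the column.

```python
import pandas as pd

# Hypothetical DataFrame whose column names collide with DataFrame methods.
df = pd.DataFrame({"min": [3, 1, 2], "max": [9, 7, 8]})

# Attribute access resolves to the DataFrame.min method, not the column.
attr = df.min
print(callable(attr))  # True

# Bracket notation returns the column as a Series.
col = df["min"]
print(col.tolist())  # [3, 1, 2]
```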
A good place to begin further reading is the documentation on Attribute Access. Specifically, see the red Warning box, which I'm quoting here for posterity:
You can use this access only if the index element is a valid Python identifier, e.g. s.1 is not allowed. See here for an explanation of valid identifiers.
The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed, but s['min'] is possible.
Similarly, the attribute will not be available if it conflicts with any of the following list: index, major_axis, minor_axis, items.
In any of these cases, standard indexing will still work, e.g. s['1'], s['min'], and s['index'] will access the corresponding element or column.
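The conditions in that warning can be approximated with a small check. The helper below, attribute_access_ok, is a hypothetical name and not part of pandas; it is a rough sketch that only looks at attributes defined on the class, so it may miss instance-level names. When in doubt, bracket notation is the safe choice.

```python
import keyword

import pandas as pd


def attribute_access_ok(name, cls=pd.DataFrame):
    """Roughly predict whether obj.<name> could reach a column or element.

    Hypothetical helper, not a pandas API. It mirrors the documented
    conditions: valid identifier, not a keyword, no clash with an
    existing attribute or method on the class.
    """
    if not name.isidentifier():      # e.g. '1' is not a valid identifier
        return False
    if keyword.iskeyword(name):      # e.g. 'class' is a reserved word
        return False
    if hasattr(cls, name):           # e.g. 'min', 'index', 'items'
        return False
    return True


for candidate in ["1", "class", "min", "index", "price"]:
    print(candidate, attribute_access_ok(candidate))
```

Of these candidates, only the made-up name price passes the check; the others need bracket notation.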
class is a reserved word. You can use dataset['class'].unique() instead.