When deleting a column in a DataFrame I use:
del df['column_name']
And this works great. Why can't I use the following?
del df.column_name
Delete first, second and fourth columns:
df.drop(df.columns[[0,1,3]], axis=1, inplace=True)
Delete first column:
df.drop(df.columns[[0]], axis=1, inplace=True)
There is an optional parameter inplace so that the original data can be modified without creating a copy.
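To make the difference concrete, here is a minimal sketch of the two calling styles. The frame and its column names are made up for illustration. Note that inplace=True decides which object ends up modified; whether pandas avoids a copy internally is not something this example shows.

```python
import pandas as pd

# Illustrative frame; the column names are assumptions for this example.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Without inplace, drop returns a new DataFrame and leaves df untouched.
trimmed = df.drop("a", axis=1)

# With inplace=True, df itself is modified and the call returns None.
result = df.drop("b", axis=1, inplace=True)

print(list(trimmed.columns))  # ['b', 'c']
print(list(df.columns))       # ['a', 'c']
print(result)                 # None
```

A common slip is writing df = df.drop(..., inplace=True), which rebinds df to None.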
Column selection, addition, deletion
Delete column column-name:
df.pop('column-name')
import pandas as pd
df = pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, orient='index', columns=['one', 'two', 'three'])
print(df)
Output:
   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
df.drop(df.columns[[0]], axis=1, inplace=True)
print(df)
Output:
   two  three
A    2      3
B    5      6
C    8      9
three = df.pop('three')
print(df)
Output:
   two
A    2
B    5
C    8
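Unlike drop, pop hands the removed column back to you. A small self-contained sketch, reusing the column names from the example above:

```python
import pandas as pd

df = pd.DataFrame({"two": [2, 5, 8], "three": [3, 6, 9]}, index=["A", "B", "C"])

# pop removes the column from df in place and returns it as a Series.
three = df.pop("three")

print(list(df.columns))  # ['two']
print(three.tolist())    # [3, 6, 9]
```

This is handy when you want to split off a target column and keep the rest of the frame.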
df.drop('columnname', axis=1, inplace=True)
or else you can go with
del df['colname']
To delete multiple columns based on column numbers:
df.drop(df.columns[1:3], axis=1, inplace=True)
To delete multiple columns based on column names:
df.drop(['col1', 'col2', ..., 'coln'], axis=1, inplace=True)
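Here is a runnable sketch of both variants on a made-up four-column frame. It looks the labels up through df.columns, which is one way to go from positions to names, and also uses the errors="ignore" option of drop to skip labels that do not exist.

```python
import pandas as pd

# Illustrative frame; the column names are assumptions for this example.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["col1", "col2", "col3", "col4"])

# Positions 1 and 2 (the slice end is exclusive) map to labels col2 and col3.
by_position = df.drop(df.columns[1:3], axis=1)

# Drop by label; errors="ignore" skips labels that are not present.
by_name = df.drop(["col1", "missing"], axis=1, errors="ignore")

print(list(by_position.columns))  # ['col1', 'col4']
print(list(by_name.columns))      # ['col2', 'col3', 'col4']
```

Without errors="ignore", the second call would raise a KeyError because "missing" is not a column.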
It's good practice to always use the [] notation. One reason is that attribute notation (df.column_name) does not work for numbered indices:
In [1]: df = DataFrame([[1, 2, 3], [4, 5, 6]])
In [2]: df[1]
Out[2]:
0 2
1 5
Name: 1
In [3]: df.1
File "<ipython-input-3-e4803c0d1066>", line 1
df.1
^
SyntaxError: invalid syntax
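Numbered columns are not the only case. As a rough illustration (the column names here are invented), attribute notation also breaks down for names that contain spaces or that collide with an existing DataFrame method:

```python
import pandas as pd

# "count" collides with DataFrame.count; "my col" is not a valid identifier.
df = pd.DataFrame({"count": [10, 20], "my col": [1, 2]})

# Bracket notation always returns the column.
count_values = df["count"].tolist()
spaced_values = df["my col"].tolist()

# Attribute notation finds the DataFrame.count method first, not the column.
attribute_is_method = callable(df.count)

print(count_values)         # [10, 20]
print(spaced_values)        # [1, 2]
print(attribute_is_method)  # True
```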
The actual question posed, missed by most answers here, is:
Why can't I do del df.column_name?
At first we need to understand the problem, which requires us to dive into Python magic methods.
As Wes points out in his answer, del df['column'] maps to the Python magic method df.__delitem__('column'), which is implemented in pandas to drop the column.
However, as pointed out in the link above about Python magic methods:
In fact, __del__ should almost never be used because of the precarious circumstances under which it is called; use it with caution!
You could argue that del df['column_name'] should not be used or encouraged, and thereby del df.column_name should not even be considered.
However, in theory, del df.column_name could be implemented to work in pandas using the magic method __delattr__. This does, however, introduce certain problems, problems which the del df['column_name'] implementation already has, but to a lesser degree.
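To see which magic method each form of del reaches, here is a toy sketch. Table is a hypothetical stand-in written for this answer, not pandas code, and it only hints at how a __delattr__ version might look. For reference, __del__ is a different hook again: it is the finalizer that runs when an object is destroyed, and neither del obj[key] nor del obj.name calls it.

```python
class Table:
    """Hypothetical stand-in for a DataFrame; not how pandas is implemented."""

    def __init__(self, **columns):
        # Columns live in a plain dict keyed by name.
        self._columns = dict(columns)

    def __delitem__(self, name):
        # Reached by: del table["name"]
        del self._columns[name]

    def __getattr__(self, name):
        # Reached by table.name, but only when normal attribute lookup fails.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name)

    def __delattr__(self, name):
        # Reached by: del table.name
        if name in self._columns:
            del self._columns[name]
        else:
            super().__delattr__(name)


t = Table(a=[1], b=[2], c=[3])
del t["a"]  # goes through __delitem__
del t.b     # goes through __delattr__
remaining = sorted(t._columns)
print(remaining)  # ['c']
```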
What if I define a column in a dataframe called "dtypes" or "columns"?
Then assume I want to delete these columns.
del df.dtypes would leave the __delattr__ method unable to tell whether it should delete the "dtypes" attribute or the "dtypes" column.
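The name clash is easy to reproduce. In this small sketch (the frame is invented for illustration), attribute access already resolves to the built-in dtypes property rather than the column, while bracket access and bracket deletion stay unambiguous:

```python
import pandas as pd

df = pd.DataFrame({"dtypes": [1, 2], "x": [3, 4]})

# Attribute access resolves to the DataFrame.dtypes property, not the column:
# the result is indexed by column name instead of by row.
attr_index = list(df.dtypes.index)

# Bracket access returns the actual column.
column_values = df["dtypes"].tolist()

# Bracket deletion removes the column and nothing else.
del df["dtypes"]
columns_after = list(df.columns)

print(attr_index)     # ['dtypes', 'x']
print(column_values)  # [1, 2]
print(columns_after)  # ['x']
```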
.ix, .loc or .iloc methods.
You cannot do del df.column_name because pandas has a quite wildly grown architecture that needs to be reconsidered in order for this kind of cognitive dissonance not to occur to its users.
Don't use df.column_name. It may be pretty, but it causes cognitive dissonance.
There are multiple ways of deleting a column.
There should be one-- and preferably only one --obvious way to do it.
Columns are sometimes attributes but sometimes not.
Special cases aren't special enough to break the rules.
Does del df.dtypes delete the dtypes attribute or the dtypes column?
In the face of ambiguity, refuse the temptation to guess.
The best way to do this in pandas is to use drop:
df = df.drop('column_name', axis=1)
where 1 is the axis number (0 for rows and 1 for columns).
To delete the column without having to reassign df, you can do:
df.drop('column_name', axis=1, inplace=True)
Finally, to drop by column number instead of by column label, try this to delete, e.g. the 1st, 2nd and 4th columns:
df = df.drop(df.columns[[0, 1, 3]], axis=1) # df.columns is zero-based pd.Index
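A quick sketch of what that positional lookup does, on a made-up five-column frame:

```python
import pandas as pd

# Illustrative frame with columns a..e.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("abcde"))

# df.columns[[0, 1, 3]] selects the labels at positions 0, 1 and 3.
labels = list(df.columns[[0, 1, 3]])

# drop then removes the columns carrying those labels.
df = df.drop(labels, axis=1)
remaining = list(df.columns)

print(labels)     # ['a', 'b', 'd']
print(remaining)  # ['c', 'e']
```

Since the drop itself is by label, a frame with duplicate column names would lose every column sharing a selected label, not just the one at that position.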
This also works with "text" syntax for the columns:
df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)
Note: Introduced in v0.21.0 (October 27, 2017), the drop() method accepts index/columns keywords as an alternative to specifying the axis.
So we can now just do:
df.drop(columns=['B', 'C'])
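A minimal runnable version, assuming a small frame with columns A, B and C. As written, the call returns a new DataFrame, so the result needs to be assigned:

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

# columns=... is equivalent to passing the labels with axis=1.
smaller = df.drop(columns=["B", "C"])

print(list(smaller.columns))  # ['A']
print(list(df.columns))       # ['A', 'B', 'C']  (original unchanged)
```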