In short ... I have a Python Pandas data frame that is read in from an Excel file using 'read_table'. I would like to keep a handful of the series from the data, and purg
You can also specify a list of columns to keep with the usecols option in pandas.read_table. This speeds up the loading process as well.
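A minimal sketch of the usecols approach, assuming tab-separated input. The in-memory string `raw` and the column names are made up for illustration and stand in for the real file:

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for the real file.
raw = "x\ty\tz\n0\t1\t0\n0\t0\t0\n1\t1\t1\n"

# usecols restricts parsing to the listed columns, so "y" is never loaded.
df_subset = pd.read_table(io.StringIO(raw), usecols=["x", "z"])

print(df_subset.columns.tolist())
```

With a real file you would pass its path in place of the `io.StringIO` object.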
You can use the DataFrame drop function to remove columns. You have to pass the axis=1 option for it to work on columns rather than rows. Note that it returns a copy, so you have to assign the result to a variable (either a new DataFrame or the original name):
In [1]: from pandas import *
In [2]: df = DataFrame(dict(x=[0,0,1,0,1], y=[1,0,1,1,0], z=[0,0,1,0,1]))
In [3]: df
Out[3]:
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1
In [4]: df = df.drop(['x','y'], axis=1)
In [5]: df
Out[5]:
   z
0  0
1  0
2  1
3  0
4  1
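For what it's worth, newer pandas releases (0.21 and later, as far as I know) also accept a `columns=` keyword, which avoids having to remember the axis number. A small sketch using the same toy frame; the variable name `trimmed` is just for illustration:

```python
import pandas as pd

df = pd.DataFrame(dict(x=[0, 0, 1, 0, 1], y=[1, 0, 1, 1, 0], z=[0, 0, 1, 0, 1]))

# Equivalent to df.drop(['x', 'y'], axis=1); a new DataFrame is returned.
trimmed = df.drop(columns=["x", "y"])

print(trimmed.columns.tolist())  # only 'z' remains
print(df.columns.tolist())       # the original frame is unchanged
```

Because drop returns a copy, `df` still has all three columns until you reassign it.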
Basically the same as Zelazny7's answer -- just specifying what to keep:
In [68]: df
Out[68]:
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1
In [70]: df = df[['x','z']]
In [71]: df
Out[71]:
   x  z
0  0  0
1  0  0
2  1  1
3  0  0
4  1  1
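If you plan to modify the kept columns afterwards, it may be worth taking an explicit copy so the result is clearly independent of the original frame. A hedged sketch with the same toy data; the names `kept` and `kept_loc` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(dict(x=[0, 0, 1, 0, 1], y=[1, 0, 1, 1, 0], z=[0, 0, 1, 0, 1]))

# Select the columns to keep by label and take an explicit copy.
kept = df[["x", "z"]].copy()

# The same selection written with .loc (all rows, listed columns).
kept_loc = df.loc[:, ["x", "z"]]

# Changing the copy does not touch the original frame.
kept["x"] = 9
print(df["x"].tolist())
```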
You can specify a large number of columns by indexing or slicing into the DataFrame.columns object.
This object, of type pandas.Index, can be viewed as an ordered, array-like sequence of column labels (with some extended functionality).
See this extension of the above examples:
In [4]: df.columns
Out[4]: Index([x, y, z], dtype=object)
In [5]: df[df.columns[1:]]
Out[5]:
   y  z
0  1  0
1  0  0
2  1  1
3  1  0
4  0  1
In [7]: df.drop(df.columns[1:], axis=1)
Out[7]:
   x
0  0
1  0
2  1
3  0
4  1
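A few related ways to select a range of columns, sketched on the same toy frame. These are alternatives I would reach for, not the only options, and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame(dict(x=[0, 0, 1, 0, 1], y=[1, 0, 1, 1, 0], z=[0, 0, 1, 0, 1]))

# Positional slice: every column from position 1 onward.
by_position = df.iloc[:, 1:]

# Label slice: unlike positional slices, both endpoints are included.
by_label = df.loc[:, "y":"z"]

# Set-style exclusion: Index.difference returns the remaining labels, sorted.
all_but_y = df[df.columns.difference(["y"])]

print(by_position.columns.tolist())
print(by_label.columns.tolist())
print(all_but_y.columns.tolist())
```

Note that `difference` sorts the labels, so the column order of the result may differ from the original when the columns are not already sorted.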