Question
How do I:
- Select last 3 columns in a dataframe and create a new dataframe?
I tried:
y = dataframe.iloc[:,-3:]
- Exclude last 3 columns and create a new dataframe?
I tried:
X = dataframe.iloc[:,:-3]
Is this correct?
I am getting array dimension errors later in my code and want to make sure this step is correct.
Thank you
Answer 1:
Just do:
y = dataframe[dataframe.columns[-3:]]
This slices the column index, so you can sub-select those columns from the df.
Example:
In [221]:
df = pd.DataFrame(columns=np.arange(10))
df[df.columns[-3:]]
Out[221]:
Empty DataFrame
Columns: [7, 8, 9]
Index: []
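For comparison with the iloc attempts in the question, here is a minimal sketch on a small made-up frame (the column names a to j are only for illustration). It suggests that the positional slices from the question and the label-based selection above pick the same columns, at least when column labels are unique:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x10 frame with unique column labels 'a'..'j'
df = pd.DataFrame(np.arange(20).reshape(2, 10), columns=list("abcdefghij"))

# Positional selection, as tried in the question
y = df.iloc[:, -3:]   # last 3 columns
X = df.iloc[:, :-3]   # everything except the last 3 columns

# Label-based selection, as in this answer
y_by_label = df[df.columns[-3:]]
X_by_label = df[df.columns[:-3]]

print(list(y.columns))       # ['h', 'i', 'j']
print(X.shape, y.shape)      # (2, 7) (2, 3)
print(y.equals(y_by_label))  # True
```

So the two lines in the question look correct as written for selecting and excluding the last 3 columns.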
I think the issue here is that because you have taken a slice of the df, pandas may have returned a view, and depending on what the rest of your code does with it, a warning is raised (typically SettingWithCopyWarning). You can make an explicit copy by calling .copy() to remove the warnings.
So if we take a copy then assignment only affects the copy and not the original df:
In [15]:
df = pd.DataFrame(np.random.randn(5,10), columns= np.arange(10))
df
Out[15]:
0 1 2 3 4 5 6 \
0 0.568284 -1.488447 0.970365 -1.406463 -0.413750 -0.934892 -1.421308
1 1.186414 -0.417366 -1.007509 -1.620530 -1.322004 0.294540 1.205115
2 -1.073894 -0.214972 1.516563 -0.705571 0.068666 1.690654 -0.252485
3 0.923524 -0.856752 0.226294 -0.660085 1.259145 0.400596 0.559028
4 0.259807 0.135300 1.130347 -0.317305 -1.031875 0.232262 0.709244
7 8 9
0 1.741925 -0.475619 -0.525770
1 2.137546 0.215665 1.908362
2 1.180281 -0.144652 0.870887
3 -0.609804 -0.833186 -1.033656
4 0.480943 1.971933 1.928037
In [16]:
y = df[df.columns[-3:]].copy()
y
Out[16]:
7 8 9
0 1.741925 -0.475619 -0.525770
1 2.137546 0.215665 1.908362
2 1.180281 -0.144652 0.870887
3 -0.609804 -0.833186 -1.033656
4 0.480943 1.971933 1.928037
In [17]:
y[y>0] = 0
print(y)
df
7 8 9
0 0.000000 -0.475619 -0.525770
1 0.000000 0.000000 0.000000
2 0.000000 -0.144652 0.000000
3 -0.609804 -0.833186 -1.033656
4 0.000000 0.000000 0.000000
Out[17]:
0 1 2 3 4 5 6 \
0 0.568284 -1.488447 0.970365 -1.406463 -0.413750 -0.934892 -1.421308
1 1.186414 -0.417366 -1.007509 -1.620530 -1.322004 0.294540 1.205115
2 -1.073894 -0.214972 1.516563 -0.705571 0.068666 1.690654 -0.252485
3 0.923524 -0.856752 0.226294 -0.660085 1.259145 0.400596 0.559028
4 0.259807 0.135300 1.130347 -0.317305 -1.031875 0.232262 0.709244
7 8 9
0 1.741925 -0.475619 -0.525770
1 2.137546 0.215665 1.908362
2 1.180281 -0.144652 0.870887
3 -0.609804 -0.833186 -1.033656
4 0.480943 1.971933 1.928037
Here no warning is raised and the original df is untouched.
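Since the question mentions array dimension errors further on, a quick shape check may help. This is only a guess at a common cause, not a diagnosis of the asker's code: selecting with a slice keeps a 2-D DataFrame, while selecting a single position returns a 1-D Series. The frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x5 frame
df = pd.DataFrame(np.arange(20).reshape(4, 5))

y_frame = df.iloc[:, -1:]   # slice: stays a DataFrame (2-D)
y_series = df.iloc[:, -1]   # single position: becomes a Series (1-D)

print(y_frame.shape)    # (4, 1)
print(y_series.shape)   # (4,)

# If downstream code expects a flat 1-D array, flatten explicitly
flat = y_frame.to_numpy().ravel()
print(flat.shape)       # (4,)
```

Printing X.shape and y.shape right after the selection step is a cheap way to see whether the mismatch starts here or later.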
Answer 2:
This happens because of the integer indices: ix selects by label rather than by position, so -3 is treated as a label. This is by design; see integer indexing in the pandas "gotchas".*
*In newer versions of pandas, prefer loc or iloc to remove the ambiguity of ix as position or label:
df.iloc[-3:]
See the docs. Note that df.iloc[-3:] selects the last three rows; for the last three columns, use df.iloc[:, -3:].
As Wes points out, in this specific case you should just use tail!
It should also be noted that in pandas pre-0.14, iloc will raise an IndexError on an out-of-bounds access, while .head() and .tail() will not:
pd.__version__
'0.12.0'
df = pd.DataFrame([{"a": 1}, {"a": 2}])
df.iloc[-5:]
...
IndexError: out-of-bounds on slice (end)
df.tail(5)
   a
0  1
1  2
Old answer (deprecated method):
You can use the irow DataFrame method to overcome this ambiguity:
In [11]: df1.irow(slice(-3, None))
Out[11]:
    STK_ID  RPT_Date  TClose   sales  discount
8      568  20080331   38.75  12.668       NaN
9      568  20080630   30.09  21.102       NaN
10     568  20080930   26.00  30.769       NaN
Note: Series has a similar iget method.
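On current pandas, ix, irow, and iget have been removed, so the positional accessors are the way to go. A minimal sketch of the equivalents, using made-up frames for illustration:

```python
import pandas as pd

# Hypothetical 5-row frame
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

last_rows = df.iloc[-3:]   # last 3 rows by position
same_rows = df.tail(3)     # same rows via tail
print(last_rows.equals(same_rows))  # True

# Since pandas 0.14, an out-of-bounds slice is clipped instead of raising
small = pd.DataFrame([{"a": 1}, {"a": 2}])
clipped = small.iloc[-5:]
print(len(clipped))  # 2

# Positional slicing on a Series, in place of the old iget
last_values = df["a"].iloc[-3:]
print(list(last_values))  # [3, 4, 5]
```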
Source: https://stackoverflow.com/questions/33042633/selecting-last-n-columns-and-excluding-last-n-columns-in-dataframe