Question
I have a large DataFrame containing a GPS path and some attributes. Only a few sections of the path need to be analysed, and I would like to subset just those sections into a new DataFrame. I can subset one section at a time, but the goal is to get all of them at once while keeping the original index.
The problem is similar to:
import pandas as pd
df = pd.DataFrame({'A':[0,1,2,3,4,5,6,7,8,9],'B':['a','b','c','d','e','f','g','h','i','j']},
index=range(10,20,))
I want to get something like:
cdf = df.loc[[11:13] & [17:20]] # SyntaxError: invalid syntax
Desired outcome:
A B
11 1 b
12 2 c
13 3 d
17 7 h
18 8 i
19 9 j
I know the example is easy to solve with cdf = df.loc[[11,12,13,17,18,19],:]
but the real data has thousands of rows and some entries have already been removed, so listing every label by hand is not an option.
Answer 1:
One possible solution with concat:
cdf = pd.concat([df.loc[11:13], df.loc[17:20]])
print(cdf)
A B
11 1 b
12 2 c
13 3 d
17 7 h
18 8 i
19 9 j
Another solution with range:
cdf = df.loc[list(range(11,14)) + list(range(17,20))]
print(cdf)
A B
11 1 b
12 2 c
13 3 d
17 7 h
18 8 i
19 9 j
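With thousands of rows there will usually be more than two sections, so the concat approach can be driven from a list. The following is a minimal sketch, assuming the sections are stored as (first_label, last_label) pairs in a hypothetical list named sections and that the index is sorted, so that an endpoint missing from the index (such as 20 here) is still accepted by the slice:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": list(range(10)), "B": list("abcdefghij")},
    index=range(10, 20),
)

# Hypothetical list of (first_label, last_label) pairs.
# Both ends are inclusive because .loc slices by label.
sections = [(11, 13), (17, 20)]

# Slice each section with .loc and stack the pieces; the original index is kept.
cdf = pd.concat([df.loc[start:stop] for start, stop in sections])
print(cdf)
```

Because each piece is a label slice, rows that were removed inside a section are simply skipped.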
Answer 2:
You could use np.r_ to concatenate the slices:
In [16]: df.loc[np.r_[11:13, 17:20]]
Out[16]:
A B
11 1 b
12 2 c
17 7 h
18 8 i
19 9 j
Note, however, that df.loc[A:B] selects labels A through B with B included. np.r_[A:B] returns an array of A through B with B excluded. To include B you would need to use np.r_[A:B+1].
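The endpoint difference can be checked directly on the example frame. This is a small sketch; the variable names by_slice, labels and labels_fixed are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": list(range(10)), "B": list("abcdefghij")},
    index=range(10, 20),
)

by_slice = df.loc[11:13]         # label slice: 13 is included
labels = np.r_[11:13]            # integer range: 13 is excluded
labels_fixed = np.r_[11:13 + 1]  # shift the right endpoint to include 13

print(list(by_slice.index))   # [11, 12, 13]
print(labels.tolist())        # [11, 12]
print(labels_fixed.tolist())  # [11, 12, 13]
```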
When passed a slice, such as df.loc[A:B], df.loc ignores labels that are not in df.index. In contrast, when passed an array, such as df.loc[np.r_[A:B]], df.loc does not ignore them: older pandas versions added a new row filled with NaNs for each value in the array which is not in df.index, and pandas 1.0 and later raise a KeyError instead.
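To see how a removed row affects the two forms, the sketch below drops label 12 from the example frame first. Index.intersection is swapped in here as one way to discard labels that are absent before calling df.loc; it is not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": list(range(10)), "B": list("abcdefghij")},
    index=range(10, 20),
)
df = df.drop(12)  # simulate an entry that was removed earlier

# A label slice silently skips the missing label 12.
by_slice = df.loc[11:13]

# An explicit array of labels is stricter, so keep only labels that exist.
wanted = np.r_[11:14]
present = df.index.intersection(wanted)
by_labels = df.loc[present]

print(list(by_slice.index))   # [11, 13]
print(list(by_labels.index))  # [11, 13]
```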
Thus to produce the desired result, you would need to adjust the right endpoint of the slices and use isin to test for membership in df.index:
In [26]: df.loc[df.index.isin(np.r_[11:14, 17:21])]
Out[26]:
A B
11 1 b
12 2 c
13 3 d
17 7 h
18 8 i
19 9 j
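If the index is not made of integers (timestamps, for example), np.r_ cannot build the label array. A possible alternative, not taken from the answer above, is to build the boolean mask from comparisons on the index. The helper name select_sections and the sections list are hypothetical:

```python
import numpy as np
import pandas as pd

def select_sections(frame, sections):
    """Return rows whose index label lies in any inclusive (start, stop) section."""
    mask = np.zeros(len(frame), dtype=bool)
    for start, stop in sections:
        # Combine this section's membership test into the overall mask.
        mask |= (frame.index >= start) & (frame.index <= stop)
    return frame[mask]

df = pd.DataFrame(
    {"A": list(range(10)), "B": list("abcdefghij")},
    index=range(10, 20),
)
df = df.drop(12)  # simulate an entry that was removed earlier

cdf = select_sections(df, [(11, 13), (17, 20)])
print(cdf)
```

Like isin, the mask only ever selects rows that exist, so removed entries and endpoints outside the index cause no error, and the rows come back in their original order.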
Source: https://stackoverflow.com/questions/38828331/select-multiple-sections-of-rows-by-index-in-pandas