Select multiple sections of rows by index in pandas

Question

I have a large DataFrame with a GPS path and some attributes. I need to analyse only a few sections of the path, and I would like to subset those sections into a new DataFrame. I can subset one section at a time, but the idea is to get all of them at once and keep the original index.

The problem is similar to:

import pandas as pd
df = pd.DataFrame({'A': [0,1,2,3,4,5,6,7,8,9], 'B': ['a','b','c','d','e','f','g','h','i','j']},
                  index=range(10, 20))

I want to get something like:

cdf = df.loc[[11:13] & [17:20]] # SyntaxError: invalid syntax

Desired outcome:

    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j

I know the example is easy with cdf = df.loc[[11,12,13,17,18,19],:], but in the original problem I have thousands of rows and some entries have already been removed, so listing individual labels is not really an option.

Answer 1:

One possible solution with concat:

cdf = pd.concat([df.loc[11:13], df.loc[17:20]])
print (cdf)
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j

Another solution with range:

cdf = df.loc[list(range(11,14)) + list(range(17,20))]
print (cdf)
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j
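
With many sections, the concat approach can be wrapped in a small helper. This is only a minimal sketch: select_sections and the sections list are hypothetical names, not part of pandas, and it assumes the index is sorted so that a label slice still works when an endpoint is missing from the index.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': list(range(10)), 'B': list('abcdefghij')},
    index=range(10, 20),
)

def select_sections(frame, sections):
    """Concatenate label-based slices; both endpoints are inclusive.

    Hypothetical helper for illustration, not a pandas function.
    """
    return pd.concat([frame.loc[start:stop] for start, stop in sections])

# Drop a row to mimic "some entries already removed".
sparse = df.drop(index=12)

# Neither 12 nor 20 exists in sparse.index; the slices simply skip them.
cdf = select_sections(sparse, [(11, 13), (17, 20)])
print(cdf)
```

The original index labels are preserved, and rows that were already removed are skipped without having to list labels one by one.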

Answer 2:

You could use np.r_ (with NumPy imported as np) to concatenate the slices:

In [16]: df.loc[np.r_[11:13, 17:20]]
Out[16]: 
    A  B
11  1  b
12  2  c
17  7  h
18  8  i
19  9  j

Note, however, that df.loc[A:B] selects labels A through B with B included. np.r_[A:B] returns an array of A through B with B excluded. To include B you would need to use np.r_[A:B+1].
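
The endpoint difference is easy to check side by side. A small sketch, using a cut-down version of the example frame with only column A:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(10)}, index=range(10, 20))

# Label slice: the right endpoint 13 is included.
by_slice = df.loc[11:13]

# np.r_ follows Python slice semantics: 13 is excluded.
by_array = df.loc[np.r_[11:13]]

# Shifting the endpoint by one brings 13 back.
by_array_fixed = df.loc[np.r_[11:13 + 1]]

print(list(by_slice.index))
print(list(by_array.index))
print(list(by_array_fixed.index))
```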

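When passed a slice, such as df.loc[A:B], df.loc ignores labels that are not in df.index (this requires a sorted index). In contrast, when passed an array, such as df.loc[np.r_[A:B]], every value in the array is looked up explicitly. Older pandas versions added a new row filled with NaNs for each value in the array which is not in df.index; since pandas 1.0, a missing label raises a KeyError instead.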
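
The different treatment of missing labels can be seen in a short sketch. What happens in the array case depends on the installed pandas version (recent versions raise a KeyError rather than adding NaN rows), so the code below catches that case instead of assuming one outcome. The outcome variable is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(10)}, index=range(10, 20))

# Slice: label 20 is not in the index, and the slice simply stops at 19.
by_slice = df.loc[17:20]
print(list(by_slice.index))

# Array: label 20 is requested explicitly, so pandas must look it up.
labels = np.r_[17:21]
try:
    by_array = df.loc[labels]
    outcome = 'NaN row'    # behaviour of older pandas versions
except KeyError:
    outcome = 'KeyError'   # behaviour of recent pandas versions
print(outcome)
```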

Thus to produce the desired result, you would need to adjust the right endpoint of the slices and use isin to test for membership in df.index:

In [26]: df.loc[df.index.isin(np.r_[11:14, 17:21])]
Out[26]: 
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j
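
When there are many sections, the argument to np.r_ can be built programmatically rather than typed out. A sketch under the assumption that the labels are integers; sections and wanted are hypothetical names introduced for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(10)}, index=range(10, 20))

# Hypothetical list of (first, last) labels, both meant to be inclusive.
sections = [(11, 13), (17, 20)]

# Indexing np.r_ with a tuple of slices is equivalent to np.r_[11:14, 17:21].
wanted = np.r_[tuple(slice(first, last + 1) for first, last in sections)]

# isin keeps only labels that actually exist, so the missing 20 is harmless.
cdf = df.loc[df.index.isin(wanted)]
print(list(cdf.index))
```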

Source: https://stackoverflow.com/questions/38828331/select-multiple-sections-of-rows-by-index-in-pandas

Tags

python

pandas

slice