Pandas Equivalent of R's which()

广开言路 2020-12-31 04:12

Variations of this question have been asked before, but I'm still having trouble understanding how to actually slice a Python series/pandas DataFrame based on conditions that

6 Answers
  • 2020-12-31 04:38

    I may not have fully understood the question, but the answer looks simpler than you think.

    Using a pandas DataFrame,

    df['colname'] > somenumberIchoose
    

    returns a pandas series with True / False values and the original index of the DataFrame.

    Then you can use that boolean series on the original DataFrame and get the subset you are looking for:

    df[df['colname'] > somenumberIchoose]
    

    should be enough.

    See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
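
    For readers coming from R, here is a minimal sketch of the mask-then-subset steps described above, plus two ways to recover "which" rows matched. The data, the labels and the name `threshold` are made up for illustration; `np.flatnonzero` is one reasonable stand-in for R's which(), with the caveat that it returns 0-based positions.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'colname' and threshold stand in for the names used above.
df = pd.DataFrame({"colname": [5, 12, 7, 20]}, index=["a", "b", "c", "d"])
threshold = 6

mask = df["colname"] > threshold            # boolean Series aligned with df.index
subset = df[mask]                           # rows where the condition holds
labels = df.index[mask.to_numpy()]          # index labels of the matching rows
positions = np.flatnonzero(mask.to_numpy()) # 0-based positions of the True entries

print(labels.tolist())     # ['b', 'c', 'd']
print(positions.tolist())  # [1, 2, 3]
```

    Labels are what `.loc` expects and positions are what `.iloc` expects, so which one you want depends on how you plan to index afterwards.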

  • 2020-12-31 04:47

    From what I know of R, you might be more comfortable working with numpy -- a scientific computing package similar to MATLAB.

    If you want the indices of an array whose values are divisible by two, then the following would work.

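    import numpy

    arr = numpy.arange(10)
    truth_table = arr % 2 == 0            # boolean mask of the even values
    indices = numpy.where(truth_table)    # tuple holding one array of positions
    values = arr[indices]                 # array([0, 2, 4, 6, 8])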
    

    It's also easy to work with multi-dimensional arrays

    arr2d = arr.reshape(2, 5)
    row_index = 1                                            # example choice: the second row
    col_indices = numpy.where(arr2d[row_index] % 2 == 0)[0]  # columns holding even values in that row
    col_values = arr2d[row_index, col_indices]
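
    As a rough sketch of the multi-dimensional case applied to the whole array at once: `numpy.where` with a single argument returns one index array per dimension, which is loosely comparable to R's which(..., arr.ind=TRUE) except that the indices are 0-based. The variable names here are only illustrative.

```python
import numpy as np

arr2d = np.arange(10).reshape(2, 5)   # [[0 1 2 3 4], [5 6 7 8 9]]

# One array of row positions and one of column positions for the even entries.
rows, cols = np.where(arr2d % 2 == 0)
pairs = list(zip(rows.tolist(), cols.tolist()))
values = arr2d[rows, cols]            # the matching entries, in row-major order

print(pairs)            # [(0, 0), (0, 2), (0, 4), (1, 1), (1, 3)]
print(values.tolist())  # [0, 2, 4, 6, 8]
```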
    
  • 2020-12-31 04:57

    A nice simple and neat way of doing this is the following:

    SlicedData1 = df[df.colname > somenumber]
    

    This can easily be extended to include other criteria, such as non-numeric data:

    SlicedData2 = df[(df.colname1 > somenumber) & (df.colname2 == '24/08/2018')]
    

    And so on...
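
    A small runnable sketch of combining criteria, assuming made-up data for `colname1` and `colname2`. The point to notice is that `&`, `|` and `~` bind more tightly than comparisons such as `>` and `==`, so each comparison needs its own parentheses.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "colname1": [1, 10, 20, 30, 2],
    "colname2": ["24/08/2018", "24/08/2018", "25/08/2018", "24/08/2018", "25/08/2018"],
})
somenumber = 5

big = df["colname1"] > somenumber
on_date = df["colname2"] == "24/08/2018"

both = df[big & on_date]          # rows satisfying both conditions
either = df[big | on_date]        # rows satisfying at least one condition
neither = df[~(big | on_date)]    # rows satisfying neither condition

print(both.index.tolist())     # [1, 3]
print(either.index.tolist())   # [0, 1, 2, 3]
print(neither.index.tolist())  # [4]
```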

  • 2020-12-31 04:59

    Instead of enumerate, I usually just use .items() (called .iteritems() before pandas 2.0, where it was removed). This saves a separate .index lookup. Namely,

    [k for k, v in (df['c'] > t).items() if v]
    

    Otherwise, one has to do

    df[df['c'] > t].index
    

    This duplicates the typing of the data frame name, which can be very long and painful to type.
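
    One possible way around the repeated name, sketched here with an invented frame: `.loc` accepts a callable, which receives the frame itself, so the long name is typed only once. The frame name, column values and labels are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a deliberately long name.
a_very_long_dataframe_name = pd.DataFrame(
    {"c": [1, 5, 3, 8]}, index=["w", "x", "y", "z"]
)
t = 2

# The lambda is called with the frame, so the name is not repeated in the condition.
labels = a_very_long_dataframe_name.loc[lambda d: d["c"] > t].index.tolist()

print(labels)  # ['x', 'y', 'z']
```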

  • 2020-12-31 05:01

    enumerate() returns an iterator that yields an (index, item) tuple in each iteration, so you can't (and don't need to) call .index() again.

    Furthermore, your list comprehension syntax is wrong. It should be:

    indexfuture = [(index, x) for (index, x) in enumerate(df['colname']) if x > yesterday]
    

    Test case:

    >>> [(index, x) for (index, x) in enumerate("abcdef") if x > "c"]
    [(3, 'd'), (4, 'e'), (5, 'f')]
    

    Of course, you don't need to unpack the tuple:

    >>> [tup for tup in enumerate("abcdef") if tup[1] > "c"]
    [(3, 'd'), (4, 'e'), (5, 'f')]
    

    unless you're only interested in the indices, in which case you could do something like

    >>> [index for (index, x) in enumerate("abcdef") if x > "c"]
    [3, 4, 5]
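
    One caveat worth sketching when applying this to pandas: enumerate() counts positions from 0 and knows nothing about the Series index. With a non-default index the two differ, as in this made-up example (the Series contents and the value of `yesterday` are assumptions for illustration).

```python
import pandas as pd

# Hypothetical Series with string labels instead of the default 0..n-1 index.
s = pd.Series([10, 40, 25, 60], index=["a", "b", "c", "d"])
yesterday = 20  # hypothetical threshold

positions = [i for i, x in enumerate(s) if x > yesterday]  # 0-based positions
labels = [k for k, x in s.items() if x > yesterday]        # index labels

print(positions)  # [1, 2, 3]
print(labels)     # ['b', 'c', 'd']
```

    Positions pair with `.iloc` and labels pair with `.loc`.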
    
  • 2020-12-31 05:03

    And if you need an additional condition, pandas.Series lets you apply arithmetic operators (+, -, /, *) between Series.

    Just multiply the boolean masks, which acts as an element-wise AND:

    idx1 = df['lat'] == 49
    idx2 = df['lng'] > 15 
    idx = idx1 * idx2
    
    new_df = df[idx] 
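
    For comparison, a short sketch with invented coordinates showing that the product of two boolean masks selects the same rows as the more conventional `&` operator. The values in `lat` and `lng` are made up.

```python
import pandas as pd

# Hypothetical coordinates for illustration.
df = pd.DataFrame({"lat": [49, 49, 50], "lng": [16, 10, 20]})

idx1 = df["lat"] == 49
idx2 = df["lng"] > 15

by_product = df[idx1 * idx2]  # multiplication of boolean masks
by_and = df[idx1 & idx2]      # element-wise logical AND

print(by_product.index.tolist())  # [0]
print(by_and.index.tolist())      # [0]
```

    Using `&` tends to read more clearly as a logical condition, and it pairs naturally with `|` and `~` for OR and NOT.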
    