Variations of this question have been asked before, but I'm still having trouble understanding how to actually slice a Python series/pandas DataFrame based on conditions that
I may not have understood the question clearly, but the answer looks simpler than you think.
Using a pandas DataFrame:
df['colname'] > somenumberIchoose
returns a pandas series with True / False values and the original index of the DataFrame.
Then you can use that boolean series on the original DataFrame and get the subset you are looking for:
df[df['colname'] > somenumberIchoose]
should be enough.
See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
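To make the two steps above concrete, here is a minimal sketch. The data, the index labels and the name `threshold` are made up for illustration; only the `df[df['colname'] > ...]` pattern comes from the answer.

```python
import pandas as pd

# Hypothetical data; the column name and threshold are placeholders.
df = pd.DataFrame({"colname": [1, 5, 10, 20]}, index=["a", "b", "c", "d"])
threshold = 5

mask = df["colname"] > threshold   # boolean Series that shares df's index
subset = df[mask]                  # keeps only the rows where mask is True

print(mask.tolist())               # [False, False, True, True]
print(subset.index.tolist())       # ['c', 'd']
```

The subset keeps the original index labels, so they can be read back from `subset.index` if needed.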
From what I know of R, you might be more comfortable working with numpy, a scientific computing package similar to MATLAB.
If you want the indices of an array whose values are divisible by two, the following would work.
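import numpy
arr = numpy.arange(10)
truth_table = arr % 2 == 0          # boolean array, True where the value is even
indices = numpy.where(truth_table)  # tuple holding the array of matching positions
values = arr[indices]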
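It's also easy to work with multi-dimensional arrays:
arr2d = arr.reshape(2, 5)
row_index = 0  # arr2d[row_index] selects one row of the 2x5 array
col_indices = numpy.where(arr2d[row_index] % 2 == 0)[0]  # where() returns a tuple; take its first element
col_values = arr2d[row_index, col_indices]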
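As a side note to the numpy approach above, the boolean array can also be used as an index directly, without going through `numpy.where`. This is only a sketch; the names `even_mask`, `row` and `row_even` are illustrative and not part of the original answer.

```python
import numpy as np

arr = np.arange(10)
even_mask = arr % 2 == 0           # boolean array with the same shape as arr
even_values = arr[even_mask]       # array([0, 2, 4, 6, 8])

arr2d = arr.reshape(2, 5)
row = 1                            # hypothetical choice of row
# A mask computed on one row selects the matching columns within that row.
row_even = arr2d[row, arr2d[row] % 2 == 0]   # array([6, 8])
```

`numpy.where` is still the tool to reach for when the positions themselves are needed rather than the values.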
A nice simple and neat way of doing this is the following:
SlicedData1 = df[df.colname > somenumber]
This can easily be extended to include other criteria, such as non-numeric data:
SlicedData2 = df[(df.colname1 > somenumber) & (df.colname2 == '24/08/2018')]
And so on...
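When combining criteria, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `==` in Python. A small sketch with made-up data (the column names follow the answer; the values and the second filter are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data; dates are kept as plain strings, as in the answer.
df = pd.DataFrame({
    "colname1": [1, 7, 9, 3],
    "colname2": ["24/08/2018", "24/08/2018", "25/08/2018", "24/08/2018"],
})
somenumber = 2

# Both conditions must hold (element-wise AND).
both = df[(df["colname1"] > somenumber) & (df["colname2"] == "24/08/2018")]

# At least one condition must hold (element-wise OR).
either = df[(df["colname1"] > 8) | (df["colname2"] == "25/08/2018")]

print(both.index.tolist())    # [1, 3]
print(either.index.tolist())  # [2]
```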
Instead of enumerate, I usually just use .items (called .iteritems before pandas 2.0, where it was removed). This saves a separate .index lookup. Namely,
[k for k, v in (df['c'] > t).items() if v]
Otherwise, one has to do
df[df['c'] > t].index
This duplicates the typing of the data frame name, which can be very long and painful to type.
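For comparison, here is a sketch of a few ways to get the index labels of the matching rows. The frame, the column `c` and the threshold `t` are placeholder values; storing the mask in a variable is one way to avoid repeating a long data frame name.

```python
import pandas as pd

# Hypothetical data with non-numeric index labels.
df = pd.DataFrame({"c": [3, 8, 1, 9]}, index=["w", "x", "y", "z"])
t = 5

mask = df["c"] > t

labels_a = [k for k, v in mask.items() if v]   # iterate over (label, bool) pairs
labels_b = df.index[mask].tolist()             # index the Index with the mask
labels_c = mask[mask].index.tolist()           # keep the True entries, read their labels

print(labels_a)  # ['x', 'z']
```

All three produce the same labels; the vectorised forms avoid a Python-level loop.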
enumerate() returns an iterator that yields an (index, item) tuple in each iteration, so you can't (and don't need to) call .index() again.
Furthermore, your list comprehension syntax is wrong. It should look like this:
indexfuture = [(index, x) for (index, x) in enumerate(df['colname']) if x > yesterday]
Test case:
>>> [(index, x) for (index, x) in enumerate("abcdef") if x > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]
Of course, you don't need to unpack the tuple:
>>> [tup for tup in enumerate("abcdef") if tup[1] > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]
unless you're only interested in the indices, in which case you could do something like
>>> [index for (index, x) in enumerate("abcdef") if x > "c"]
[3, 4, 5]
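One caveat when applying this to a pandas Series: enumerate() counts positions starting at 0, which matches the index labels only when the Series has the default integer index. A short sketch with a made-up Series:

```python
import pandas as pd

# Hypothetical Series whose index labels are not 0, 1, 2.
s = pd.Series([10, 30, 20], index=["a", "b", "c"])

positions = [i for i, x in enumerate(s) if x > 15]   # positional offsets
labels = [k for k, x in s.items() if x > 15]         # index labels

print(positions)  # [1, 2]
print(labels)     # ['b', 'c']
```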
And if you need an additional condition, pandas.Series allows you to do operations between Series (+, -, /, *).
Just multiply the boolean masks:
idx1 = df['lat'] == 49
idx2 = df['lng'] > 15
idx = idx1 * idx2
new_df = df[idx]
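For boolean Series, the product behaves like an element-wise AND, so the `&` operator is an equivalent and arguably more explicit spelling. A sketch with made-up coordinates (the column names follow the answer; the values are assumptions):

```python
import pandas as pd

# Hypothetical coordinates.
df = pd.DataFrame({"lat": [49, 49, 50], "lng": [10, 20, 30]})

idx1 = df["lat"] == 49
idx2 = df["lng"] > 15

by_product = df[idx1 * idx2]   # multiplication of boolean masks
by_and = df[idx1 & idx2]       # element-wise logical AND

print(by_and.index.tolist())   # [1]
```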