How to filter a list based on ascending values?

Question

I have the following 3 lists:

minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']

my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'], 
    ['Morocco', 'Meat', '189,90', '0,32'], 
    ['Morocco', 'Meat', '189,38', '0,44'],
    ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], 
    ['Spain', 'Meat', '190,16', '0,20'], 
    ['Spain', 'Meat', '189,56', '0,35'],
    ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'],
    ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'],
    ['Italy', 'Meat', '190,10', '0,31'], 
    ['Italy', 'Meat', '189,32', '0,45'],
    ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'],
    ['Italy', 'Meat', '187,36', '0,55']]

I'm trying to filter my_list based on whether the value at index [-1] lies between the corresponding value in minimal_values and the one in maximal_values. These lists map the min and max by country. I'm also doing a subtraction inside the list. So for Morocco I only want the rows where index [-1] is between 0,32 and 0,78, and so on. The problem is that after 0,78 the value drops to 0,70, which means that row also satisfies the if statement.

Note: The values at index [-1] in my_list are first ascending and then descending. I only want the rows in the ascending part, not in the descending part. I'm not sure how to solve this problem.

This is my code:

price = 500

# Convert values to float.
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

# Collect all unique countries in a list.
countries = list(set(country[0] for country in my_list))

results = []
for l in my_list:
    i = countries.index(l[0])
    if minimal_values[i] <= float(l[-1].replace(',', '.')) <= maximal_values[i]:
        new_index_2 = price - float(l[-2].replace(',', '.'))
        l[-2] = new_index_2
        results.append(l)

print(results)

This is my current output:

[['Morocco', 'Meat', '189.90', '0,32'], 
['Morocco', 'Meat', 310.62, '0,44'], 
['Morocco', 'Meat', 311.06, '0,60'], 
['Morocco', 'Meat', 311.51, '0,78'], 
['Morocco', 'Meat', 312.01, '0,70'], 
['Spain', 'Meat', 310.44, '0,35'], 
['Spain', 'Meat', 310.99, '0,40'], 
['Spain', 'Meat', 311.87, '0,75'], 
['Spain', 'Meat', '312.05', '0,85'],
['Italy', 'Meat', 310.68, '0,45'], 
['Italy', 'Meat', 311.39, '0,67'], 
['Italy', 'Meat', 311.99, '0,72'], 
['Italy', 'Meat', 312.64, '0,55']]

This is my desired output:

 [['Morocco', 'Meat', '189.90', '0,32'], 
    ['Morocco', 'Meat', 310.62, '0,44'], 
    ['Morocco', 'Meat', 311.06, '0,60'], 
    ['Morocco', 'Meat', 311.51, '0,78'], 
    ['Spain', 'Meat', 310.44, '0,35'], 
    ['Spain', 'Meat', 310.99, '0,40'], 
    ['Spain', 'Meat', 311.87, '0,75'],
    ['Spain', 'Meat', '312.05', '0,85'], 
    ['Italy', 'Meat', 310.68, '0,45'], 
    ['Italy', 'Meat', 311.39, '0,67'], 
    ['Italy', 'Meat', 311.99, '0,72']]

Pandas-related answers are also welcome.

Answer 1:

Note that you have an issue in your code in that the order of elements of countries is not necessarily the same as the order of countries in my_list. It's easier just to process the countries as you process the list, making a note when the country name changes. You can then add a flag to your loop that indicates that processing for this country has completed (when the current value is less than the previous value) and if so, ignore remaining values for this country:

# Convert values to float.
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

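# Price from the question, used for the subtraction below.
price = 500

results = []
finished_country = -1
country_index = -1
last_country = ''
last_value = 0  # initialise so the first comparison cannot raise NameError
for l in my_list:
    country = l[0]
    if country != last_country:
        # New country: advance to its min/max pair and reset the running value.
        country_index += 1
        last_value = 0
    last_country = country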
    value = float(l[-1].replace(',', '.'))
    if finished_country == country_index or value < minimal_values[country_index]:
        last_value = 0
        continue
    if value < last_value:
        finished_country = country_index
    elif value <= maximal_values[country_index]:
        new_index_2 = price - float(l[-2].replace(',', '.'))
        l[-2] = new_index_2
        results.append(l)
    last_value = value

Output for your sample data:

[
 ['Morocco', 'Meat', 310.1, '0,32'],
 ['Morocco', 'Meat', 310.62, '0,44'],
 ['Morocco', 'Meat', 311.06, '0,60'],
 ['Morocco', 'Meat', 311.51, '0,78'],
 ['Spain', 'Meat', 310.44, '0,35'],
 ['Spain', 'Meat', 310.99, '0,40'],
 ['Spain', 'Meat', 311.87, '0,75'],
 ['Spain', 'Meat', 312.05, '0,85'],
 ['Italy', 'Meat', 310.68, '0,45'],
 ['Italy', 'Meat', 311.39, '0,67'],
 ['Italy', 'Meat', 311.99, '0,72']
]

Answer 2:

pandas solution:

import pandas as pd
import numpy as np

# create input dataframe
my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'], 
    ['Morocco', 'Meat', '189,90', '0,32'], 
    ['Morocco', 'Meat', '189,38', '0,44'],
    ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], 
    ['Spain', 'Meat', '190,16', '0,20'], 
    ['Spain', 'Meat', '189,56', '0,35'],
    ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'],
    ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'],
    ['Italy', 'Meat', '190,10', '0,31'], 
    ['Italy', 'Meat', '189,32', '0,45'],
    ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'],
    ['Italy', 'Meat', '187,36', '0,55']]

# Note: DataFrame.applymap is deprecated since pandas 2.1; use DataFrame.map there.
dfi = pd.DataFrame(my_list).applymap(lambda x: x.replace(',', '.'))
dfi[[2, 3]] = dfi[[2, 3]].astype(float)
print(dfi)

#         0     1       2     3
# 0   Morocco  Meat  190.00  0.15
# 1   Morocco  Meat  189.90  0.32
# 2   Morocco  Meat  189.38  0.44
# 3   Morocco  Meat  188.94  0.60
# 4   Morocco  Meat  188.49  0.78
# 5   Morocco  Meat  187.99  0.70
# 6     Spain  Meat  190.76  0.10
# 7     Spain  Meat  190.16  0.20
# 8     Spain  Meat  189.56  0.35
# 9     Spain  Meat  189.01  0.40
# 10    Spain  Meat  188.13  0.75
# 11    Spain  Meat  187.95  0.85
# 12    Italy  Meat  190.20  0.11
# 13    Italy  Meat  190.10  0.31
# 14    Italy  Meat  189.32  0.45
# 15    Italy  Meat  188.61  0.67
# 16    Italy  Meat  188.01  0.72
# 17    Italy  Meat  187.36  0.55

# create df_filter with country, min_v and max_v
minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

df_filter = pd.DataFrame(list(zip(dfi[0].unique().tolist(),
                                  minimal_values,
                                  maximal_values)))
df_filter.columns = [0, 'min_v', 'max_v']
print(df_filter)
#          0  min_v  max_v
# 0  Morocco   0.32   0.78
# 1    Spain   0.35   0.85
# 2    Italy   0.45   0.72

# merge dfi and df_filter
dfm = pd.merge(dfi, df_filter, on=0, how='left')
print(dfm)

#          0     1       2     3  min_v  max_v
# 0   Morocco  Meat  190.00  0.15   0.32   0.78
# 1   Morocco  Meat  189.90  0.32   0.32   0.78
# 2   Morocco  Meat  189.38  0.44   0.32   0.78
# 3   Morocco  Meat  188.94  0.60   0.32   0.78
# 4   Morocco  Meat  188.49  0.78   0.32   0.78
# 5   Morocco  Meat  187.99  0.70   0.32   0.78
# 6     Spain  Meat  190.76  0.10   0.35   0.85
# 7     Spain  Meat  190.16  0.20   0.35   0.85
# 8     Spain  Meat  189.56  0.35   0.35   0.85
# 9     Spain  Meat  189.01  0.40   0.35   0.85
# 10    Spain  Meat  188.13  0.75   0.35   0.85
# 11    Spain  Meat  187.95  0.85   0.35   0.85
# 12    Italy  Meat  190.20  0.11   0.45   0.72
# 13    Italy  Meat  190.10  0.31   0.45   0.72
# 14    Italy  Meat  189.32  0.45   0.45   0.72
# 15    Italy  Meat  188.61  0.67   0.45   0.72
# 16    Italy  Meat  188.01  0.72   0.45   0.72
# 17    Italy  Meat  187.36  0.55   0.45   0.72

# filter min_v <= column 3 <= max_v
cond = dfm[3].ge(dfm.min_v) & dfm[3].le(dfm.max_v)
dfm = dfm[cond].copy()

# drop rows where column 3 is lower than the previous row of the same country
cond = dfm.groupby(0)[3].diff() < 0
dfo = dfm.loc[~cond, [0,1,2,3]].reset_index(drop=True)

# output result
price = 500
dfo[2] = price - dfo[2]

print(dfo)

#           0     1       2     3
# 0   Morocco  Meat  310.10  0.32
# 1   Morocco  Meat  310.62  0.44
# 2   Morocco  Meat  311.06  0.60
# 3   Morocco  Meat  311.51  0.78
# 4     Spain  Meat  310.44  0.35
# 5     Spain  Meat  310.99  0.40
# 6     Spain  Meat  311.87  0.75
# 7     Spain  Meat  312.05  0.85
# 8     Italy  Meat  310.68  0.45
# 9     Italy  Meat  311.39  0.67
# 10    Italy  Meat  311.99  0.72

Answer 3:


minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

countries_largest = {}
filtered_list = []
for row in my_list:
    country_name = row[0]
    value = float(row[-1].replace(',','.'))
    if country_name in countries_largest and value < countries_largest[country_name]:
        continue
    countries_largest[country_name] = value
    if not (minimal_values[len(countries_largest)-1] <= value <= maximal_values[len(countries_largest)-1]):
        continue
    filtered_list.append(row)

print(filtered_list)

[['Morocco', 'Meat', '189,90', '0,32'],
 ['Morocco', 'Meat', '189,38', '0,44'],
 ['Morocco', 'Meat', '188,94', '0,60'],
 ['Morocco', 'Meat', '188,49', '0,78'],
 ['Spain', 'Meat', '189,56', '0,35'],
 ['Spain', 'Meat', '189,01', '0,40'],
 ['Spain', 'Meat', '188,13', '0,75'],
 ['Spain', 'Meat', '187,95', '0,85'],
 ['Italy', 'Meat', '189,32', '0,45'],
 ['Italy', 'Meat', '188,61', '0,67'],
 ['Italy', 'Meat', '188,01', '0,72']]
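
Note that this approach keeps any row whose value is not below the largest value seen so far for that country, which is not quite the same as stopping at the first drop. The two rules agree on the sample data but can differ if values rise again after falling. A small sketch of the difference, using hypothetical helper names and a made-up sequence, and ignoring the min/max range check:

```python
def keep_running_max(values):
    """Keep every value that is not below the largest value seen so far."""
    kept, largest = [], float("-inf")
    for v in values:
        if v >= largest:
            kept.append(v)
            largest = v
    return kept


def keep_until_first_drop(values):
    """Keep values up to, but not including, the first one that decreases."""
    kept = []
    for v in values:
        if kept and v < kept[-1]:
            break
        kept.append(v)
    return kept


sample = [0.3, 0.5, 0.4, 0.6]  # made-up sequence with a second rise
print(keep_running_max(sample))       # [0.3, 0.5, 0.6]
print(keep_until_first_drop(sample))  # [0.3, 0.5]
```

Which rule is appropriate depends on whether a later, higher value should still count as part of the ascending section.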

Answer 4:

Given:

minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']

my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'], 
    ['Morocco', 'Meat', '189,90', '0,32'], 
    ['Morocco', 'Meat', '189,38', '0,44'],
    ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], 
    ['Spain', 'Meat', '190,16', '0,20'], 
    ['Spain', 'Meat', '189,56', '0,35'],
    ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'],
    ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'],
    ['Italy', 'Meat', '190,10', '0,31'], 
    ['Italy', 'Meat', '189,32', '0,45'],
    ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'],
    ['Italy', 'Meat', '187,36', '0,55']]

First, since we are going to be using it a lot, let's write a little conversion routine that standardizes what we mean by a 'float' in your case:

def conv(s):
    try:
        return float(s.replace(',','.'))
    except ValueError:
        return s
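
For illustration, this is how the helper behaves on the kinds of strings found in my_list. The definition is repeated so that the snippet runs on its own; the inputs are just examples.

```python
def conv(s):
    try:
        return float(s.replace(',', '.'))
    except ValueError:
        return s

print(conv('0,32'))     # 0.32
print(conv('189,90'))   # 189.9
print(conv('Morocco'))  # Morocco (not numeric, returned unchanged)
```

It expects a string: calling it on a value that is already a float raises AttributeError, because floats have no replace method and only ValueError is caught.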

Now it seems that your two lists of strings minimal_values and maximal_values are a mapping to the min and max by country. If so, your use of countries = list(set(country[0] for country in my_list)) is not reliable, since sets have no guaranteed order in any version of Python.

If you have Python 3.7+ (or CPython 3.6, where it is an implementation detail), you can do:

countries = list(dict.fromkeys(country[0] for country in my_list))

since dicts retain insertion order there. Assuming you want something that works on all versions of Python, you can instead do:

def uniqs_in_order(li):
    seen=set()
    return [e for e in li if not (e in seen or seen.add(e))]
    # Python 3.7+: return list(dict.fromkeys(li))
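
The one-liner relies on set.add returning None, so `e in seen or seen.add(e)` is falsy only the first time an element is seen. A quick check on a made-up list of names (not the full my_list) shows both variants giving the same order:

```python
def uniqs_in_order(li):
    seen = set()
    # seen.add(e) returns None, so the element is kept only on first sight.
    return [e for e in li if not (e in seen or seen.add(e))]

names = ['Morocco', 'Morocco', 'Spain', 'Italy', 'Spain']  # made-up input
print(uniqs_in_order(names))       # ['Morocco', 'Spain', 'Italy']
print(list(dict.fromkeys(names)))  # same result on Python 3.7+
```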

Now you can create a mapping of country:tuple of min/max value for that country:

mapping={k:(min_, max_) for k,min_,max_ in 
    zip(uniqs_in_order([sl[0] for sl in my_list]), 
                        [conv(s) for s in minimal_values], 
                        [conv(s) for s in maximal_values])}

>>> mapping
{'Morocco': (0.32, 0.78), 'Spain': (0.35, 0.85), 'Italy': (0.45, 0.72)}

Now, finally, we can filter. You only want rows that:

1. Are within the min and max for their country; and
2. Come before the point where the values for that country stop ascending.

We can use groupby from itertools in order to slice the list of lists by country and perform those two tests:

from itertools import groupby

filt=[]
price = 500
for k,v in groupby(my_list, key=lambda sl: sl[0]):
    section=list(v)
    for i, row in enumerate(section):
        if i and conv(row[-1])<conv(section[i-1][-1]):
            break
        if mapping[row[0]][0]<=conv(row[-1])<=mapping[row[0]][1]:
            row[-2]=price-conv(row[-2])
            filt.append(row)        

>>> filt
[['Morocco', 'Meat', 310.1, '0,32'],
['Morocco', 'Meat', 310.62, '0,44'],
['Morocco', 'Meat', 311.06, '0,60'],
['Morocco', 'Meat', 311.51, '0,78'],
['Spain', 'Meat', 310.44, '0,35'],
['Spain', 'Meat', 310.99, '0,40'],
['Spain', 'Meat', 311.87, '0,75'],
['Spain', 'Meat', 312.05, '0,85'],
['Italy', 'Meat', 310.68, '0,45'],
['Italy', 'Meat', 311.39, '0,67'],
['Italy', 'Meat', 311.99, '0,72']]
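
The loop above overwrites row[-2] in place, so my_list itself is modified and running the loop a second time fails when conv receives a float. If that matters, a non-mutating variant is possible. The sketch below is one way to do it under a few assumptions: the function name filter_ascending is hypothetical, the sample rows are a shortened version of the data, and rounding to two decimals is added here only to keep the printed floats tidy.

```python
from itertools import groupby


def conv(s):
    return float(s.replace(',', '.'))


def filter_ascending(rows, mapping, price):
    """Return new rows; the input rows are left untouched."""
    out = []
    for country, group in groupby(rows, key=lambda r: r[0]):
        lo, hi = mapping[country]
        prev = None
        for row in group:
            value = conv(row[-1])
            if prev is not None and value < prev:
                break  # first drop: ignore the rest of this country
            prev = value
            if lo <= value <= hi:
                # Build a fresh list instead of editing the original row.
                out.append(row[:-2] + [round(price - conv(row[-2]), 2), row[-1]])
    return out


rows = [
    ['Morocco', 'Meat', '189,90', '0,32'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'],
    ['Spain', 'Meat', '189,56', '0,35'],
]
mapping = {'Morocco': (0.32, 0.78), 'Spain': (0.35, 0.85)}
result = filter_ascending(rows, mapping, 500)
print(result)
```

Because the rows are copied, the function can be called repeatedly on the same input.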

Source: https://stackoverflow.com/questions/65574747/how-to-filter-a-list-based-on-ascending-values

Tags: python, pandas, list, if-statement, floating-point