Question
I have the following 3 lists:
minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']
my_list = [
['Morocco', 'Meat', '190,00', '0,15'],
['Morocco', 'Meat', '189,90', '0,32'],
['Morocco', 'Meat', '189,38', '0,44'],
['Morocco', 'Meat', '188,94', '0,60'],
['Morocco', 'Meat', '188,49', '0,78'],
['Morocco', 'Meat', '187,99', '0,70'],
['Spain', 'Meat', '190,76', '0,10'],
['Spain', 'Meat', '190,16', '0,20'],
['Spain', 'Meat', '189,56', '0,35'],
['Spain', 'Meat', '189,01', '0,40'],
['Spain', 'Meat', '188,13', '0,75'],
['Spain', 'Meat', '187,95', '0,85'],
['Italy', 'Meat', '190,20', '0,11'],
['Italy', 'Meat', '190,10', '0,31'],
['Italy', 'Meat', '189,32', '0,45'],
['Italy', 'Meat', '188,61', '0,67'],
['Italy', 'Meat', '188,01', '0,72'],
['Italy', 'Meat', '187,36', '0,55']]
I'm trying to filter my_list so that a row is kept only if its last element (index [-1]) lies between the corresponding value in minimal_values and the value in maximal_values. These values map the min and max by country. I'm also doing a subtraction inside the list. So for Morocco I only want the rows where index [-1] is between 0,32 and 0,78, and so on. The problem is that after 0,78 the value drops to 0,70, which means that row also satisfies the if statement.
Note: the values at index [-1] in my_list are first ascending and then descending. I only want the rows in the ascending part, not in the descending part. I'm not sure how to solve this problem.
This is my code:
price = 500

# Convert values to float.
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

# Collect all unique countries in a list.
countries = list(set(country[0] for country in my_list))

results = []
for l in my_list:
    i = countries.index(l[0])
    if minimal_values[i] <= float(l[-1].replace(',', '.')) <= maximal_values[i]:
        new_index_2 = price - float(l[-2].replace(',', '.'))
        l[-2] = new_index_2
        results.append(l)

print(results)
This is my current output:
[['Morocco', 'Meat', '189.90', '0,32'],
['Morocco', 'Meat', 310.62, '0,44'],
['Morocco', 'Meat', 311.06, '0,60'],
['Morocco', 'Meat', 311.51, '0,78'],
['Morocco', 'Meat', 312.01, '0,70'],
['Spain', 'Meat', 310.44, '0,35'],
['Spain', 'Meat', 310.99, '0,40'],
['Spain', 'Meat', 311.87, '0,75'],
['Spain', 'Meat', '312.05', '0,85'],
['Italy', 'Meat', 310.68, '0,45'],
['Italy', 'Meat', 311.39, '0,67'],
['Italy', 'Meat', 311.99, '0,72'],
['Italy', 'Meat', 312.64, '0,55']]
This is my desired output:
[['Morocco', 'Meat', '189.90', '0,32'],
['Morocco', 'Meat', 310.62, '0,44'],
['Morocco', 'Meat', 311.06, '0,60'],
['Morocco', 'Meat', 311.51, '0,78'],
['Spain', 'Meat', 310.44, '0,35'],
['Spain', 'Meat', 310.99, '0,40'],
['Spain', 'Meat', 311.87, '0,75'],
['Spain', 'Meat', '312.05', '0,85'],
['Italy', 'Meat', 310.68, '0,45'],
['Italy', 'Meat', 311.39, '0,67'],
['Italy', 'Meat', 311.99, '0,72']]
Pandas-related answers are also welcome.
Answer 1:
Note that your code has an issue: the order of the elements of countries is not necessarily the same as the order of the countries in my_list, because a set does not preserve order. It's easier just to process the countries as you process the list, making a note when the country name changes. You can then add a flag to your loop that indicates that processing for this country has completed (when the current value is less than the previous value) and, if so, ignore the remaining values for this country:
price = 500  # as in the question

# Convert values to float.
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

results = []
finished_country = -1   # index of the last country whose ascending run has ended
country_index = -1      # index into minimal_values / maximal_values
last_country = ''
last_value = 0          # initialised so the first comparison cannot fail
for l in my_list:
    country = l[0]
    if country != last_country:
        # New country: move to its min/max pair and forget the previous value.
        country_index += 1
        last_country = country
        last_value = 0
    value = float(l[-1].replace(',', '.'))
    if finished_country == country_index or value < minimal_values[country_index]:
        last_value = 0
        continue
    if value < last_value:
        # First drop: ignore the rest of this country's rows.
        finished_country = country_index
    elif value <= maximal_values[country_index]:
        new_index_2 = price - float(l[-2].replace(',', '.'))
        l[-2] = new_index_2
        results.append(l)
    last_value = value
Output for your sample data:
[
['Morocco', 'Meat', 310.1, '0,32'],
['Morocco', 'Meat', 310.62, '0,44'],
['Morocco', 'Meat', 311.06, '0,60'],
['Morocco', 'Meat', 311.51, '0,78'],
['Spain', 'Meat', 310.44, '0,35'],
['Spain', 'Meat', 310.99, '0,40'],
['Spain', 'Meat', 311.87, '0,75'],
['Spain', 'Meat', 312.05, '0,85'],
['Italy', 'Meat', 310.68, '0,45'],
['Italy', 'Meat', 311.39, '0,67'],
['Italy', 'Meat', 311.99, '0,72']
]
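One thing to watch with the loop above is that it writes the new price back into the rows of my_list, so running it a second time on the same list fails (the price is then a float, which has no replace method). A minimal sketch of a variant that copies each kept row instead is shown below. The function name filter_ascending and the helper to_float are made up for this illustration, the bounds are assumed to be floats already, and the rows are assumed to be grouped by country in the same order as the bounds. Unlike the loop above, it stops at the first drop anywhere in a country's rows, which gives the same result on the sample data.

```python
def to_float(s):
    """Convert a decimal-comma string such as '0,32' to a float."""
    return float(s.replace(',', '.'))


def filter_ascending(rows, mins, maxs, price):
    """Return new rows from the ascending part; the input is not modified."""
    results = []
    country_index = -1
    last_country = None
    finished = False
    last_value = None
    for row in rows:
        if row[0] != last_country:
            # New country: advance to its min/max pair and reset the state.
            country_index += 1
            last_country = row[0]
            finished = False
            last_value = None
        if finished:
            continue
        value = to_float(row[-1])
        if last_value is not None and value < last_value:
            # First drop: skip the remaining rows of this country.
            finished = True
            continue
        last_value = value
        if mins[country_index] <= value <= maxs[country_index]:
            new_row = list(row)  # copy, so the caller's row stays a list of strings
            new_row[-2] = round(price - to_float(row[-2]), 2)
            results.append(new_row)
    return results


# Shortened sample, taken from the question's data.
rows = [
    ['Morocco', 'Meat', '190,00', '0,15'],
    ['Morocco', 'Meat', '189,90', '0,32'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,16', '0,20'],
    ['Spain', 'Meat', '189,56', '0,35'],
]
kept = filter_ascending(rows, [0.32, 0.35], [0.78, 0.85], 500)
print([r[-1] for r in kept])  # ['0,32', '0,78', '0,35']
print(kept[0][-2])            # 310.1
print(rows[1][-2])            # '189,90' is unchanged in the input
```

Because the input is left alone, the function can be called repeatedly, for example with different price values.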
Answer 2:
pandas solution:
import pandas as pd
import numpy as np
# create input dataframe
my_list = [
['Morocco', 'Meat', '190,00', '0,15'],
['Morocco', 'Meat', '189,90', '0,32'],
['Morocco', 'Meat', '189,38', '0,44'],
['Morocco', 'Meat', '188,94', '0,60'],
['Morocco', 'Meat', '188,49', '0,78'],
['Morocco', 'Meat', '187,99', '0,70'],
['Spain', 'Meat', '190,76', '0,10'],
['Spain', 'Meat', '190,16', '0,20'],
['Spain', 'Meat', '189,56', '0,35'],
['Spain', 'Meat', '189,01', '0,40'],
['Spain', 'Meat', '188,13', '0,75'],
['Spain', 'Meat', '187,95', '0,85'],
['Italy', 'Meat', '190,20', '0,11'],
['Italy', 'Meat', '190,10', '0,31'],
['Italy', 'Meat', '189,32', '0,45'],
['Italy', 'Meat', '188,61', '0,67'],
['Italy', 'Meat', '188,01', '0,72'],
['Italy', 'Meat', '187,36', '0,55']]
dfi = pd.DataFrame(my_list).applymap(lambda x: x.replace(',', '.'))
dfi[[2, 3]] = dfi[[2, 3]].astype(float)
print(dfi)
# 0 1 2 3
# 0 Morocco Meat 190.00 0.15
# 1 Morocco Meat 189.90 0.32
# 2 Morocco Meat 189.38 0.44
# 3 Morocco Meat 188.94 0.60
# 4 Morocco Meat 188.49 0.78
# 5 Morocco Meat 187.99 0.70
# 6 Spain Meat 190.76 0.10
# 7 Spain Meat 190.16 0.20
# 8 Spain Meat 189.56 0.35
# 9 Spain Meat 189.01 0.40
# 10 Spain Meat 188.13 0.75
# 11 Spain Meat 187.95 0.85
# 12 Italy Meat 190.20 0.11
# 13 Italy Meat 190.10 0.31
# 14 Italy Meat 189.32 0.45
# 15 Italy Meat 188.61 0.67
# 16 Italy Meat 188.01 0.72
# 17 Italy Meat 187.36 0.55
# create df_filter with country and min_v, max_v
minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]
df_filter = pd.DataFrame(list(zip(dfi[0].unique().tolist(),
                                  minimal_values,
                                  maximal_values)))
df_filter.columns = [0, 'min_v', 'max_v']
print(df_filter)
# 0 min_v max_v
# 0 Morocco 0.32 0.78
# 1 Spain 0.35 0.85
# 2 Italy 0.45 0.72
# merge dfi and df_filter
dfm = pd.merge(dfi, df_filter, on=0, how='left')
print(dfm)
# 0 1 2 3 min_v max_v
# 0 Morocco Meat 190.00 0.15 0.32 0.78
# 1 Morocco Meat 189.90 0.32 0.32 0.78
# 2 Morocco Meat 189.38 0.44 0.32 0.78
# 3 Morocco Meat 188.94 0.60 0.32 0.78
# 4 Morocco Meat 188.49 0.78 0.32 0.78
# 5 Morocco Meat 187.99 0.70 0.32 0.78
# 6 Spain Meat 190.76 0.10 0.35 0.85
# 7 Spain Meat 190.16 0.20 0.35 0.85
# 8 Spain Meat 189.56 0.35 0.35 0.85
# 9 Spain Meat 189.01 0.40 0.35 0.85
# 10 Spain Meat 188.13 0.75 0.35 0.85
# 11 Spain Meat 187.95 0.85 0.35 0.85
# 12 Italy Meat 190.20 0.11 0.45 0.72
# 13 Italy Meat 190.10 0.31 0.45 0.72
# 14 Italy Meat 189.32 0.45 0.45 0.72
# 15 Italy Meat 188.61 0.67 0.45 0.72
# 16 Italy Meat 188.01 0.72 0.45 0.72
# 17 Italy Meat 187.36 0.55 0.45 0.72
# filter min_v <= column 3 <= max_v
cond = dfm[3].ge(dfm.min_v) & dfm[3].le(dfm.max_v)
dfm = dfm[cond].copy()
# filter 3 that is not ascending
cond = dfm.groupby(0)[3].diff() < 0
dfo = dfm.loc[~cond, [0,1,2,3]].reset_index(drop=True)
# output result
price = 500
dfo[2] = price - dfo[2]
print(dfo)
# 0 1 2 3
# 0 Morocco Meat 310.10 0.32
# 1 Morocco Meat 310.62 0.44
# 2 Morocco Meat 311.06 0.60
# 3 Morocco Meat 311.51 0.78
# 4 Spain Meat 310.44 0.35
# 5 Spain Meat 310.99 0.40
# 6 Spain Meat 311.87 0.75
# 7 Spain Meat 312.05 0.85
# 8 Italy Meat 310.68 0.45
# 9 Italy Meat 311.39 0.67
# 10 Italy Meat 311.99 0.72
Answer 3:
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

countries_largest = {}
filtered_list = []
for row in my_list:
    country_name = row[0]
    value = float(row[-1].replace(',', '.'))
    # Skip any row whose value is below the largest seen so far for this country.
    if country_name in countries_largest and value < countries_largest[country_name]:
        continue
    countries_largest[country_name] = value
    # The number of countries seen so far gives the index of the current min/max pair.
    if not (minimal_values[len(countries_largest)-1] <= value <= maximal_values[len(countries_largest)-1]):
        continue
    filtered_list.append(row)

This gives the following filtered_list:
[['Morocco', 'Meat', '189,90', '0,32'],
['Morocco', 'Meat', '189,38', '0,44'],
['Morocco', 'Meat', '188,94', '0,60'],
['Morocco', 'Meat', '188,49', '0,78'],
['Spain', 'Meat', '189,56', '0,35'],
['Spain', 'Meat', '189,01', '0,40'],
['Spain', 'Meat', '188,13', '0,75'],
['Spain', 'Meat', '187,95', '0,85'],
['Italy', 'Meat', '189,32', '0,45'],
['Italy', 'Meat', '188,61', '0,67'],
['Italy', 'Meat', '188,01', '0,72']]
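Note that tracking the largest value per country is not quite the same rule as "stop at the first drop": a value that rises again after a dip is still kept if it exceeds the earlier peak. The sample data never does this, so both rules agree there. A small sketch of the difference, using a made-up series and two hypothetical helper names (ascending_prefix and running_max_only), might look like this:

```python
def ascending_prefix(values):
    """Keep values up to, but not including, the first drop."""
    kept = []
    for v in values:
        if kept and v < kept[-1]:
            break
        kept.append(v)
    return kept


def running_max_only(values):
    """Keep every value that is at least as large as the largest seen so far."""
    kept = []
    largest = float('-inf')
    for v in values:
        if v >= largest:
            kept.append(v)
            largest = v
    return kept


# Hypothetical series (not from the question): it dips and then exceeds the peak.
series = [0.32, 0.78, 0.70, 0.80]
print(ascending_prefix(series))   # [0.32, 0.78]
print(running_max_only(series))   # [0.32, 0.78, 0.8]
```

Which rule is wanted depends on whether a later recovery should count as part of the ascending section.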
Answer 4:
Given:
minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']
my_list = [
['Morocco', 'Meat', '190,00', '0,15'],
['Morocco', 'Meat', '189,90', '0,32'],
['Morocco', 'Meat', '189,38', '0,44'],
['Morocco', 'Meat', '188,94', '0,60'],
['Morocco', 'Meat', '188,49', '0,78'],
['Morocco', 'Meat', '187,99', '0,70'],
['Spain', 'Meat', '190,76', '0,10'],
['Spain', 'Meat', '190,16', '0,20'],
['Spain', 'Meat', '189,56', '0,35'],
['Spain', 'Meat', '189,01', '0,40'],
['Spain', 'Meat', '188,13', '0,75'],
['Spain', 'Meat', '187,95', '0,85'],
['Italy', 'Meat', '190,20', '0,11'],
['Italy', 'Meat', '190,10', '0,31'],
['Italy', 'Meat', '189,32', '0,45'],
['Italy', 'Meat', '188,61', '0,67'],
['Italy', 'Meat', '188,01', '0,72'],
['Italy', 'Meat', '187,36', '0,55']]
First, since we are going to be using it a bunch, let's write a little conversion routine that standardizes what we mean by a 'float' in your case:
def conv(s):
    try:
        return float(s.replace(',','.'))
    except ValueError:
        return s
Now it seems that your two lists of strings, minimal_values and maximal_values, map the min and max by country. If so, your use of countries = list(set(country[0] for country in my_list)) will not work reliably, since sets are unordered in all versions of Python.
If you have Python 3.6+, you can do:
countries = list({}.fromkeys(country[0] for country in my_list))
since dicts retain insertion order in Python 3.6+ (an implementation detail of CPython 3.6, guaranteed by the language from 3.7 on). Assuming you want something that works on all versions of Python, you can instead do:
def uniqs_in_order(li):
    seen=set()
    return [e for e in li if not (e in seen or seen.add(e))]
    # Python 3.6+: return list({}.fromkeys(li))
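As a quick check of the point about ordering, the sketch below compares the two order-preserving approaches with a set-based one. The names list is made up for the illustration. The set is sorted only to get a deterministic result to compare against; it shows that a set cannot recover first-seen order.

```python
def uniqs_in_order(li):
    """Return the unique items of li in first-seen order."""
    seen = set()
    return [e for e in li if not (e in seen or seen.add(e))]


# Hypothetical country column with repeats.
names = ['Morocco', 'Morocco', 'Spain', 'Spain', 'Italy', 'Morocco']

first_seen = uniqs_in_order(names)
via_dict = list(dict.fromkeys(names))  # relies on dicts keeping insertion order

print(first_seen)                        # ['Morocco', 'Spain', 'Italy']
print(via_dict == first_seen)            # True
print(sorted(set(names)) == first_seen)  # False: alphabetical, not first-seen
```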
Now you can create a mapping of country:tuple of min/max value for that country:
mapping={k:(min_, max_) for k,min_,max_ in
         zip(uniqs_in_order([sl[0] for sl in my_list]),
             [conv(s) for s in minimal_values],
             [conv(s) for s in maximal_values])}
>>> mapping
{'Morocco': (0.32, 0.78), 'Spain': (0.35, 0.85), 'Italy': (0.45, 0.72)}
Now, finally, we can filter. Since you want to only take values that:
- are within the min and max by country, and
- come before the point where the values for that country stop ascending,
we can use groupby from itertools to slice the list of lists by country and perform those two tests:
from itertools import groupby

filt=[]
price = 500
for k,v in groupby(my_list, key=lambda sl: sl[0]):
    section=list(v)
    for i, row in enumerate(section):
        # Stop at the first row whose value is lower than the previous one.
        if i and conv(row[-1])<conv(section[i-1][-1]):
            break
        if mapping[row[0]][0]<=conv(row[-1])<=mapping[row[0]][1]:
            row[-2]=price-conv(row[-2])
            filt.append(row)
>>> filt
[['Morocco', 'Meat', 310.1, '0,32'],
['Morocco', 'Meat', 310.62, '0,44'],
['Morocco', 'Meat', 311.06, '0,60'],
['Morocco', 'Meat', 311.51, '0,78'],
['Spain', 'Meat', 310.44, '0,35'],
['Spain', 'Meat', 310.99, '0,40'],
['Spain', 'Meat', 311.87, '0,75'],
['Spain', 'Meat', 312.05, '0,85'],
['Italy', 'Meat', 310.68, '0,45'],
['Italy', 'Meat', 311.39, '0,67'],
['Italy', 'Meat', 311.99, '0,72']]
Source: https://stackoverflow.com/questions/65574747/how-to-filter-a-list-based-on-ascending-values