Question
UPDATED:
In my dataset I have 3 columns: x, y and VALUE.
It looks like this (already sorted):
df1:
x , y ,value
1 , 1 , 12
2 , 2 , 12
4 , 3 , 12
1 , 1 , 11
2 , 2 , 11
4 , 3 , 11
1 , 1 , 33
2 , 2 , 33
4 , 3 , 33
I need to get the rows whose distance from each other (in the x and y columns) is <= 1; let's call that my radius. At the same time, I need to group them and keep only the rows whose value is equal. I had trouble comparing this within one dataset because it had only one header, so I created a second dataset with Python commands:
df:
x , y ,value
1 , 1 , 12
2 , 2 , 12
4 , 3 , 12
x , y ,value
1 , 1 , 11
2 , 2 , 11
4 , 3 , 11
x , y ,value
1 , 1 , 33
2 , 2 , 33
4 , 3 , 33
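A minimal sketch of the radius check done directly on df1 (one header, no repeated header rows needed), assuming that "distance <= 1" means both |dx| <= 1 and |dy| <= 1, and that a row is kept when at least one other row with the same value lies within that radius. `RADIUS` and `has_neighbour` are hypothetical names.

```python
import numpy as np
import pandas as pd

# df1 from the question: a single header, all values in one DataFrame
df1 = pd.DataFrame({
    "x": [1, 2, 4, 1, 2, 4, 1, 2, 4],
    "y": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "value": [12, 12, 12, 11, 11, 11, 33, 33, 33],
})

RADIUS = 1  # hypothetical name for the radius from the question


def has_neighbour(group: pd.DataFrame, radius: int = RADIUS) -> np.ndarray:
    """True for each row that has another row of the same group within `radius` in both x and y."""
    xs = group["x"].to_numpy()
    ys = group["y"].to_numpy()
    # Pairwise |dx| and |dy| via broadcasting, shape (n, n)
    near = (np.abs(xs[:, None] - xs[None, :]) <= radius) & (
        np.abs(ys[:, None] - ys[None, :]) <= radius
    )
    np.fill_diagonal(near, False)  # a row should not count as its own neighbour
    return near.any(axis=1)


# Apply the check separately inside each value group, keeping the original group order
result = pd.concat(
    [group[has_neighbour(group)] for _, group in df1.groupby("value", sort=False)]
)
print(result)
```

The pairwise matrix is O(n²) per group, which is fine for small groups like these; for large groups a spatial index would scale better.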
I have tried to use this code:
def dist_value_comp(row):
    # Rows within 1 of the current row in both x and y (across ALL values)
    x_dist = abs(df['x'] - row['x']) <= 1
    y_dist = abs(df['y'] - row['y']) <= 1
    xy_dist = x_dist & y_dist
    # Largest value among those neighbouring rows
    max_value = df.loc[xy_dist, 'value'].max()
    return row['value'] == max_value

df['keep_row'] = df.apply(dist_value_comp, axis=1)
df.loc[df['keep_row'], ['x', 'y', 'value']]
and
filtered_df = df[df.apply(lambda line: abs(line['x'] - line['y']) <= 1, 1)]
for i in filtered_df.groupby('value'):
    print(i)
Before, I was getting errors caused by a malformed DataFrame. I have fixed that, but I still get no results in the output. This is how I create my new DataFrame df from df1; if you have a better idea, please post it here. This approach has a big drawback: it always prints the table. I tested it again, and this function gives me an empty DataFrame.
VALUE1 = df1.VALUE.unique()

def separator():
    lst = []
    for VALUE in VALUE1:
        abc = df1[df1.VALUE == VALUE]
        print(abc)  # prints each group, but never adds it to lst
    return lst      # so this is always an empty list

ab = separator()
df = pd.DataFrame(ab)
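As an alternative to `separator()`, a minimal sketch that builds one DataFrame per value with `groupby` and returns them instead of printing. It assumes the column is named `value` as in the sample data (the code above uses `VALUE`); `groups` is a hypothetical name.

```python
import pandas as pd

# df1 from the question
df1 = pd.DataFrame({
    "x": [1, 2, 4, 1, 2, 4, 1, 2, 4],
    "y": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "value": [12, 12, 12, 11, 11, 11, 33, 33, 33],
})

# One sub-DataFrame per distinct value; nothing is printed while building it
groups = {value: group for value, group in df1.groupby("value", sort=False)}

# Each sub-DataFrame keeps the normal column header when displayed
print(groups[12])
```

With the groups kept in a dict, there is no need for a second dataset with repeated header rows.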
When I use the original dataset df1, the output contains all the data, ignoring the radius of 1.
I need an output table like this one:
x , y ,value
1 , 1 , 12
2 , 2 , 12
x , y ,value
1 , 1 , 11
2 , 2 , 11
x , y ,value
1 , 1 , 33
2 , 2 , 33
UPDATE 2:
I am currently working with this code:
filtered_df = df[df.apply(lambda line: abs(line['x'] - line['y']) <= 1, 1)]
for i in filtered_df.groupby('value'):
    print(i)
It seems to be OK (I am using df1 as input), but when I look at the output, it does nothing. I think the reason is that it doesn't know from which row the +/-1 radius should be measured. My dataset has more columns, so let's take the 4th and 5th columns, D and E, into account: the radius should be measured from the row that has the minimum value in both column D and column E at the same time.
df1:
x , y ,value ,D ,E
1 , 1 , 12 , 1 , 2
2 , 2 , 12 , 2 , 3
4 , 3 , 12 , 3 , 4
1 , 1 , 11 , 2 , 1
2 , 2 , 11 , 3 , 2
4 , 3 , 11 , 5 , 3
1 , 1 , 33 , 1 , 3
2 , 2 , 33 , 2 , 3
4 , 3 , 33 , 3 , 3
So the output should be the same as the one I want above, but now it is clear from which row the +/-1 radius should start. Can anyone help me now? Sorry for the confusion!
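A minimal sketch of the UPDATE 2 rule, assuming "minimum in D and E at the same time" means the row with the smallest D, with ties broken by the smallest E (in this sample data that row also has the smallest E in every group). The radius is treated as a square (|dx| <= 1 and |dy| <= 1), and `within_radius_of_anchor` is a hypothetical name.

```python
import pandas as pd

# df1 from UPDATE 2, with the extra columns D and E
df1 = pd.DataFrame({
    "x": [1, 2, 4, 1, 2, 4, 1, 2, 4],
    "y": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "value": [12, 12, 12, 11, 11, 11, 33, 33, 33],
    "D": [1, 2, 3, 2, 3, 5, 1, 2, 3],
    "E": [2, 3, 4, 1, 2, 3, 3, 3, 3],
})

RADIUS = 1


def within_radius_of_anchor(group: pd.DataFrame, radius: int = RADIUS) -> pd.DataFrame:
    # Assumption: the anchor is the row with the smallest D, ties broken by the smallest E
    anchor = group.sort_values(["D", "E"]).iloc[0]
    near = ((group["x"] - anchor["x"]).abs() <= radius) & (
        (group["y"] - anchor["y"]).abs() <= radius
    )
    return group[near]


result = pd.concat(
    [within_radius_of_anchor(g) for _, g in df1.groupby("value", sort=False)]
)

# Print each group with its own header, like the desired output
for value, group in result.groupby("value", sort=False):
    print(group[["x", "y", "value"]].to_csv(index=False))
```

If the intended rule is instead the smallest sum D + E, only the line that picks `anchor` needs to change.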
Answer 1:
From what I understand, the order in which you perform your operations (filtering the rows with distance <= 1 and grouping them) does not matter.
Here is my take:
# First, select the rows with the right distance
# (this compares x with y within each single row)
filtered_df = df[df.apply(lambda line: abs(line['x'] - line['y']) <= 1, 1)]
# Then group
for i in filtered_df.groupby('value'):
    print(i)
    # Or do whatever you want
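A small sketch that runs this code on the df1 sample from the question (rebuilt here by hand). It illustrates two things: the lambda compares `x` with `y` inside each single row, so with this data every row passes (|x - y| is 0, 0 or 1); and iterating over `groupby('value')` yields `(group key, sub-DataFrame)` tuples, which is what `i` holds in the loop.

```python
import pandas as pd

# df1 from the question
df1 = pd.DataFrame({
    "x": [1, 2, 4, 1, 2, 4, 1, 2, 4],
    "y": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "value": [12, 12, 12, 11, 11, 11, 33, 33, 33],
})

# The row filter from the answer: |x - y| <= 1 within each row
filtered_df = df1[df1.apply(lambda line: abs(line["x"] - line["y"]) <= 1, axis=1)]
print(len(filtered_df))  # 9: no row is removed for this data

# Each item of the GroupBy is a (key, sub-DataFrame) tuple; unpack it directly
for value, group in filtered_df.groupby("value"):
    print(value, group[["x", "y"]].values.tolist())
```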
Let me know if you want some explanations on how some part of the code works.
Source: https://stackoverflow.com/questions/42997271/leaving-rows-with-a-giving-value-in-column