Searching a particular value in a range among two columns python dataframe

Submitted by 自古美人都是妖i on 2021-02-15 05:11:31

Question


I have two CSV files. Depending on the value of a cell in CSV file 1, I should be able to find the row of CSV file 2 whose range contains that value and get the corresponding value from another column of CSV file 2. I am sorry if this is confusing; it will probably become clear with an illustration.

CSV file 1

Car   Mileage
 A       8
 B       6
 C       10

CSV file 2

Score  Mileage(Min)    Mileage(Max)
 1       1                 3
 2       4                 6
 3       7                 9
 4       10                12 
 5       13                15 

And my desired output CSV file is something like this

Car    Mileage     Score
 A       8           3
 B       6           2
 C       10          4

Car A is given a score of 3 because its mileage, 8, falls in the range 7 to 9 in CSV file 2, and the score for that range is 3. Any help will be appreciated. Thanks in advance.
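The lookup rule described above can be sketched in plain Python without pandas. This is an illustration only, assuming the ranges in file 2 are sorted by their lower bound and do not overlap (true for the sample data): a binary search with the standard-library bisect module finds the only candidate range, which is then checked against its upper bound. SCORE_TABLE and score_for are hypothetical names.

```python
from bisect import bisect_right

# Rows of CSV file 2 from the question: (score, min mileage, max mileage).
# Assumes the ranges are sorted by lower bound and do not overlap.
SCORE_TABLE = [(1, 1, 3), (2, 4, 6), (3, 7, 9), (4, 10, 12), (5, 13, 15)]
LOWER_BOUNDS = [lo for _, lo, _ in SCORE_TABLE]


def score_for(mileage, table=SCORE_TABLE, lowers=LOWER_BOUNDS):
    """Return the score whose inclusive [min, max] range contains mileage, or None."""
    # Index of the last range whose lower bound is <= mileage.
    pos = bisect_right(lowers, mileage) - 1
    if pos < 0:
        return None  # mileage is below the first range
    score, lo, hi = table[pos]
    # The mileage may still fall in a gap between ranges (e.g. 3.5) or above the last one.
    return score if lo <= mileage <= hi else None


for car, mileage in [("A", 8), ("B", 6), ("C", 10)]:
    print(car, mileage, score_for(mileage))
```

Returning None for a mileage outside every range makes the "no match" case explicit instead of silently picking a neighbouring score.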


Answer 1:


At the time of writing, the current stable release of pandas is v0.21.

To read your files, use pd.read_csv:

import pandas as pd

df0 = pd.read_csv('file1.csv')
df1 = pd.read_csv('file2.csv')

df0

  Car  Mileage
0   A        8
1   B        6
2   C       10

df1

   Score  Mileage(Min)  Mileage(Max)
0      1             1             3
1      2             4             6
2      3             7             9
3      4            10            12
4      5            13            15
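pd.read_csv also accepts any file-like object, so the sample tables from the question can be loaded from in-memory strings for experimenting without the files on disk. A sketch; the comma-separated layout and the FILE1/FILE2 names are assumptions, and the real files may be formatted differently.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for file1.csv and file2.csv, assuming the
# files are comma-separated with the headers shown in the question.
FILE1 = """Car,Mileage
A,8
B,6
C,10
"""

FILE2 = """Score,Mileage(Min),Mileage(Max)
1,1,3
2,4,6
3,7,9
4,10,12
5,13,15
"""

# StringIO behaves like an open file, so read_csv parses it like a path.
df0 = pd.read_csv(io.StringIO(FILE1))
df1 = pd.read_csv(io.StringIO(FILE2))
print(df0)
print(df1)
```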

To find the Score, build a pd.IntervalIndex with IntervalIndex.from_tuples and look each mileage up in it. This should be fast:

v = df1.loc[:, 'Mileage(Min)':'Mileage(Max)'].apply(tuple, axis=1).tolist()
idx = pd.IntervalIndex.from_tuples(v, closed='both')  # you can also use `from_arrays`

# get_indexer gives the position of the interval containing each mileage (-1 if none)
df0['Score'] = df1['Score'].iloc[idx.get_indexer(df0['Mileage'].values)].values
df0

  Car  Mileage  Score
0   A        8      3
1   B        6      2
2   C       10      4
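To see what get_indexer returns, here is a small sketch with the question's score table plus two mileages (0 and 20) that are assumptions chosen to lie outside every range. Missing values come back as -1, and because -1 is also a valid NumPy position (the last element), a plain positional lookup quietly assigns the last score to them.

```python
import pandas as pd

# Score table from the question (CSV file 2).
df1 = pd.DataFrame({
    "Score": [1, 2, 3, 4, 5],
    "Mileage(Min)": [1, 4, 7, 10, 13],
    "Mileage(Max)": [3, 6, 9, 12, 15],
})
pairs = list(zip(df1["Mileage(Min)"], df1["Mileage(Max)"]))
idx = pd.IntervalIndex.from_tuples(pairs, closed="both")

# Position of the containing interval for each mileage; 0 and 20 are in no range.
positions = idx.get_indexer([8, 6, 10, 0, 20])
print(positions.tolist())  # [2, 1, 3, -1, -1]

# Naive positional lookup: NumPy reads -1 as "last element", so the two
# out-of-range mileages silently receive the last score (5).
print(df1["Score"].to_numpy()[positions].tolist())  # [3, 2, 4, 5, 5]
```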

Other methods of creating an IntervalIndex are outlined here.
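As a sketch of what such alternatives look like, the same closed intervals can be built from tuples, from two arrays of bounds, or from a list of pd.Interval objects; the mins and maxs lists simply restate the question's data.

```python
import pandas as pd

mins = [1, 4, 7, 10, 13]
maxs = [3, 6, 9, 12, 15]

# Three ways to build the same intervals, each closed on both ends.
from_tuples = pd.IntervalIndex.from_tuples(list(zip(mins, maxs)), closed="both")
from_arrays = pd.IntervalIndex.from_arrays(mins, maxs, closed="both")
from_scalars = pd.IntervalIndex(
    [pd.Interval(lo, hi, closed="both") for lo, hi in zip(mins, maxs)]
)

print(from_tuples.equals(from_arrays), from_tuples.equals(from_scalars))  # True True
```

from_arrays avoids building the intermediate list of tuples, which is why the second answer below uses it directly on the two DataFrame columns.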

To write your result, use pd.DataFrame.to_csv (index=False keeps the row index out of the file, matching your desired output):

df0.to_csv('file3.csv', index=False)
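By default to_csv also writes the DataFrame index as an unnamed first column. A quick sketch of the difference: when no path is given, to_csv returns the CSV text as a string, which makes the header lines easy to compare.

```python
import pandas as pd

df0 = pd.DataFrame({"Car": ["A", "B", "C"], "Mileage": [8, 6, 10], "Score": [3, 2, 4]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df0.to_csv()
without_index = df0.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Car,Mileage,Score
print(without_index.splitlines()[0])  # Car,Mileage,Score
```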

Here's a high-level outline of what I've done:

  1. First, read in your CSV files.
  2. Use pd.IntervalIndex to build an interval index; pandas backs it with an interval tree, so each lookup is roughly logarithmic in the number of ranges instead of linear.
  3. Use idx.get_indexer to find the position of the interval containing each mileage.
  4. Use those positions to select the Score values from df1 and assign them to df0. Note that I call .values; otherwise pandas aligns on index labels and the values end up misaligned when assigned back.
  5. Write your result back to CSV.
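The steps above can be combined into one function. This is a sketch, not the answer's code: add_scores is a hypothetical helper, the CSV reading and writing (steps 1 and 5) are left out, and it adds one guard the outline does not mention. Because get_indexer returns -1 for a mileage that lies in no range, and indexing with -1 would return the last Score, those rows are set to NaN instead.

```python
import numpy as np
import pandas as pd


def add_scores(cars, scores, min_col="Mileage(Min)", max_col="Mileage(Max)"):
    """Hypothetical helper: return a copy of cars with a Score column.

    Mileages that fall in no range get NaN instead of a wrong score.
    """
    # Step 2: one closed interval [min, max] per row of the score table.
    idx = pd.IntervalIndex.from_arrays(scores[min_col], scores[max_col], closed="both")
    # Step 3: position of the containing interval, or -1 if there is none.
    positions = idx.get_indexer(cars["Mileage"])
    # Step 4: look up Score by position, then mask -1 so it doesn't wrap to the last row.
    looked_up = scores["Score"].to_numpy()[positions]
    result = cars.copy()
    result["Score"] = np.where(positions >= 0, looked_up, np.nan)
    return result


cars = pd.DataFrame({"Car": ["A", "B", "C", "D"], "Mileage": [8, 6, 10, 20]})
scores = pd.DataFrame({
    "Score": [1, 2, 3, 4, 5],
    "Mileage(Min)": [1, 4, 7, 10, 13],
    "Mileage(Max)": [3, 6, 9, 12, 15],
})
print(add_scores(cars, scores))
```

Car D (mileage 20) is an added example of an out-of-range value. Because of the NaN, the Score column becomes float; a different sentinel would be needed if integers are required.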

For more information on IntervalIndex, take a look at this SO Q&A: Finding matching interval(s) in pandas Intervalindex


Note that IntervalIndex is new in v0.20, so if you have an older version, make sure you update your version with

pip install --upgrade pandas



Answer 2:


You can use IntervalIndex, new in version 0.20.0+:

First, create the DataFrames with read_csv:

import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

Create an IntervalIndex with from_arrays:

s = pd.IntervalIndex.from_arrays(df2['Mileage(Min)'], df2['Mileage(Max)'], closed='both')

print (s)
IntervalIndex([[1, 3], [4, 6], [7, 9], [10, 12], [13, 15]]
              closed='both',
              dtype='interval[int64]')
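The IntervalIndex can also be queried for a single mileage. A sketch with the same bounds: get_loc returns the position of the containing interval and raises KeyError when no interval contains the value, and an individual pd.Interval supports the in operator.

```python
import pandas as pd

s = pd.IntervalIndex.from_arrays([1, 4, 7, 10, 13], [3, 6, 9, 12, 15], closed="both")

# A single mileage maps to the position of the interval that contains it.
print(s.get_loc(8))  # 2

# Membership test on one Interval; both endpoints count because closed="both".
print(9 in s[2], 10 in s[2])  # True False

# A mileage inside no interval raises KeyError instead of returning a position.
try:
    s.get_loc(20)
except KeyError:
    print("no range contains 20")
```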

Look up each Mileage in the IntervalIndex to select the matching Score values, and assign them to the new column as a plain array via .values; otherwise pandas tries to align the indices and raises:

TypeError: incompatible index of inserted column with frame index

df1['Score'] = df2.set_index(s).loc[df1['Mileage'], 'Score'].values
print (df1)
  Car  Mileage  Score
0   A        8      3
1   B        6      2
2   C       10      4
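The error above comes from pandas trying to line up the index of the looked-up Series (an IntervalIndex) with df1's integer index. The same label alignment is easier to see with plain integer labels, where pandas does not raise but silently fills unmatched rows with NaN; the labels in this sketch are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Car": ["A", "B", "C"], "Mileage": [8, 6, 10]})

# A Series carries its own index; these made-up labels (2, 1, 3) only
# partly overlap df1's RangeIndex (0, 1, 2).
looked_up = pd.Series([3, 2, 4], index=[2, 1, 3])

# Assigning the Series aligns on labels: row 0 has no match (NaN), and
# rows 1 and 2 pick up the values stored under labels 1 and 2.
df1["aligned"] = looked_up
# Assigning the bare NumPy array ignores labels and fills rows in order.
df1["by_position"] = looked_up.values
print(df1)
```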

Finally, write the result to a file with to_csv:

df1.to_csv('file3.csv', index=False)



Answer 3:


Setup

import pandas as pd

data = [(1,1,3), (2,4,6), (3,7,9), (4,10,12), (5,13,15)]
df = pd.DataFrame(data, columns=['Score','MMin','MMax'])

car_data = [('A', 8), ('B', 6), ('C', 10)]
car = pd.DataFrame(car_data, columns=['Car','Mileage'])

def find_score(x, df):
    # Scan every range row; -99 marks a mileage that falls in no range.
    result = -99
    for _, row in df.iterrows():
        if row.MMin <= x <= row.MMax:
            result = row.Score
    return result

car['Score'] = car.Mileage.apply(lambda x: find_score(x, df))

Which yields

In [58]: car
Out[58]:
  Car  Mileage  Score
0   A        8      3
1   B        6      2
2   C       10      4
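The iterrows loop is easy to read but slow on large tables. A vectorized variant of the same inclusive range test, swapping in NumPy broadcasting (not part of this answer), compares every mileage against every range at once. It uses memory proportional to the number of cars times the number of ranges, and it picks the first matching range whereas find_score keeps the last, which only matters if ranges overlap. Car D with mileage 20 is an added example of a mileage in no range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 1, 3), (2, 4, 6), (3, 7, 9), (4, 10, 12), (5, 13, 15)],
                  columns=["Score", "MMin", "MMax"])
car = pd.DataFrame([("A", 8), ("B", 6), ("C", 10), ("D", 20)], columns=["Car", "Mileage"])

miles = car["Mileage"].to_numpy()[:, None]  # shape (n_cars, 1)
# inside[i, j] is True when car i's mileage lies in range j (bounds inclusive,
# the same test as find_score).
inside = (miles >= df["MMin"].to_numpy()) & (miles <= df["MMax"].to_numpy())
# argmax picks the first matching range; rows with no match fall back to -99.
car["Score"] = np.where(inside.any(axis=1), df["Score"].to_numpy()[inside.argmax(axis=1)], -99)
print(car)
```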


Source: https://stackoverflow.com/questions/47941113/searching-a-particular-value-in-a-range-among-two-columns-python-dataframe
