Conditionally replace values in one list using another list of different length and ranges, based on percentage overlap, in Python


One text file, 'Truth', contains the following values:

0.000000    3.810000    Three
3.810000    3.910923    NNNN
3.910923    5.429000    AAAA
5.429000


                      
3 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  刺人心        
                
              
                            
                2021-01-14 18:23
              
            
            
                                                                       
This is "just" number crunching - here is one way:

raw_test = [[0.000000   , 3.810000  ,  'Three'],
        [3.810000   , 3.910923  ,  'Three'],
        [3.910923   , 5.429000  ,  'AAAA '],
        [5.429000   , 7.060000  ,  'Three'],
        [7.060000   , 8.411000  ,  'Three'],
        [8.411000   , 8.971000  ,  'Zero'],
        [8.971000   , 13.40600  ,  'Three'],
        [13.40600   , 13.82700  ,  'Zero'], 
        [13.82700   , 15.935554 ,  'Two'], 
        [15.935554  , 20.138337 ,  'Two'],]

raw_truth = [[0.000000 ,   1.00000   ,  'MMMM'],
   [1.000    ,   3.810000  ,  'Three'],
   [3.810000 ,   3.910923  ,  'NNNN'],
   [3.910923 ,   5.429000  ,  'AAAA'],
   [5.429000 ,   6.0000    ,  'MMMM'],
   [6.0000   ,   7.060000  ,  'AAAA'],
   [7.060000 ,   8.411000  ,  'MMMM'],
   [8.411000 ,   8.971000  ,  'MMMM'],
   [8.971000 ,   11.00     ,  'abcd'],
   [11.00    ,   13.40600  ,  'MMMM'],
   [13.40600 ,   13.82700  ,  'Zero'],
   [13.82700 ,   15.935554 ,  'One'],]

truth = {}
for mi,ma,key in raw_truth:
  truth.setdefault((mi,ma), key)

test = [ (mi,ma,ma - mi,lab) for mi,ma,lab in raw_test ]

overlap = []
overlap.append(["test-min","test-max","test-size","test-lab",
                "#","truth-min","truth-max","truth-lab",
                "#","min-over","max-over","over-size","%"])

for mi,ma,siz,lab in test:
  for key in truth:
    truMi,truMa = key
    truVal = truth[key]

    if mi < truMa and ma > truMi: # coarse filter: ranges intersect (also catches a truth range nested inside the test range)
      minOv = max(truMi,mi)
      maxOv = min(truMa,ma)
      sizOv = maxOv-minOv
      perc = sizOv/(siz/100.0)
      if perc > 0: # fine filter
        overlap.append([mi,ma,siz,lab,
                        '#',truMi,truMa,truVal,
                        '#',minOv,maxOv, sizOv, perc ])

# just some printing:    
print(truth)
print()    

print(test)
print()    

for d in overlap:
  for x in d:
    if type(x) is str:
      if x == '#':
        print( '  |  ', end ="")    
      else:
        print( '{:<10}'.format(x), end ="")  
    else:
      print( '{:<10.5f}'.format(x), end ="")
  print(" %")

# the print statements are python3 - at the time this answer was written, the question
# had no python 2 tag. Replace the python 3 print statements with
#    print '  |  ',
#    print '{:<10}'.format(x),  
#    print '{:<10.5f}'.format(x),    
# etc. or adapt them accordingly - see https://stackoverflow.com/a/2456292/7505395


Output:

test-min  test-max  test-size test-lab    |  truth-min truth-max truth-lab   |  min-over  max-over  over-size %          %
0.00000   3.81000   3.81000   Three       |  0.00000   1.00000   MMMM        |  0.00000   1.00000   1.00000   26.24672   %
0.00000   3.81000   3.81000   Three       |  1.00000   3.81000   Three       |  1.00000   3.81000   2.81000   73.75328   %
3.81000   3.91092   0.10092   Three       |  3.81000   3.91092   NNNN        |  3.81000   3.91092   0.10092   100.00000  %
3.91092   5.42900   1.51808   AAAA        |  3.91092   5.42900   AAAA        |  3.91092   5.42900   1.51808   100.00000  %
5.42900   7.06000   1.63100   Three       |  5.42900   6.00000   MMMM        |  5.42900   6.00000   0.57100   35.00920   %
5.42900   7.06000   1.63100   Three       |  6.00000   7.06000   AAAA        |  6.00000   7.06000   1.06000   64.99080   %
7.06000   8.41100   1.35100   Three       |  7.06000   8.41100   MMMM        |  7.06000   8.41100   1.35100   100.00000  %
8.41100   8.97100   0.56000   Zero        |  8.41100   8.97100   MMMM        |  8.41100   8.97100   0.56000   100.00000  %
8.97100   13.40600  4.43500   Three       |  8.97100   11.00000  abcd        |  8.97100   11.00000  2.02900   45.74972   %
8.97100   13.40600  4.43500   Three       |  11.00000  13.40600  MMMM        |  11.00000  13.40600  2.40600   54.25028   %
13.40600  13.82700  0.42100   Zero        |  13.40600  13.82700  Zero        |  13.40600  13.82700  0.42100   100.00000  %
13.82700  15.93555  2.10855   Two         |  13.82700  15.93555  One         |  13.82700  15.93555  2.10855   100.00000  %


Disclaimer: I haven't checked every number by hand to confirm this is correct; I only glanced at the output, so verify it yourself. You would still need to apply the truth-lab wherever the percentage meets your threshold.
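The remaining step, applying the truth label wherever the percentage fits, could look like the sketch below. It is only an illustration: the function name relabel_by_overlap, the 80% default threshold, and the sample rows (a subset of the data above) are assumptions, and it uses a simple nested loop rather than anything optimized.

```python
def relabel_by_overlap(test_rows, truth_rows, threshold=80.0):
    """Return a relabelled copy of test_rows.

    For each test range, find the truth range that covers the largest share
    of it, and adopt that truth label only if the share reaches `threshold`
    percent. Otherwise the original test label is kept.
    """
    result = []
    for t_min, t_max, t_lab in test_rows:
        size = t_max - t_min
        best_perc, best_lab = 0.0, t_lab
        for g_min, g_max, g_lab in truth_rows:
            over = min(t_max, g_max) - max(t_min, g_min)
            if size > 0 and over > 0:
                perc = 100.0 * over / size
                if perc > best_perc:
                    best_perc, best_lab = perc, g_lab
        new_lab = best_lab if best_perc >= threshold else t_lab
        result.append([t_min, t_max, new_lab])
    return result


# Hypothetical sample rows taken from the data above.
sample_test = [[0.0, 3.81, 'Three'],
               [3.81, 3.910923, 'Three'],
               [5.429, 7.06, 'Three']]
sample_truth = [[0.0, 1.0, 'MMMM'],
                [1.0, 3.81, 'Three'],
                [3.81, 3.910923, 'NNNN'],
                [5.429, 6.0, 'MMMM'],
                [6.0, 7.06, 'AAAA']]

for row in relabel_by_overlap(sample_test, sample_truth):
    print(row)
```

With the 80% default, only the second row changes (to NNNN), because no single truth range covers 80% of the first or third test range. Lowering the threshold to 60 would also relabel the third row as AAAA.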
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2021-01-14 18:27
              
            
            
                                                                       
This assumes that the ranges never overlap, that they are ordered, and that the smaller ranges in test always fit fully inside the larger ranges in truth.

Under those assumptions you can perform a merge similar to the merge step in merge sort. Here's a code snippet that should do what you want:

def in_range(truth_item, test_item):
    return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]


def update_test_items(truth_items, test_items):
    current_truth_index = 0
    for test_item in test_items:
        while not in_range(truth_items[current_truth_index], test_item):
            current_truth_index += 1
            if current_truth_index >= len(truth_items):
                return

        test_item[2] = truth_items[current_truth_index][2]


update_test_items(truth, test)


Calling update_test_items modifies test in place, overwriting each label with the label of the truth range that contains it.

You can also set a condition for the update, say 80% coverage, and leave the value unchanged if it isn't met.

def has_enough_coverage(truth_item, test_item):
    truth_item_size = truth_item[1] - truth_item[0]
    test_item_size = test_item[1] - test_item[0]
    return test_item_size / truth_item_size >= .8


def in_range(truth_item, test_item):
    return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]


def update_test_items(truth_items, test_items):
    current_truth_index = 0
    for test_item in test_items:
        while not in_range(truth_items[current_truth_index], test_item):
            current_truth_index += 1
            if current_truth_index >= len(truth_items):
                return

        if has_enough_coverage(truth_items[current_truth_index], test_item):
            test_item[2] = truth_items[current_truth_index][2]


update_test_items(truth, test)


This will only update the test item if it covers 80%+ of the truth range.

Note that these only work if the initial assumptions hold; otherwise you'll run into issues. The approach is also efficient: it runs in O(N) time, because each list is traversed only once.
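Because the merge depends on those assumptions, it may be worth checking them before calling update_test_items. The sketch below is one possible way to do that; the helper names is_sorted_and_disjoint and all_contained and the sample data are hypothetical and not part of the code above.

```python
def is_sorted_and_disjoint(items):
    """True if each range has start <= end and none starts before the previous one ends."""
    previous_end = float('-inf')
    for start, end, _label in items:
        if start > end or start < previous_end:
            return False
        previous_end = end
    return True


def all_contained(truth_items, test_items):
    """True if every test range lies fully inside some truth range.

    This is a simple O(N*M) check, meant for validation rather than speed.
    """
    return all(
        any(t[0] <= s[0] and s[1] <= t[1] for t in truth_items)
        for s in test_items
    )


# Hypothetical sample data.
truth = [[0.0, 5.0, 'A'], [5.0, 9.0, 'B']]
good_test = [[0.0, 2.0, 'x'], [2.0, 5.0, 'y'], [6.0, 9.0, 'z']]
bad_test = [[0.0, 2.0, 'x'], [4.0, 6.0, 'y']]  # second range straddles the 5.0 boundary

print(is_sorted_and_disjoint(truth))    # True
print(all_contained(truth, good_test))  # True
print(all_contained(truth, bad_test))   # False
```

If either check fails, the merge above may stop early and leave later test items unchanged, so an overlap-based comparison is probably the safer choice for such data.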
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2021-01-14 18:41
              
            
            
                                                                       
I am not sure I fully understand your question, but if you mean what I think you mean, you need to worry about going out of bounds and about the fact that truth and test won't line up at the same index j, as you mentioned.

A way around that is to use two different indices, truth[j] and test[k] (or whatever you want to call them). You could use two loops that repeatedly iterate over the whole of test, but that wouldn't be efficient.

I would suggest using the second index as a counter that goes up by 1 each step: think of it as a while loop that runs while the value test[k] is within the range of truth[j], doing what you are currently doing.

Whenever test[k] goes beyond the range of the current truth[j], move on to the next j (the next interval in truth).

Hope that helps and makes sense.



l_truth = len(truth)
l_test = len(test)

count = 0

res = []

for j in range(l_truth):
    count2= count
    for k in range(count2,l_test):
        if truth[j][2]== 'MMMM': 
            min_truth = truth[j][0]
            max_truth = truth[j][1]
            min_test = test[k][0]
            max_test = test[k][1]

            #diff_truth = max_truth - min_truth
            diff_test = max_test - min_test

            if (min_truth <= min_test) and (max_truth >= max_test):
                res.append((test[k][0], test[k][1],truth[j][2]))
                count +=1
            elif (min_truth <= min_test) and (max_truth <= max_test):
                #diff_min = min_truth - min_test
                diff_max = max_test - max_truth
                ratio = diff_max/diff_test
                if ratio <= 0.2:
                    res.append((test[k][0], test[k][1],truth[j][2]))
                    count +=1
            elif (min_truth >= min_test) and (max_truth >= max_test):
                diff_min = min_truth - min_test
                #diff_max = max_test - max_truth
                ratio = diff_min/diff_test
                if ratio <= 0.2:
                    res.append((test[k][0], test[k][1],truth[j][2]))
                    count+=1
            elif (min_truth >= min_test) and (max_truth <= max_test):
                diff_min = min_truth - min_test
                diff_max = max_test - max_truth
                ratio = (diff_min+diff_max)/diff_test
                if ratio <= 0.2:
                    res.append((test[k][0], test[k][1],truth[j][2]))
                    count+=1
            else:
                pass
        else:
            continue

for item in res:
    print(item)  # the parenthesized form works in both Python 2 and Python 3


Check if this works. I actually had to use two loops, but I am sure there are other more efficient ways of doing this.
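One possible more efficient variant is a single two-pointer sweep over both lists. The sketch below assumes that truth and test are each sorted by start and do not overlap internally. The function name relabel_sweep, the min_fraction parameter, and the sample data are hypothetical. Unlike the code above, it relabels for every truth label rather than only 'MMMM', and it returns a new list instead of collecting only the matches.

```python
def relabel_sweep(truth, test, min_fraction=0.8):
    """Relabel test ranges in one pass over two sorted, non-overlapping lists.

    A test range takes a truth label when that truth range covers at least
    `min_fraction` of the test range; otherwise it keeps its own label.
    """
    result = []
    j = 0
    for t_min, t_max, t_lab in test:
        # Skip truth ranges that end at or before the start of this test range.
        while j < len(truth) and truth[j][1] <= t_min:
            j += 1
        size = t_max - t_min
        new_lab = t_lab
        k = j  # look ahead without moving j; the next test range may still overlap truth[j]
        while k < len(truth) and truth[k][0] < t_max:
            over = min(t_max, truth[k][1]) - max(t_min, truth[k][0])
            if size > 0 and over >= min_fraction * size:
                new_lab = truth[k][2]
                break
            k += 1
        result.append((t_min, t_max, new_lab))
    return result


# Hypothetical sample data.
truth = [(0.0, 4.0, 'A'), (4.0, 10.0, 'B')]
test = [(0.0, 3.0, 'x'), (3.0, 5.0, 'y'), (5.5, 10.5, 'z')]

for row in relabel_sweep(truth, test):
    print(row)
```

Here the first range becomes 'A' (fully covered), the second keeps 'y' (each truth range covers only half of it), and the third becomes 'B' (90% covered). Because the ranges within each list are disjoint, the number of overlapping pairs is bounded by the combined length of the lists, so the sweep stays linear.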
                                                                        
                                                        
            
            
              
                