Conditionally replace values in one list using another list of different length and ranges, based on percentage overlap, in Python

予麋鹿 2021-01-14 17:47

One text file, 'Truth', contains the following values:

0.000000    3.810000    Three
3.810000    3.910923    NNNN
3.910923    5.429000    AAAA
5.429000            


        
3 answers
  •  伪装坚强ぢ
    2021-01-14 18:27

    Assuming that the ranges never overlap, that they're ordered, and that the smaller ranges inside test always fit fully inside the larger ranges of truth, you can perform a merge similar to the merge step in merge sort. Here's a code snippet that should do what you'd like:

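    def in_range(truth_item, test_item):
        # True when the test range lies entirely inside the truth range.
        return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]


    def update_test_items(truth_items, test_items):
        # Both lists hold mutable [start, end, label] rows, sorted by start.
        if not truth_items:
            return  # nothing to copy from

        current_truth_index = 0
        for test_item in test_items:
            # Advance to the truth range that contains this test range.
            while not in_range(truth_items[current_truth_index], test_item):
                current_truth_index += 1
                if current_truth_index >= len(truth_items):
                    # Ran out of truth ranges; remaining test items stay as-is.
                    return

            # Copy the label from the enclosing truth range.
            test_item[2] = truth_items[current_truth_index][2]


    update_test_items(truth, test)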
    

    Calling update_test_items modifies test in place, copying in the appropriate values from truth.
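
    To see the in-place update on concrete data, here is a minimal, self-contained sketch. The truth rows are the first three lines of the 'Truth' file from the question; the test rows and their "?" placeholder labels are made up for illustration, since the question's test data is not shown. Rows must be lists rather than tuples, because the label is assigned in place.

```python
def in_range(truth_item, test_item):
    # True when the test range lies entirely inside the truth range.
    return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]


def update_test_items(truth_items, test_items):
    current_truth_index = 0
    for test_item in test_items:
        # Move forward through truth until a range contains this test range.
        while not in_range(truth_items[current_truth_index], test_item):
            current_truth_index += 1
            if current_truth_index >= len(truth_items):
                return
        test_item[2] = truth_items[current_truth_index][2]


# First three rows of the 'Truth' file, as [start, end, label].
truth = [
    [0.000000, 3.810000, "Three"],
    [3.810000, 3.910923, "NNNN"],
    [3.910923, 5.429000, "AAAA"],
]

# Hypothetical test rows (assumed: sorted, each inside one truth range).
test = [
    [0.00, 1.50, "?"],
    [1.50, 3.81, "?"],
    [3.85, 3.90, "?"],
    [4.00, 5.00, "?"],
]

update_test_items(truth, test)
labels = [item[2] for item in test]
print(labels)  # ['Three', 'Three', 'NNNN', 'AAAA']
```

    Because the truth index only ever moves forward, the first two test rows share the same truth range without any rescanning.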

    If you like, you can also add a condition for the update, for example requiring 80% coverage, and leave the value unchanged when the condition isn't met.

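    def has_enough_coverage(truth_item, test_item):
        # Coverage is the test range's size as a fraction of the truth range's size.
        truth_item_size = truth_item[1] - truth_item[0]
        test_item_size = test_item[1] - test_item[0]
        return test_item_size / truth_item_size >= .8


    def in_range(truth_item, test_item):
        # True when the test range lies entirely inside the truth range.
        return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]


    def update_test_items(truth_items, test_items):
        # Both lists hold mutable [start, end, label] rows, sorted by start.
        if not truth_items:
            return  # nothing to copy from

        current_truth_index = 0
        for test_item in test_items:
            # Advance to the truth range that contains this test range.
            while not in_range(truth_items[current_truth_index], test_item):
                current_truth_index += 1
                if current_truth_index >= len(truth_items):
                    return

            # Only copy the label when the test range is large enough.
            if has_enough_coverage(truth_items[current_truth_index], test_item):
                test_item[2] = truth_items[current_truth_index][2]


    update_test_items(truth, test)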
    

    This only updates a test item when it covers at least 80% of its truth range.
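
    As a quick worked example of the coverage arithmetic, the sketch below computes the ratio for two test ranges against the first truth row from the question. The helper name coverage_ratio and both test rows are hypothetical, chosen only to show one case on each side of the 80% threshold.

```python
def coverage_ratio(truth_item, test_item):
    # Size of the test range divided by the size of the truth range.
    truth_size = truth_item[1] - truth_item[0]
    test_size = test_item[1] - test_item[0]
    return test_size / truth_size


truth_item = [0.000000, 3.810000, "Three"]

# Hypothetical test rows inside that truth range.
wide_test = [0.0, 3.5, "x"]
narrow_test = [0.0, 1.0, "y"]

wide_ratio = coverage_ratio(truth_item, wide_test)      # 3.5 / 3.81
narrow_ratio = coverage_ratio(truth_item, narrow_test)  # 1.0 / 3.81

print(round(wide_ratio, 3), wide_ratio >= 0.8)      # 0.919 True
print(round(narrow_ratio, 3), narrow_ratio >= 0.8)  # 0.262 False
```

    With the 80% rule, the first row would take the label "Three" and the second would keep its original label "y".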

    Note that both versions only work if the initial assumptions hold; otherwise you'll run into issues. The approach is also efficient: each list is traversed once, so it runs in O(N) time, where N is the combined number of truth and test items.
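
    If you are unsure whether your data meets those assumptions, one option is to check them before merging. The helpers below are a hypothetical sketch, not part of the approach above: the first verifies that ranges are ordered and non-overlapping, and the second verifies that every test range fits inside some truth range. The second check is deliberately simple and quadratic, so it is meant for debugging rather than for large inputs.

```python
def ranges_are_ordered_and_disjoint(items):
    # Each row is [start, end, label]; ranges may touch but must not overlap.
    previous_end = None
    for start, end, *_ in items:
        if start > end:
            return False
        if previous_end is not None and start < previous_end:
            return False
        previous_end = end
    return True


def every_test_fits(truth_items, test_items):
    # True if each test range lies fully inside at least one truth range.
    return all(
        any(t[0] <= s[0] and s[1] <= t[1] for t in truth_items)
        for s in test_items
    )


# First three rows of the 'Truth' file from the question.
truth = [
    [0.000000, 3.810000, "Three"],
    [3.810000, 3.910923, "NNNN"],
    [3.910923, 5.429000, "AAAA"],
]

# Hypothetical inputs that break the assumptions.
overlapping = [[0.0, 2.0, "a"], [1.5, 3.0, "b"]]
straddling_test = [[3.5, 4.0, "?"]]  # crosses several truth boundaries

truth_ok = ranges_are_ordered_and_disjoint(truth)
overlap_ok = ranges_are_ordered_and_disjoint(overlapping)
fits_ok = every_test_fits(truth, [[0.0, 1.5, "?"]])
straddle_ok = every_test_fits(truth, straddling_test)

print(truth_ok, overlap_ok, fits_ok, straddle_ok)  # True False True False
```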
