One text file, 'Truth', contains the following values:
0.000000 3.810000 Three
3.810000 3.910923 NNNN
3.910923 5.429000 AAAA
5.429000
Assuming that the ranges never overlap, that they are ordered, and that the smaller ranges in test always fit fully inside the larger ranges of truth, you can perform a merge similar to the merge step in merge sort. Here's a code snippet that should do what you want:
def in_range(truth_item, test_item):
    # True when the test range lies fully inside the truth range.
    return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]

def update_test_items(truth_items, test_items):
    current_truth_index = 0
    for test_item in test_items:
        # Advance to the truth range that contains this test range.
        while not in_range(truth_items[current_truth_index], test_item):
            current_truth_index += 1
            if current_truth_index >= len(truth_items):
                return  # ran out of truth ranges
        test_item[2] = truth_items[current_truth_index][2]

update_test_items(truth, test)
Calling update_test_items will modify test by adding in the appropriate values from truth.
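As a minimal sketch of how the inputs might be prepared: the snippet above assumes truth and test are lists of mutable [start, end, label] items. The parse_ranges helper and the sample test ranges below are hypothetical, made up for illustration, since the real test file is not shown here.

```python
def parse_ranges(text):
    """Parse 'start end label' lines into [start, end, label] lists (hypothetical helper)."""
    items = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        items.append([float(parts[0]), float(parts[1]), parts[2]])
    return items


def in_range(truth_item, test_item):
    return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]


def update_test_items(truth_items, test_items):
    current_truth_index = 0
    for test_item in test_items:
        while not in_range(truth_items[current_truth_index], test_item):
            current_truth_index += 1
            if current_truth_index >= len(truth_items):
                return
        test_item[2] = truth_items[current_truth_index][2]


truth_text = """0.000000 3.810000 Three
3.810000 3.910923 NNNN
3.910923 5.429000 AAAA"""

# Made-up test ranges; the "x" labels are placeholders to be overwritten.
test_text = """0.000000 1.500000 x
1.500000 3.810000 x
3.810000 3.910923 x
4.000000 5.000000 x"""

truth = parse_ranges(truth_text)
test = parse_ranges(test_text)
update_test_items(truth, test)
labels = [item[2] for item in test]
print(labels)  # ['Three', 'Three', 'NNNN', 'AAAA']
```

Lists are used rather than tuples because update_test_items assigns to test_item[2] in place.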
You can also make the update conditional if you like, for example requiring 80% coverage and leaving the value unchanged when that isn't met:
def has_enough_coverage(truth_item, test_item):
    # Fraction of the truth range that the test range spans.
    truth_item_size = truth_item[1] - truth_item[0]
    test_item_size = test_item[1] - test_item[0]
    return test_item_size / truth_item_size >= .8

def in_range(truth_item, test_item):
    return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]

def update_test_items(truth_items, test_items):
    current_truth_index = 0
    for test_item in test_items:
        while not in_range(truth_items[current_truth_index], test_item):
            current_truth_index += 1
            if current_truth_index >= len(truth_items):
                return
        # Only copy the label when the coverage condition holds.
        if has_enough_coverage(truth_items[current_truth_index], test_item):
            test_item[2] = truth_items[current_truth_index][2]

update_test_items(truth, test)
This will only update a test item if it covers at least 80% of its truth range.
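To illustrate the coverage condition, here is a small worked example. The ranges and the "?" placeholder labels are invented for this sketch: the first test range spans 90% of its truth range and is updated, while the second spans only 20% and keeps its original value.

```python
def has_enough_coverage(truth_item, test_item):
    truth_item_size = truth_item[1] - truth_item[0]
    test_item_size = test_item[1] - test_item[0]
    return test_item_size / truth_item_size >= .8


def in_range(truth_item, test_item):
    return truth_item[0] <= test_item[0] and truth_item[1] >= test_item[1]


def update_test_items(truth_items, test_items):
    current_truth_index = 0
    for test_item in test_items:
        while not in_range(truth_items[current_truth_index], test_item):
            current_truth_index += 1
            if current_truth_index >= len(truth_items):
                return
        if has_enough_coverage(truth_items[current_truth_index], test_item):
            test_item[2] = truth_items[current_truth_index][2]


# Invented sample data: two truth ranges of length 10 each.
truth = [[0.0, 10.0, "A"], [10.0, 20.0, "B"]]
test = [
    [0.0, 9.0, "?"],    # covers 9/10 of truth range A, so it is updated
    [10.0, 12.0, "?"],  # covers 2/10 of truth range B, so it is left alone
]

update_test_items(truth, test)
labels = [item[2] for item in test]
print(labels)  # ['A', '?']
```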
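Note that these will only work if the initial assumptions hold; otherwise you'll run into issues, such as test items being matched to the wrong truth range or left unchanged. The approach is also efficient: because the truth index only ever moves forward, it runs in linear time, O(N + M) for N test items and M truth items.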
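If you are unsure whether your data meets those assumptions, one option is to check them up front. The following is only a sketch; is_ordered_and_disjoint and all_contained are hypothetical helper names, and the sample test ranges are made up.

```python
def is_ordered_and_disjoint(items):
    """True if every range is well-formed and each starts at or after the previous one ends."""
    for item in items:
        if item[0] > item[1]:
            return False
    return all(a[1] <= b[0] for a, b in zip(items, items[1:]))


def all_contained(truth_items, test_items):
    """True if each test range fits fully inside one truth range (both lists ordered)."""
    i = 0
    for test_item in test_items:
        # Same forward-only scan as the merge, but bounds-checked.
        while i < len(truth_items) and not (
            truth_items[i][0] <= test_item[0] and truth_items[i][1] >= test_item[1]
        ):
            i += 1
        if i == len(truth_items):
            return False
    return True


truth = [
    [0.0, 3.81, "Three"],
    [3.81, 3.910923, "NNNN"],
    [3.910923, 5.429, "AAAA"],
]
good_test = [[0.0, 1.0, "x"], [4.0, 5.0, "x"]]  # made-up ranges that fit
bad_test = [[3.5, 4.0, "x"]]                    # straddles two truth ranges

print(is_ordered_and_disjoint(truth))   # True
print(all_contained(truth, good_test))  # True
print(all_contained(truth, bad_test))   # False
```

Both checks are linear scans, so running them before update_test_items does not change the overall complexity.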