How to use bisect.insort_left with a key?

时光说笑 2020-12-09 14:34

The docs lack an example. How do you use bisect.insort_left() based on a key?

Trying to insert based on key.

bisect.insort_left(da         


        
4 answers
  • 2020-12-09 15:17

    This does essentially the same thing as the SortedCollection recipe, which supports a key function and is mentioned in the See also: section at the end of the bisect documentation.

    A separate sorted keys list is maintained in parallel with the sorted data list to improve performance: this is faster than rebuilding the keys list before each insertion, although keeping it around and updating it isn't strictly required. The ActiveState recipe encapsulates both lists in a class, but in the code below they are two independent lists being passed around, so it is easier for them to get out of sync than if both were held in an instance of the recipe's class.

    from bisect import bisect_left
    
    def insert(seq, keys, item, keyfunc=lambda v: v):
        """Insert an item into a sorted list using a separate corresponding
           sorted keys list and a keyfunc() to extract the key from each item.
    
        Based on insert() method in SortedCollection recipe:
        http://code.activestate.com/recipes/577197-sortedcollection/
        """
        k = keyfunc(item)  # Get key.
        i = bisect_left(keys, k)  # Determine where to insert item.
        keys.insert(i, k)  # Insert key of item to keys list.
        seq.insert(i, item)  # Insert the item itself in the corresponding place.
    
    # Initialize the sorted data and keys lists.
    data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
    data.sort(key=lambda r: r[1]) # Sort data by key value
    keys = [r[1] for r in data]   # Initialize keys list
    print(data)  # -> [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
    
    insert(data, keys, ('brown', 7), keyfunc=lambda x: x[1])
    print(data)  # -> [('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]
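
The out-of-sync risk mentioned above can be reduced by holding both lists in one object. The following is only a minimal sketch of that idea, not the ActiveState recipe itself; the class name KeyedSortedList and its methods are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_left


class KeyedSortedList:
    """Hypothetical helper: keeps items and their keys in two parallel sorted lists."""

    def __init__(self, iterable=(), key=lambda v: v):
        self._key = key
        self._items = sorted(iterable, key=key)
        self._keys = [key(item) for item in self._items]

    def insert(self, item):
        """Insert item to the left of any existing items with an equal key."""
        k = self._key(item)
        i = bisect_left(self._keys, k)
        # Both lists are updated together, so they cannot drift apart.
        self._keys.insert(i, k)
        self._items.insert(i, item)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


colors = KeyedSortedList([('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)],
                         key=lambda r: r[1])
colors.insert(('brown', 7))
print(list(colors))
```

Because callers never touch the two lists directly, every insertion goes through the one method that updates both.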
    

    Follow-on question:
        Can bisect.insort_left be used?

    No, not in the Python versions this answer was written for (a key parameter was added to the bisect functions in Python 3.10). Before that, bisect.insort_left() has no support for a key function: it compares the whole item to insert, x, with whole items of the array in its if a[mid] < x: statement. You can see this in the source for the bisect module in Lib/bisect.py.

    Here's the relevant excerpt:

    def insort_left(a, x, lo=0, hi=None):
        """Insert item x in list a, and keep it sorted assuming a is sorted.
    
        If x is already in a, insert it to the left of the leftmost x.
    
        Optional args lo (default 0) and hi (default len(a)) bound the
        slice of a to be searched.
        """
    
        if lo < 0:
            raise ValueError('lo must be non-negative')
        if hi is None:
            hi = len(a)
        while lo < hi:
            mid = (lo+hi)//2
            if a[mid] < x: lo = mid+1
            else: hi = mid
        a.insert(lo, x)
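
To see why the whole-item comparison matters, here is a small sketch using the question's data, already sorted by the second element. Calling insort_left without any key handling compares the tuples themselves, so the color names decide the position and the numeric order is broken.

```python
from bisect import insort_left

# Sorted by the number in position 1, not by the color name.
data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]

# Tuples compare element by element, so 'brown' is placed by name.
insort_left(data, ('brown', 7))
print(data)

numbers = [n for _, n in data]
print(numbers == sorted(numbers))  # the numeric order is no longer intact
```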
    

    You could modify the above to accept an optional key-function argument and use it:

    def my_insort_left(a, x, lo=0, hi=None, keyfunc=lambda v: v):
        x_key = keyfunc(x)  # Get comparison value.
        . . .
            if keyfunc(a[mid]) < x_key: # Compare key values.
                lo = mid+1
        . . .
    

    ...and call it like this:

    my_insort_left(data, ('brown', 7), keyfunc=lambda v: v[1])
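
For reference, here is one way the elided parts could be filled in. It is a sketch that simply combines the excerpt from Lib/bisect.py shown earlier with the keyfunc changes; it is not code from the standard library.

```python
def my_insort_left(a, x, lo=0, hi=None, keyfunc=lambda v: v):
    """Insert x into sorted list a, comparing items by keyfunc(item)."""
    x_key = keyfunc(x)  # Compute the new item's key once.
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if keyfunc(a[mid]) < x_key:  # Compare key values, not whole items.
            lo = mid + 1
        else:
            hi = mid
    a.insert(lo, x)


data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
my_insort_left(data, ('brown', 7), keyfunc=lambda v: v[1])
print(data)
```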
    

    If you're writing a custom function anyway, you can trade generality you don't need for efficiency: drop the generic key-function argument and hardcode the comparison for the data format you have. This avoids the overhead of repeated key-function calls during insertions.

    def my_insort_left(a, x, lo=0, hi=None):
        x_key = x[1]   # Key on second element of each item in sequence.
        . . .
            if a[mid][1] < x_key: lo = mid+1  # Compare second element to key.
        . . .
    

    ...called this way without passing keyfunc:

    my_insort_left(data, ('brown', 7))
    
  • 2020-12-09 15:33

    You could wrap your iterable in a class that implements __getitem__ and __len__ and that takes the iterable and a key function as arguments. This lets you use a key with bisect_left.

    To make this usable with insort_left, the class must also implement an insert method. The problem is that insort_left will then try to insert the key value you passed in, rather than a full object, into the list of objects that the key was extracted from.

    An example makes this clearer:

    from bisect import bisect_left, insort_left
    
    
    class KeyWrapper:
        def __init__(self, iterable, key):
            self.it = iterable
            self.key = key
    
        def __getitem__(self, i):
            return self.key(self.it[i])
    
        def __len__(self):
            return len(self.it)
    
        def insert(self, index, item):
            print('asked to insert %s at index %d' % (item, index))
            self.it.insert(index, {"time":item})
    
    timetable = [{"time": "0150"}, {"time": "0250"}, {"time": "0350"}, {"time": "0450"}, {"time": "0550"}, {"time": "0650"}, {"time": "0750"}]
    
    bslindex = bisect_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")
    
    # insort_left returns None; it calls KeyWrapper.insert with the key value "0359".
    insort_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")
    

    See how the insert method had to be made specific to the timetable dictionaries? Otherwise insort_left would try to insert "0359" where it should insert {"time": "0359"}.

    Ways around this include constructing a dummy object for the comparison, inheriting from KeyWrapper and overriding insert, or passing some sort of factory function to create the object. None of these is particularly desirable from an idiomatic Python point of view.

    So the easiest way is to just use the KeyWrapper with bisect_left, which returns you the insert index and then do the insert yourself. You could easily wrap this in a dedicated function.

    e.g.

    bslindex = bisect_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")
    timetable.insert(bslindex, {"time":"0359"})
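
The dedicated function mentioned above could look roughly like this. It is a sketch only: the name insort_left_by_key is hypothetical, and KeyWrapper is repeated so that the snippet runs on its own.

```python
from bisect import bisect_left


class KeyWrapper:
    """Read-only view of a sequence that exposes key(item) instead of item."""

    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key

    def __getitem__(self, i):
        return self.key(self.it[i])

    def __len__(self):
        return len(self.it)


def insort_left_by_key(seq, item, key):
    """Hypothetical helper: insert item into seq, which is sorted by key."""
    # Search on the keys, then insert the full item into the real list.
    index = bisect_left(KeyWrapper(seq, key), key(item))
    seq.insert(index, item)
    return index


timetable = [{"time": "0150"}, {"time": "0250"}, {"time": "0350"}, {"time": "0450"}]
index = insort_left_by_key(timetable, {"time": "0359"}, key=lambda t: t["time"])
print(index, timetable[index])
```

Taking the full item and deriving its key inside the helper avoids the mismatch between the key and the object described above.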
    

    In this case, make sure you don't implement insert. That way you get an immediate error if you accidentally pass a KeyWrapper to a mutating function like insort_left, which probably wouldn't do the right thing.

    Using your example data:

    from bisect import bisect_left
    
    
    class KeyWrapper:
        def __init__(self, iterable, key):
            self.it = iterable
            self.key = key
    
        def __getitem__(self, i):
            return self.key(self.it[i])
    
        def __len__(self):
            return len(self.it)
    
    data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
    data.sort(key=lambda c: c[1])
    
    newcol = ('brown', 7)
    
    bslindex = bisect_left(KeyWrapper(data, key=lambda c: c[1]), newcol[1])
    data.insert(bslindex, newcol)
    
    print(data)
    
  • 2020-12-09 15:34

    Add comparison methods to your class

    Sometimes this is the least painful way, especially if you already have a class and just want to sort by a key from it:

    #!/usr/bin/env python3
    
    import bisect
    import functools
    
    @functools.total_ordering
    class MyData:
        def __init__(self, color, number):
            self.color = color
            self.number = number
        def __lt__(self, other):
            return self.number < other.number
        def __str__(self):
            return '{} {}'.format(self.color, self.number)
    
    mydatas = [
        MyData('red', 5),
        MyData('blue', 1),
        MyData('yellow', 8),
        MyData('black', 0),
    ]
    mydatas_sorted = []
    for mydata in mydatas:
        bisect.insort(mydatas_sorted, mydata)
    for mydata in mydatas_sorted:
        print(mydata)
    

    Output:

    black 0
    blue 1
    red 5
    yellow 8
    

    See also: "Enabling" comparison for classes

    Tested in Python 3.5.2.

    Upstream requests/patches

    I get the feeling this is going to happen sooner or later ;-)

    • https://github.com/python/cpython/pull/13970
    • https://bugs.python.org/issue4356
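
For readers on newer interpreters: as far as I know, this did land in Python 3.10, where the bisect functions gained a keyword-only key argument. The sketch below uses it when available and otherwise falls back to bisecting a temporary list of keys; the version check and the fallback are my own additions for illustration.

```python
import sys
from bisect import bisect_left, insort_left

data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
new_item = ('brown', 7)

if sys.version_info >= (3, 10):
    # For insort_left, key is applied to new_item and to each element searched.
    insort_left(data, new_item, key=lambda r: r[1])
else:
    # Older interpreters: bisect on a temporary list of keys instead.
    keys = [r[1] for r in data]
    data.insert(bisect_left(keys, new_item[1]), new_item)

print(data)
```

Note that bisect_left with key= behaves differently from insort_left: there the search value is compared as a key directly, so you would pass 7 rather than the whole tuple.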
  • 2020-12-09 15:40

    If your goal is to maintain a list sorted by key while performing the usual operations (bisect, insert, delete and update), I think sortedcontainers should suit your needs as well, and you'll avoid O(n) inserts.
