Indexing a list with a unique index

清歌不尽 2020-11-30 08:52

I have a list, say l = [10,10,20,15,10,20]. I want to assign each unique value a certain "index" to get [1,1,2,3,1,2].

This is my code:

6 Answers
  • 2020-11-30 09:03

    Your solution is slow because its complexity is O(nm) with m being the number of unique elements in l: a.index() is O(m) and you call it for every element in l.

    To make it O(n), get rid of index() and store indexes in a dictionary:

    >>> idx, indexes = 1, {}
    >>> for x in l:
    ...     if x not in indexes:
    ...         indexes[x] = idx
    ...         idx += 1
    ... 
    >>> [indexes[x] for x in l]
    [1, 1, 2, 3, 1, 2]
    

    If l contains only integers in a known range, you could also store indexes in a list instead of a dictionary for faster lookups.
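    The list-instead-of-dict idea from the sentence above could look like the sketch below. It assumes every value is an int between hypothetical caller-supplied bounds lo and hi; whether it actually beats a dict in CPython is worth measuring for your data.

```python
def indexes_by_list(l, lo, hi):
    # Assumption: every x in l is an int with lo <= x <= hi
    # (lo and hi are hypothetical bounds supplied by the caller).
    table = [0] * (hi - lo + 1)  # 0 means "value not seen yet"
    next_id = 1
    result = []
    for x in l:
        slot = x - lo  # direct list position instead of hashing
        if table[slot] == 0:
            table[slot] = next_id
            next_id += 1
        result.append(table[slot])
    return result

print(indexes_by_list([10, 10, 20, 15, 10, 20], 10, 20))  # [1, 1, 2, 3, 1, 2]
```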

  • 2020-11-30 09:04

    Well, I guess it depends on whether you want the indexes returned in that specific order. If you want the example to return:

        [1,1,2,3,1,2]
    

    then you can look at the other answers submitted. However, if you only care about getting a unique index for each unique number, then I have a fast solution for you:

        import numpy as np
        l = [10,10,20,15,10,20]
        a = np.array(l)
        x, y = np.unique(a, return_inverse=True)
    

    np.unique sorts the unique values, so each index in y is that value's position in sorted order. For this example, y is:

        y = [0,0,2,1,0,2]
    

    I tested this for 1,000,000 entries and it was done essentially immediately.
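    If you do want the first-appearance order ([1,1,2,3,1,2]) while staying in NumPy, one possible approach (a sketch, not part of the original answer; first_seen_ids is a hypothetical name) is to also request return_index from np.unique and re-rank the sorted unique values by where each one first occurs:

```python
import numpy as np

def first_seen_ids(values):
    a = np.asarray(values)
    # uniq is sorted; first_idx[k] is the position where uniq[k] first occurs;
    # inverse[i] is the position of a[i] within uniq.
    uniq, first_idx, inverse = np.unique(a, return_index=True, return_inverse=True)
    # order lists the sorted-unique positions by first appearance in a.
    order = np.argsort(first_idx)
    # rank[k] = first-appearance rank of uniq[k] (0-based).
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    # Map every element to its rank and shift to 1-based ids.
    return rank[inverse] + 1

print(first_seen_ids([10, 10, 20, 15, 10, 20]))  # [1 1 2 3 1 2]
```

    This stays vectorised: the extra work over the original answer is one argsort over the unique values only.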

  • 2020-11-30 09:13

    You can do this in O(N) time using a defaultdict and a list comprehension:

    >>> from itertools import count
    >>> from collections import defaultdict
    >>> lst = [10, 10, 20, 15, 10, 20]
    >>> d = defaultdict(count(1).next)
    >>> [d[k] for k in lst]
    [1, 1, 2, 3, 1, 2]
    

    In Python 3, use count(1).__next__ instead of count(1).next.
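    For reference, the same snippet adapted to Python 3 as the note above describes (a sketch; only the method name changes):

```python
from collections import defaultdict
from itertools import count

lst = [10, 10, 20, 15, 10, 20]
# count(1).__next__ is a zero-argument callable returning 1, 2, 3, ...
d = defaultdict(count(1).__next__)
print([d[k] for k in lst])  # [1, 1, 2, 3, 1, 2]
print(dict(d))              # {10: 1, 20: 2, 15: 3}
```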


    If you're wondering how it works:

    The default_factory (i.e. count(1).next in this case) passed to defaultdict is called only when Python encounters a missing key. So for the first 10 the value is 1; for the next 10 the key is no longer missing, so the previously calculated 1 is reused. Then 20 is again a missing key, so Python calls the default_factory again to get its value, and so on.
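    To see that the factory really runs only on misses, here is a small sketch that swaps the counter for a hypothetical logging function, next_id, which records each call:

```python
from collections import defaultdict

calls = []  # one entry per factory invocation

def next_id():
    # defaultdict calls this with no arguments, only when a key is missing.
    calls.append(len(calls) + 1)
    return len(calls)

d = defaultdict(next_id)
ids = [d[k] for k in [10, 10, 20, 15, 10, 20]]
print(ids)    # [1, 1, 2, 3, 1, 2]
print(calls)  # [1, 2, 3]: one call per unique value, none for repeats
```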

    d at the end will look like this:

    >>> d
    defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>,
                {10: 1, 20: 2, 15: 3})
    
  • 2020-11-30 09:16

    You can use collections.OrderedDict() to collect the unique items in their order of first appearance, then loop over enumerate() of these ordered unique items (starting at 1) to build a dict mapping each item to its index. Finally, build an operator.itemgetter() from the main list and call it on this dictionary to get the corresponding index for each item:

    >>> from collections import OrderedDict
    >>> from operator import itemgetter
    >>> itemgetter(*lst)({j:i for i,j in enumerate(OrderedDict.fromkeys(lst),1)})
    (1, 1, 2, 3, 1, 2)
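    Two details are worth knowing about this one-liner. Since Python 3.7, a plain dict preserves insertion order, so dict.fromkeys can stand in for OrderedDict. Also, itemgetter with a single key returns a bare value rather than a 1-tuple, and itemgetter() with no keys raises TypeError. A sketch that guards against both (unique_indexes is a hypothetical name):

```python
from operator import itemgetter

def unique_indexes(lst):
    # Insertion-ordered dict keeps first-appearance order (Python 3.7+).
    mapping = {item: i for i, item in enumerate(dict.fromkeys(lst), 1)}
    # itemgetter(x) returns a scalar and itemgetter() is an error,
    # so short inputs are handled separately to always return a tuple.
    if len(lst) < 2:
        return tuple(mapping[x] for x in lst)
    return itemgetter(*lst)(mapping)

print(unique_indexes([10, 10, 20, 15, 10, 20]))  # (1, 1, 2, 3, 1, 2)
print(unique_indexes([7]))                       # (1,)
```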
    
  • 2020-11-30 09:19

    The slowness of your code arises because a.index(x) performs a linear search and you perform that linear search for each of the elements in l. So for each of the 1M items you perform (up to) 100K comparisons.

    A fast way to transform one value into another is to look it up in a map (in Python, a dict, whose lookups take O(1) time on average). You'll need to create the map and fill in the relationship between the original values and the values you want, then retrieve the value from the map whenever you encounter the same value again in your list.

    Here is an example that makes a single pass through l. There may be room for further optimization to avoid repeatedly growing res when appending to it, although list.append is already amortized O(1).

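    res = []
    conversion = {}  # maps each original value to its assigned index
    i = 0            # next index to hand out (use 1 for 1-based output)
    for x in l:
        if x not in conversion:
            # first time x is seen: assign it the next index
            value = conversion[x] = i
            i += 1
        else:
            # seen before: reuse the stored index
            value = conversion[x]
        res.append(value)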
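    A more compact variant of the same single-pass map idea (a sketch, not from the original answer; map_to_ids is a hypothetical name) uses dict.setdefault and numbers from 1 to match the question's expected output:

```python
def map_to_ids(values):
    conversion = {}
    # len(conversion) + 1 is evaluated before setdefault runs, so an unseen
    # value gets the next id; a value already present keeps its stored id.
    return [conversion.setdefault(x, len(conversion) + 1) for x in values]

print(map_to_ids([10, 10, 20, 15, 10, 20]))  # [1, 1, 2, 3, 1, 2]
```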
    
  • 2020-11-30 09:29

    For completeness, you can also do it eagerly:

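    from itertools import count
    
    # set() yields the unique values in arbitrary order, so each value
    # gets a unique id, but not necessarily in first-appearance order
    wordid = dict(zip(set(list_), count(1)))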
    

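    This uses a set to obtain the unique words in list_, pairs each of those unique words with the next value from count() (which counts upwards), and constructs a dictionary from the results. Because set iteration order is arbitrary, the numbering will not necessarily follow the order in which values first appear.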
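    If the question's first-appearance numbering matters, a small change to the eager version is to take the unique values from dict.fromkeys instead of set (a sketch; dicts are insertion-ordered since Python 3.7), then look each element up:

```python
from itertools import count

list_ = [10, 10, 20, 15, 10, 20]
# dict.fromkeys keeps the first-appearance order of the unique values;
# zip stops when those keys run out, even though count() is infinite.
wordid = dict(zip(dict.fromkeys(list_), count(1)))
ids = [wordid[w] for w in list_]
print(ids)  # [1, 1, 2, 3, 1, 2]
```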

    Original answer, written by nneonneo.
