Assign a number to each unique value in a list

名媛妹妹 2020-12-03 03:28

I have a list of strings. I want to assign a unique number to each string (the exact number is not important), and create a list of the same length using these numbers, in o

8 Answers
  • 2020-12-03 04:03

    Without an external library (see the EDIT below for a pandas solution), you can do it as follows:

    d = {ni: indi for indi, ni in enumerate(set(names))}
    numbers = [d[ni] for ni in names]
    

    Brief explanation:

    The first line assigns a number to each unique element of names and stores the mapping in the dictionary d. The dictionary is built with a dictionary comprehension, and set(names) yields the unique elements.

    The second line uses a list comprehension to look up each name in d and collects the results in the list numbers.

    One example to illustrate that it also works fine for unsorted lists:

    # 'll' appears all over the place
    names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
    

    This is one possible output for numbers (set iteration order is arbitrary, so the exact numbers can differ between runs):

    [1, 1, 3, 3, 3, 2, 2, 1, 2, 0, 0, 0, 1]
    

    As you can see, the number 1, which is associated with ll here, appears at exactly the positions where ll occurs.
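
If reproducible numbers matter, one option is to sort the unique values before enumerating them. This is a minimal sketch of that variant, not part of the original answer; it relies only on the default string ordering, in which uppercase letters sort before lowercase ones.

```python
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']

# Sorting the unique values makes the numbering the same on every run.
d = {name: index for index, name in enumerate(sorted(set(names)))}
numbers = [d[name] for name in names]
print(numbers)
# [3, 3, 2, 2, 2, 1, 1, 3, 1, 0, 0, 0, 3]
```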

    EDIT

    If you have Pandas available, you can also use pandas.factorize (which seems to be quite efficient for huge lists and also works fine for lists of tuples as explained here):

    import pandas as pd
    
    pd.factorize(names)
    

    will then return

    (array([0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]),
     array(['ll', 'hl', 'LL', 'HL'], dtype=object))
    

    Therefore,

    numbers = pd.factorize(names)[0]
    
  • 2020-12-03 04:05

    I managed to modify your script very slightly and it looks ok:

    names = ['ll', 'hl', 'll', 'hl', 'LL', 'll', 'LL', 'HL', 'hl', 'HL', 'LL', 'HL', 'zzz']
    names.sort()
    print(names)
    numbers = []
    num = 0
    for item in range(len(names)):
        if item == len(names) - 1:
            break
        elif names[item] == names[item+1]:
            numbers.append(num)
        else:
            numbers.append(num)
            num = num + 1
    numbers.append(num)
    print(numbers)
    

    You can see it is very similar; the only difference is that instead of adding the number for the NEXT element, I add the number for the CURRENT element. That's all. Oh, and sorting: names.sort() reorders the list in place, so numbers lines up with the sorted list. In this example uppercase strings sort before lowercase ones; you can pass sort(key=lambda x: ...) if you wish to change that. (Perhaps like this: names.sort(key=lambda x: (x.upper() if x.lower() == x else x.lower())) )
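
If the numbers should follow the original, unsorted order instead, the same run-counting idea can be applied to a sorted copy and the result looked up per element. The sketch below is an assumption about how that could be done, using itertools.groupby to number each run of equal values; it is not the answerer's code.

```python
from itertools import groupby

names = ['ll', 'hl', 'll', 'hl', 'LL', 'll', 'LL', 'HL', 'hl', 'HL', 'LL', 'HL', 'zzz']

# Number each run of equal values in a sorted copy; names itself is left untouched.
mapping = {value: num for num, (value, _run) in enumerate(groupby(sorted(names)))}

# Look every name up in its original position.
numbers = [mapping[name] for name in names]
print(numbers)
# [3, 2, 3, 2, 1, 3, 1, 0, 2, 0, 1, 0, 4]
```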

  • 2020-12-03 04:12

    If the condition is that the numbers are unique and the exact number is not important, then you can build a mapping relating each item in the list to a unique number on the fly, assigning values from a count object:

    from itertools import count
    
    names = ['ll', 'll', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'll']
    
    d = {}
    c = count()
    numbers = [d.setdefault(i, next(c)) for i in names]
    print(numbers)
    # [0, 0, 2, 2, 4, 4, 4, 7, 0]
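
The gaps in that output (0, 2, 4, 7) appear because next(c) is evaluated for every element, including names that are already in d. If consecutive numbers are preferred, a possible variant is a defaultdict whose factory draws from the counter only when a key is missing. This is a sketch under that assumption, not part of the original answer.

```python
from collections import defaultdict
from itertools import count

names = ['ll', 'll', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'll']

# The factory runs only for missing keys, so no counter values are skipped.
d = defaultdict(count().__next__)
numbers = [d[name] for name in names]
print(numbers)
# [0, 0, 1, 1, 2, 2, 2, 3, 0]
```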
    

    You could do away with the extra names by using map on the list and a count object, and setting the map function as {}.setdefault (see @StefanPochmann's comment):

    from itertools import count
    
    names = ['ll', 'll', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'll']
    numbers = list(map({}.setdefault, names, count()))  # list() is needed on Python 3, where map is lazy
    print(numbers)
    # [0, 0, 2, 2, 4, 4, 4, 7, 0]
    

    As an extra, you could also use np.unique, in case you already have numpy installed:

    import numpy as np
    
    _, numbers = np.unique(names, return_inverse=True)
    print(numbers)
    # [3 3 2 2 1 1 1 0 3]
    
  • 2020-12-03 04:14

    If you have k different values, this maps them to integers 0 to k-1 in order of first appearance:

    >>> names = ['b', 'c', 'd', 'c', 'b', 'a', 'b']
    >>> tmp = {}
    >>> [tmp.setdefault(name, len(tmp)) for name in names]
    [0, 1, 2, 1, 0, 3, 0]
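
Because the numbers are handed out in order of first appearance, the dictionary's keys can also serve as a reverse lookup table. The sketch below illustrates this; the names labels and decoded are introduced here for illustration only, and it assumes Python 3.7+, where dicts keep insertion order.

```python
names = ['b', 'c', 'd', 'c', 'b', 'a', 'b']

tmp = {}
numbers = [tmp.setdefault(name, len(tmp)) for name in names]

# Keys are stored in insertion order, so position i holds the name numbered i.
labels = list(tmp)
decoded = [labels[n] for n in numbers]

print(labels)            # ['b', 'c', 'd', 'a']
print(decoded == names)  # True
```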
    
  • 2020-12-03 04:14

    To make it more generic you can wrap it in a function, so these hard-coded values don't do any harm, because they are local.

    If you use an efficient lookup container (I'll use a plain dictionary), you can keep the first index of each string without losing too much performance:

    def your_function(list_of_strings):
    
        encountered_strings = {}
        result = []
    
        idx = 0
        for astring in list_of_strings:
            if astring in encountered_strings:  # check whether this string has been seen already
                result.append(encountered_strings[astring])
            else:
                encountered_strings[astring] = idx
                result.append(idx)
                idx += 1
        return result
    

    And this will assign the indices in order (even if that's not important):

    >>> your_function(['ll', 'll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'HL', 'HL'])
    [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
    

    This needs only one iteration over your list of strings, which makes it possible to even process generators and similar.
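
To illustrate the single-pass point, here is a hedged sketch of a lazy variant that yields each number as soon as its string is read. The name iter_numbers is hypothetical and the input generator is made-up sample data.

```python
def iter_numbers(strings):
    """Lazily yield a number for each string, numbered by first appearance."""
    seen = {}
    for s in strings:
        # len(seen) is the next unused number whenever s is new.
        yield seen.setdefault(s, len(seen))


# Any iterable works, including one that can only be consumed once.
words = (word for word in ['ll', 'hl', 'll', 'HL'])
print(list(iter_numbers(words)))
# [0, 1, 0, 2]
```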

  • 2020-12-03 04:21

    Since you are mapping strings to integers, that suggests using a dict. So you can do the following:

    d = dict()
    
    counter = 0
    
    for name in names:
        if name in d:
            continue
        d[name] = counter
        counter += 1
    
    numbers = [d[name] for name in names]
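
The loop that builds d can also be written more compactly. The following is one possible equivalent, assuming Python 3.7+ (where dict.fromkeys keeps first-appearance order); the sample list is made up for illustration.

```python
names = ['ll', 'hl', 'll', 'LL', 'hl', 'HL']  # sample data, assumed for illustration

# dict.fromkeys drops duplicates while keeping the order of first appearance.
d = {name: index for index, name in enumerate(dict.fromkeys(names))}
numbers = [d[name] for name in names]
print(numbers)
# [0, 1, 0, 2, 1, 3]
```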
    