Question
I have a list of items from which I want to remove the duplicates of one specific item while keeping the duplicates of all the others. For example, I start with the following list:
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
I want to remove any duplicates of 0 but keep the duplicates of 1 and 9.
My current solution is the following:
mylist = [i for i in mylist if i != 0]
mylist.append(0)
Is there a nice way of keeping one occurrence of 0 besides the following?
for i in mylist:
    if mylist.count(0) > 1:
        mylist.remove(0)
The second approach takes more than double the time for this example.
Clarification:
Currently, I don't care about the order of items in the list, because I sort it after it has been created and cleaned, but that might change later.
Currently, I only need to remove duplicates of one specific item (0 in my example).
Answer 1:
The solution:
[0] + [i for i in mylist if i]
looks good enough, except when 0 is not in mylist, in which case you're wrongly adding 0.
Besides, concatenating two lists like this isn't great performance-wise. I'd do:
newlist = [i for i in mylist if i]
if len(newlist) != len(mylist):  # 0 was removed, add it back
    newlist.append(0)
(Or use filter: newlist = list(filter(None, mylist)), which could be slightly faster because there is no Python-level loop.)
Appending at the end of a list is very efficient (the list object uses pre-allocation, so most of the time no memory is copied). The length test is O(1) and avoids having to test 0 in mylist.
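The length-check idea can be wrapped in a small function. This is a minimal sketch rather than the answerer's code: the name `keep_one` and its `item` parameter are hypothetical, and it compares with `!=` instead of relying on truthiness, so it also works for values other than 0. The kept copy ends up at the end of the list, so the original order is not preserved.

```python
def keep_one(values, item):
    """Return a new list with at most one copy of `item` (moved to the end)."""
    # Drop every occurrence of `item`.
    result = [v for v in values if v != item]
    # If the length changed, `item` was present, so put one copy back.
    if len(result) != len(values):
        result.append(item)
    return result


mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print(keep_one(mylist, 0))  # [4, 1, 2, 6, 1, 9, 8, 9, 0]
print(keep_one(mylist, 7))  # unchanged, because 7 is not in the list
```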
Answer 2:
It sounds like a better data structure for you to use would be collections.Counter (which is in the standard library):
import collections
counts = collections.Counter(mylist)
counts[0] = 1
mylist = list(counts.elements())
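Two caveats with the snippet above: `counts[0] = 1` adds a 0 even when the list had none, and `elements()` groups equal items together, so the original order is lost. A hedged variant that only caps the count when the item occurs too often (the helper name `cap_count` and its `limit` parameter are made up for illustration):

```python
from collections import Counter


def cap_count(values, item, limit=1):
    """Return a list where `item` occurs at most `limit` times (order not preserved)."""
    counts = Counter(values)
    # Reading a missing key from a Counter gives 0 and does not insert it,
    # so an absent item stays absent.
    if counts[item] > limit:
        counts[item] = limit
    return list(counts.elements())


mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print(sorted(cap_count(mylist, 0)))  # [0, 1, 1, 2, 4, 6, 8, 9, 9]
```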
Answer 3:
Here is a generator-based approach with approximately O(n) complexity that also preserves the order of the original list:
In [62]: def remove_dup(lst, item):
    ...:     temp = [item]
    ...:     for i in lst:
    ...:         if i != item:
    ...:             yield i
    ...:         elif i == item and temp:
    ...:             yield temp.pop()
    ...:
In [63]: list(remove_dup(mylist, 0))
Out[63]: [4, 1, 2, 6, 1, 0, 9, 8, 9]
Also, if you are dealing with larger lists, you can use the following vectorized approach using NumPy:
In [80]: arr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])
In [81]: mask = arr == 0
In [82]: first_ind = np.where(mask)[0][0]
In [83]: mask[first_ind] = False
In [84]: arr[~mask]
Out[84]: array([4, 1, 2, 6, 1, 0, 9, 8, 9])
Answer 4:
If performance is an issue and you are happy to use a third-party library, use numpy.
The Python standard library is great for many things. Computations on numeric arrays are not one of them.
import numpy as np
mylist = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])
mylist = np.delete(mylist, np.where(mylist == 0)[0][1:])
# array([4, 1, 2, 6, 1, 0, 9, 8, 9])
Here the first argument of np.delete is the input array. The second argument finds the indices of all occurrences of 0, then selects the second occurrence onwards, so only the first 0 survives.
Performance benchmarking
Tested on Python 3.6.2 / Numpy 1.13.1. Performance will be system and array specific.
%timeit jp(myarr.copy()) # 183 µs
%timeit vui(mylist.copy()) # 393 µs
%timeit original(mylist.copy()) # 1.85 s
import numpy as np
from collections import Counter
myarr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000)
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000
def jp(myarr):
    return np.delete(myarr, np.where(myarr == 0)[0][1:])
def vui(mylist):
    return [0] + list(filter(None, mylist))
def original(mylist):
    for i in mylist:
        if mylist.count(0) > 1:
            mylist.remove(0)
    return mylist
Answer 5:
Slicing should do the job:
a[start:end] # items start through end-1
a[start:] # items start through the rest of the list
a[:end] # items from the beginning through end-1
a[:] # a copy of the whole list
Input:
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2]
pos = mylist.index(0)
nl = mylist[:pos + 1] + [i for i in mylist[pos + 1:] if i != 0]
print(nl)
Output: [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
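Note that `mylist.index(0)` raises `ValueError` if the list contains no 0. A minimal sketch of the same slicing idea as a function that handles that case (the name `keep_first` is an assumption for illustration, not part of the answer):

```python
def keep_first(values, item):
    """Keep the first occurrence of `item` in place and drop all later ones."""
    try:
        pos = values.index(item)
    except ValueError:
        # `item` never occurs, so there is nothing to remove.
        return list(values)
    # Everything up to and including the first hit is kept as-is;
    # only the tail after it is filtered.
    return values[:pos + 1] + [v for v in values[pos + 1:] if v != item]


mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2]
print(keep_first(mylist, 0))     # [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
print(keep_first([1, 2, 3], 0))  # [1, 2, 3]
```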
Answer 6:
You can use this:
desired_value = 0
mylist = [i for i in mylist if i!=desired_value] + [desired_value]
You can now change your desired value. You can also make it a list, like this:
desired_value = [0, 6]
mylist = [i for i in mylist if i not in desired_value] + desired_value
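Both versions append the value(s) even when they were not in the list to begin with. A possible variant, sketched under the assumptions that only values actually present should be re-added, that the values are hashable, and that `targets` contains no repeats (the name `dedup_to_end` is hypothetical):

```python
def dedup_to_end(values, targets):
    """Remove every occurrence of each target, then append one copy of
    each target that was actually present, in the order given by `targets`."""
    target_set = set(targets)            # fast membership tests
    present = set(values) & target_set   # targets that really occur
    kept = [v for v in values if v not in target_set]
    return kept + [t for t in targets if t in present]


mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print(dedup_to_end(mylist, [0, 6]))  # [4, 1, 2, 1, 9, 8, 9, 0, 6]
print(dedup_to_end(mylist, [0, 7]))  # [4, 1, 2, 6, 1, 9, 8, 9, 0]
```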
Answer 7:
Maybe you can use filter:
[0] + list(filter(lambda x: x != 0, mylist))
Answer 8:
You can use an itertools.count counter which will return 0, 1, ... each time it is iterated on:
from itertools import count
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
counter = count()
# next(counter) will be called each time i == 0
# it will return 0 the first time, so only the first time
# will 'not next(counter)' be True
out = [i for i in mylist if i != 0 or not next(counter)]
print(out)
# [4, 1, 2, 6, 1, 0, 9, 8, 9]
The order is kept, and it can be easily modified to deduplicate an arbitrary number of values:
from itertools import count
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
items_to_dedup = {1, 0}
counter = {item: count() for item in items_to_dedup}
out = [i for i in mylist if i not in items_to_dedup or not next(counter[i])]
print(out)
# [4, 1, 2, 6, 0, 9, 8, 9]
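For comparison, the same order-preserving result can be obtained with an explicit loop and a set of already-seen values. This is an alternative sketch, not the answerer's method, and the function name `dedup_selected` is made up for illustration:

```python
def dedup_selected(values, items_to_dedup):
    """Keep only the first occurrence of each value in `items_to_dedup`;
    all other values pass through untouched, order preserved."""
    seen = set()  # targets that have already been emitted once
    out = []
    for v in values:
        if v in items_to_dedup:
            if v in seen:
                continue  # a repeat of a target: skip it
            seen.add(v)
        out.append(v)
    return out


mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print(dedup_selected(mylist, {1, 0}))  # [4, 1, 2, 6, 0, 9, 8, 9]
```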
Answer 9:
Here's a one-liner for it, where m is the number that should occur only once; the order is kept:
[x for i,x in enumerate(mylist) if mylist.index(x)==i or x!=m]
Result
[4, 1, 2, 6, 1, 0, 9, 8, 9]
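In the one-liner above, `mylist.index(x)` rescans the list from the start for every element, so the running time can grow quadratically on long lists. A sketch that looks up the position of the first occurrence once instead (the function name `keep_first_only` is hypothetical):

```python
def keep_first_only(values, m):
    """Keep only the first occurrence of `m`, preserving order."""
    # Locate the first occurrence once; -1 means `m` is absent.
    first = values.index(m) if m in values else -1
    # Keep every other value, plus `m` only at its first position.
    return [x for i, x in enumerate(values) if x != m or i == first]


mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print(keep_first_only(mylist, 0))  # [4, 1, 2, 6, 1, 0, 9, 8, 9]
```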
Source: https://stackoverflow.com/questions/49707401/python-remove-duplicates-for-a-specific-item-from-list