How to retrieve an element from a set without removing it?

孤城傲影 2020-12-07 07:17

Suppose the following:

>>> s = set([1, 2, 3])

How do I get a value (any value) out of s without doing s.pop()?

14 Answers
  • 2020-12-07 07:41

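    Since you want a random element, this will also work. Older Python versions accepted the set directly; since Python 3.11, random.sample requires a sequence, so the set is converted to a list here:

    >>> import random
    >>> s = set([1,2,3])
    >>> random.sample(list(s), 1)
    [2]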
    

    The documentation doesn't seem to mention the performance of random.sample. From a really quick empirical test with a huge list and a huge set, it seems to be constant time for a list but not for a set. Also, iteration over a set isn't random; the order is undefined but predictable (shown here in CPython with small integers):

    >>> list(set(range(10))) == list(range(10))
    True
    

    If randomness is important and you need a bunch of elements in constant time (large sets), I'd use random.sample and convert to a list first:

    >>> lst = list(s) # once, O(len(s))
    ...
    >>> e = random.sample(lst, 1)[0] # constant time
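
    The convert-once idea above can be sketched as follows. This is only an illustration: it swaps random.sample(lst, 1)[0] for random.choice, which returns a single element directly, and the names pool and picks are made up for the example.

```python
import random

s = {1, 2, 3}

# Convert once: O(len(s)). Each later draw from the list is constant time.
pool = list(s)

# random.choice returns a single element, so no [0] indexing is needed.
e = random.choice(pool)

# Several independent draws reuse the same list.
picks = [random.choice(pool) for _ in range(5)]

print(e in s, all(p in s for p in picks))
```

    If the set changes between draws, the list has to be rebuilt, so this only pays off when many elements are drawn from an unchanged set.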
    
  • 2020-12-07 07:45

    I use a utility function I wrote. Its name is somewhat misleading because it kind of implies it might be a random item or something like that.

    def anyitem(iterable):
        # Return the first item produced by iteration, or None if there is none.
        try:
            return next(iter(iterable))
        except StopIteration:
            return None
    
  • 2020-12-07 07:45

    You can unpack the values to access the elements:

    s = set([1, 2, 3])
    
    v1, v2, v3 = s
    
    print(v1,v2,v3)
    #1 2 3
    
  • 2020-12-07 07:47

    Seemingly the most compact (6 symbols) though very slow way to get a set element (made possible by PEP 3132):

    e,*_=s
    

    With Python 3.5+ you can also use this 7-symbol expression (thanks to PEP 448):

    [*s][0]
    

    Both options are roughly 1000 times slower on my machine than the for-loop method.
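
    For comparison, here is a minimal sketch of the for-loop method next to the two compact forms. The helper name first_by_loop is hypothetical, and raising ValueError on an empty input is a choice made for this example. All three return the same element, because iterating an unmodified set repeatedly yields the same order.

```python
s = {1, 2, 3}

def first_by_loop(items):
    """Return the first element yielded by iteration, stopping immediately."""
    for item in items:
        return item
    raise ValueError("empty iterable")

a = first_by_loop(s)   # stops after one element
b, *_ = s              # also collects every remaining element into a list
c = [*s][0]            # builds a list of all elements, then indexes it

print(a == b == c)
```

    The two compact forms walk the whole set before returning, so their cost grows with the size of the set, while the loop stops after the first element.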

  • 2020-12-07 07:48

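    To provide some timing figures behind the different approaches, consider the following code. It is written for Python 2 (note xrange, the print statement, and the iterator's .next() method). The get() is my custom addition to Python's setobject.c; it is just a pop() that does not remove the element.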

    from timeit import *
    
    stats = ["for i in xrange(1000): iter(s).next()   ",
             "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
             "for i in xrange(1000): s.add(s.pop())   ",
             "for i in xrange(1000): s.get()          "]
    
    for stat in stats:
        t = Timer(stat, setup="s=set(range(100))")
        try:
            print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
        except:
            t.print_exc()
    

    The output is:

    $ ./test_get.py
    Time for for i in xrange(1000): iter(s).next()   :       0.433080
    Time for for i in xrange(1000):
            for x in s:
                    break:   0.148695
    Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
    Time for for i in xrange(1000): s.get()          :       0.146673
    

    This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).
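
    A rough Python 3 port of the same measurement might look like the sketch below. It drops the s.get() statement, because that method only exists in the patched interpreter, and it uses a smaller repeat count (number=100) to keep the run short; both are choices made for this example. No timings are quoted here, since they depend on the machine and the Python version.

```python
from timeit import Timer

# Python 3 spellings of the three stock statements from the answer above.
stats = [
    "for i in range(1000): next(iter(s))",
    "for i in range(1000):\n    for x in s:\n        break",
    "for i in range(1000): s.add(s.pop())",
]

results = {}
for stat in stats:
    t = Timer(stat, setup="s = set(range(100))")
    results[stat] = t.timeit(number=100)  # total seconds for 100 runs
    print("Time for %r:\t%f" % (stat, results[stat]))
```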

  • 2020-12-07 07:49

    I wondered how the functions would perform for sets of different sizes, so I ran a benchmark:

    from random import sample

    from iteration_utilities import first  # third-party helper described below
    from simple_benchmark import benchmark

    def ForLoop(s):
        for e in s:
            break
        return e

    def IterNext(s):
        return next(iter(s))

    def ListIndex(s):
        return list(s)[0]

    def PopAdd(s):
        e = s.pop()
        s.add(e)
        return e

    def RandomSample(s):
        # Python 3.11+ requires a sequence; older versions converted sets internally.
        return sample(list(s), 1)

    def SetUnpacking(s):
        e, *_ = s
        return e

    b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking, first],
                  {2**i: set(range(2**i)) for i in range(1, 20)},
                  argument_name='set size',
                  function_aliases={first: 'First'})
    
    b.plot()
    

    The resulting plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers, the fastest way is ForLoop.

    However, as long as one of the constant-time approaches is used, the performance difference will be negligible.
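
    Why some approaches scale with the set size can be illustrated without timing anything. The sketch below uses a hypothetical CountingIterable class as a stand-in for a set (it is not part of any library) and counts how many items each approach pulls from it before returning.

```python
class CountingIterable:
    """Hypothetical stand-in for a set that counts how many items it produces."""

    def __init__(self, n):
        self.n = n
        self.produced = 0

    def __iter__(self):
        for i in range(self.n):
            self.produced += 1
            yield i

def items_touched(func, n=1000):
    """Run func on a fresh CountingIterable and report how many items it pulled."""
    c = CountingIterable(n)
    func(c)
    return c.produced

def iter_next(s):
    return next(iter(s))

def list_index(s):
    return list(s)[0]

def unpacking(s):
    e, *_ = s
    return e

for func in (iter_next, list_index, unpacking):
    print(func.__name__, items_touched(func))
# iter_next 1
# list_index 1000
# unpacking 1000
```

    next(iter(...)) stops after one item, whereas list(...) and starred unpacking consume everything first, which matches the size dependence described above.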


    iteration_utilities (Disclaimer: I'm the author) contains a convenience function for this use-case: first:

    >>> from iteration_utilities import first
    >>> first({1,2,3,4})
    1
    

    I also included it in the benchmark above. It can compete with the other two "fast" solutions but the difference isn't much either way.
