Splitting list into smaller lists of equal values

Asked by 广开言路 on 2021-01-15 18:22

I am looking to transform a list into smaller lists of equal values. An example I have is:

[\"a\", \"a\", \"a\", \"b\", \"b\", \"c\", \"c\", \"c\", \"c\"] 
<         


        
4 Answers
  • 2021-01-15 18:48

    You could use collections.Counter:

    >>> lst = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]
    >>> import collections
    >>> collections.Counter(lst).most_common()
    [('c', 4), ('a', 3), ('b', 2)]
    

    This works even when the values are not ordered, and it provides a very compact representation, which you can then expand into lists if needed:

    >>> [[i]*n for i,n in collections.Counter(lst).most_common()]
    [['c', 'c', 'c', 'c'], ['a', 'a', 'a'], ['b', 'b']]
    
  • 2021-01-15 18:51

    You could use itertools.groupby to solve the problem:

    >>> from itertools import groupby
    >>> [list(grp) for k, grp in groupby(["a", "a", "a", "b", "b", "c", "c", "c", "c"])]
    [['a', 'a', 'a'], ['b', 'b'], ['c', 'c', 'c', 'c']]
    

    It only groups consecutive equal elements but that seems enough in your case.
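
    If the input is not guaranteed to be sorted, a common variation (not part of the original answer) is to sort it first so that equal elements become consecutive:

    >>> from itertools import groupby
    >>> unsorted_lst = ["a", "c", "a", "b", "c", "a", "c", "b", "c"]
    >>> [list(grp) for k, grp in groupby(sorted(unsorted_lst))]
    [['a', 'a', 'a'], ['b', 'b'], ['c', 'c', 'c', 'c']]

    Note that this adds an O(n log n) sort, whereas a single groupby pass over already-sorted data is O(n).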

  • 2021-01-15 19:03

    Another way to get the desired output is to use defaultdict from the collections module (the best time with this approach was about 0.02 s, the same as with groupby):

    from collections import defaultdict
    a = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]
    b = defaultdict(list)
    for k in a:
        b[k].append(k)
    
    >>> b 
    defaultdict(list,
                {'a': ['a', 'a', 'a'], 'b': ['b', 'b'], 'c': ['c', 'c', 'c', 'c']})
    

    So, what you have to do now is:

    >>> list(b.values())
    [['a', 'a', 'a'], ['b', 'b'], ['c', 'c', 'c', 'c']]
    
  • 2021-01-15 19:05

    While I'd personally go with itertools.groupby as the most convenient way, you asked for efficiency, and this should be considerably faster than any of the itertools options:

    data = ["a", "a", "a", "b", "b", "c", "c", "c", "c"] 
    
    lookup = {}  # lookup map
    result = []
    for element in data:
        if element not in lookup:
            target = lookup[element] = [element]
            result.append(target)
        else:
            lookup[element].append(element)
    
    print(result)
    # [['a', 'a', 'a'], ['b', 'b'], ['c', 'c', 'c', 'c']]
    

    If the data is always ordered (i.e. equal elements are always consecutive), this can be further optimized without a lookup table, using a list comprehension for maximum performance.
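
    The answer doesn't spell that version out; here is one possible sketch, assuming the input is sorted so that equal elements form consecutive runs:

    data = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]

    # indices where a run of a new value starts
    bounds = [i for i in range(len(data)) if i == 0 or data[i] != data[i - 1]]
    # slice between consecutive run boundaries
    result = [data[i:j] for i, j in zip(bounds, bounds[1:] + [len(data)])]

    print(result)
    # [['a', 'a', 'a'], ['b', 'b'], ['c', 'c', 'c', 'c']]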

    UPDATE - Some clarification on efficiency and operation. If you set up your test as:

    from itertools import groupby
    
    def itools_func(data):
        return [list(grp) for k, grp in groupby(data)]
    
    def manual_func(data):
        lookup = {}
        result = []
        for element in data:
            if element not in lookup:
                target = lookup[element] = [element]
                result.append(target)
            else:
                lookup[element].append(element)
        return result
    

    The problem is that those two will not return the same values:

    test_data = ["a", "a", "b", "c", "c", "b", "a"]
    
    itools_func(test_data)  # [['a', 'a'], ['b'], ['c', 'c'], ['b'], ['a']]
    manual_func(test_data)  # [['a', 'a', 'a'], ['b', 'b'], ['c', 'c']]
    

    From the OP's question, I understood that he wants the latter (based on his comment "I sorted the list to make values consecutive"), because with a sorted list this can be done far more easily. So, if we feed those functions a really long list:

    test_data = ["a", "a", "a", "b", "b", "c", "c", "c", "c"] * 10000  # 10000 x the original
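
    The timing harness itself isn't shown in the answer; a minimal sketch using timeit (the 100-loop count matches the figures below, though the exact setup is an assumption):

    import timeit

    for func in (itools_func, manual_func):
        total = timeit.timeit(lambda: func(test_data), number=100)
        print(f"{func.__name__} - 100 loops: {total:.3f}s, per loop: {total / 100 * 1000:.2f}ms")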
    

    On my system it clocks as follows:

    itools_func - 100 loops: 2.668s, per loop: 26.68ms
    manual_func - 100 loops: 1.005s, per loop: 10.05ms
    

    But this is an unfavorable setting for itertools.groupby. If the data were sorted like:

    test_data = ["a"] * 3000 + ["b"] * 2000 + ["c"] * 40000
    

    The story is quite a bit different as the C backend kicks in:

    itools_func - 1000 loops: 656.3ms, per loop: 656.3µs
    manual_func - 1000 loops: 4.816s, per loop: 4.816ms
    

    When the data is sorted, the manual function can be further optimized, but it will hardly beat what itertools does under the hood.
