Remove empty strings from a list of strings

后端未结

关注

 12  1498

I want to remove all empty strings from a list of strings in python.

My idea looks like this:

while \'\' in str_list:
    str_list.remove(\'\')


                      
              相关标签:


      
      
        
          12条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  梦如初夏        
                
              
                            
                2020-11-22 05:11
              
            
            
                                                                       
Keep in mind that if you want to keep the white spaces within a string, you may remove them unintentionally using some approaches.
If you have this list

['hello world', '  ', '', 'hello'] 
what you may want ['hello world','hello']

first trim the list to convert any type of white space to empty string:

space_to_empty = [x.strip() for x in _text_list]


then remove empty string from them list

space_clean_list = [x for x in space_to_empty if x]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  青春惊慌失措        
                
              
                            
                2020-11-22 05:11
              
            
            
                                                                       
As reported by Aziz Alto filter(None, lstr) does not remove empty strings with a space ' ' but if you are sure lstr contains only string you can use filter(str.strip, lstr)

>>> lstr = ['hello', '', ' ', 'world', ' ']
>>> lstr
['hello', '', ' ', 'world', ' ']
>>> ' '.join(lstr).split()
['hello', 'world']
>>> filter(str.strip, lstr)
['hello', 'world']


Compare time on my pc

>>> from timeit import timeit
>>> timeit('" ".join(lstr).split()', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
3.356455087661743
>>> timeit('filter(str.strip, lstr)', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
5.276503801345825


The fastest solution to remove '' and empty strings with a space ' ' remains ' '.join(lstr).split().

As reported in a comment the situation is different if your strings contain spaces.

>>> lstr = ['hello', '', ' ', 'world', '    ', 'see you']
>>> lstr
['hello', '', ' ', 'world', '    ', 'see you']
>>> ' '.join(lstr).split()
['hello', 'world', 'see', 'you']
>>> filter(str.strip, lstr)
['hello', 'world', 'see you']


You can see that filter(str.strip, lstr) preserve strings with spaces on it but ' '.join(lstr).split() will split this strings.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  自闭症患者        
                
              
                            
                2020-11-22 05:17
              
            
            
                                                                       
Use filter:

newlist=filter(lambda x: len(x)>0, oldlist) 


The drawbacks of using filter as pointed out is that it is slower than alternatives; also, lambda is usually costly. 


Or you can go for the simplest and the most iterative of all:

# I am assuming listtext is the original list containing (possibly) empty items
for item in listtext:
    if item:
        newlist.append(str(item))
# You can remove str() based on the content of your original list


this is the most intuitive of the methods and does it in decent time.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  长情又很酷        
                
              
                            
                2020-11-22 05:21
              
            
            
                                                                       
I would use filter:

str_list = filter(None, str_list)
str_list = filter(bool, str_list)
str_list = filter(len, str_list)
str_list = filter(lambda item: item, str_list)


Python 3 returns an iterator from filter, so should be wrapped in a call to list()

str_list = list(filter(None, str_list))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  慢半拍i        
                
              
                            
                2020-11-22 05:22
              
            
            
                                                                       
Sum up best answers:

1. Eliminate emtpties WITHOUT stripping:

That is, all-space strings are retained:

slist = list(filter(None, slist))


PROs:


simplest;
fastest (see benchmarks below).


2. To eliminate empties after stripping ...

2.a ... when strings do NOT contain spaces between words:

slist = ' '.join(slist).split()


PROs:


small code
fast 
(BUT not fastest with big datasets due to memory, contrary to what @paolo-melchiorre results)


2.b ... when strings contain spaces between words?

slist = list(filter(str.strip, slist))


PROs:


fastest;
understandability of the code.


Benchmarks on a 2018 machine:

## Build test-data
#
import random, string
nwords = 10000
maxlen = 30
null_ratio = 0.1
rnd = random.Random(0)                  # deterministic results
words = [' ' * rnd.randint(0, maxlen)
         if rnd.random() > (1 - null_ratio)
         else
         ''.join(random.choices(string.ascii_letters, k=rnd.randint(0, maxlen)))
         for _i in range(nwords)
        ]

## Test functions
#
def nostrip_filter(slist):
    return list(filter(None, slist))

def nostrip_comprehension(slist):
    return [s for s in slist if s]

def strip_filter(slist):
    return list(filter(str.strip, slist))

def strip_filter_map(slist): 
    return list(filter(None, map(str.strip, slist))) 

def strip_filter_comprehension(slist):  # waste memory
    return list(filter(None, [s.strip() for s in slist]))

def strip_filter_generator(slist):
    return list(filter(None, (s.strip() for s in slist)))

def strip_join_split(slist):  # words without(!) spaces
    return ' '.join(slist).split()

## Benchmarks
#
%timeit nostrip_filter(words)
142 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit nostrip_comprehension(words)
263 µs ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter(words)
653 µs ± 37.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_map(words)
642 µs ± 36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_comprehension(words)
693 µs ± 42.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_generator(words)
750 µs ± 28.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_join_split(words)
796 µs ± 103 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  小蘑菇        
                
              
                            
                2020-11-22 05:28
              
            
            
                                                                       
For a list with a combination of spaces and empty values, use simple list comprehension - 

>>> s = ['I', 'am', 'a', '', 'great', ' ', '', '  ', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', '', 'a', '', 'joke', '', ' ', '', '?', '', '', '', '?']


So, you can see, this list has a combination of spaces and null elements. Using the snippet - 

>>> d = [x for x in s if x.strip()]
>>> d
>>> d = ['I', 'am', 'a', 'great', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', 'a', 'joke', '?', '?']

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复