Python: How RECURSIVELY remove None values from a NESTED data structure (lists and dictionaries)?

后端未结

关注

 5  1619

Here is some nested data, that includes lists, tuples, and dictionaries:

data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, \


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  别跟我提以往        
                
              
                            
                2020-12-28 19:32
              
            
            
                                                                       
If you want a full-featured, yet succinct approach to handling real-world nested data structures like these, and even handle cycles, I recommend looking at the remap utility from the boltons utility package. 

After pip install boltons or copying iterutils.py into your project, just do:



from collections import OrderedDict
from boltons.iterutils import remap

data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]

drop_none = lambda path, key, value: key is not None and value is not None

cleaned = remap(data, visit=drop_none)

print(cleaned)

# got:
[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]


This page has many more examples, including ones working with much larger objects (from Github's API).

It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦如初夏        
                
              
                            
                2020-12-28 19:42
              
            
            
                                                                       
def purify(o):
    if hasattr(o, 'items'):
        oo = type(o)()
        for k in o:
            if k != None and o[k] != None:
                oo[k] = purify(o[k])
    elif hasattr(o, '__iter__'):
        oo = [ ] 
        for it in o:
            if it != None:
                oo.append(purify(it))
    else: return o
    return type(o)(oo)

print purify(data)


Gives:

[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  遥遥无期        
                
              
                            
                2020-12-28 19:47
              
            
            
                                                                       
If you can assume that the __init__ methods of the various subclasses have the same signature as the typical base class:

def remove_none(obj):
  if isinstance(obj, (list, tuple, set)):
    return type(obj)(remove_none(x) for x in obj if x is not None)
  elif isinstance(obj, dict):
    return type(obj)((remove_none(k), remove_none(v))
      for k, v in obj.items() if k is not None and v is not None)
  else:
    return obj

from collections import OrderedDict
data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
print remove_none(data)


Note that this won't work with a defaultdict for example since the defaultdict takes and additional argument to __init__.  To make it work with defaultdict would require another special case elif (before the one for regular dicts).



Also note that I've actually constructed new objects.  I haven't modified the old ones.  It would be possible to modify the old objects if you didn't need to support modifying immutable objects like tuple.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  执念已碎        
                
              
                            
                2020-12-28 19:53
              
            
            
                                                                       
def stripNone(data):
    if isinstance(data, dict):
        return {k:stripNone(v) for k, v in data.items() if k is not None and v is not None}
    elif isinstance(data, list):
        return [stripNone(item) for item in data if item is not None]
    elif isinstance(data, tuple):
        return tuple(stripNone(item) for item in data if item is not None)
    elif isinstance(data, set):
        return {stripNone(item) for item in data if item is not None}
    else:
        return data


Sample Runs:

print stripNone(data1)
print stripNone(data2)
print stripNone(data3)
print stripNone(data)

(501, (999,), 504)
{'four': 'sixty', 1: 601}
{12: 402, 14: {'four': 'sixty', 1: 601}}
[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, {12: 402, 14: {'four': 'sixty', 1: 601}})]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  猫巷女王i        
                
              
                            
                2020-12-28 19:55
              
            
            
                                                                       
This is my original attempt, before posting the question.
Keeping it here, as it may help explain the goal.

It also has some code that would be useful if one wants to MODIFY an existing LARGE collection, rather than duplicating the data into a NEW collection.
(The other answers create new collections.)

# ---------- StripNones.py Python 2.7 ----------

import collections, copy

# Recursively remove None, from list/tuple elements, and dict key/values.
# NOTE: Changes type of iterable to list, except for strings and tuples.
# NOTE: We don't RECURSE KEYS.
# When "beImmutable=False", may modify "data".
# Result may have different collection types; similar to "filter()".
def StripNones(data, beImmutable=True):
    t = type(data)
    if issubclass(t, dict):
        return _StripNones_FromDict(data, beImmutable)

    elif issubclass(t, collections.Iterable):
        if issubclass(t, basestring):
            # Don't need to search a string for None.
            return data

        # NOTE: Changes type of iterable to list.
        data = [StripNones(x, beImmutable) for x in data if x is not None]
        if issubclass(t, tuple):
            return tuple(data)

    return data

# Modifies dict, removing items whose keys are in keysToRemove.
def RemoveKeys(dict, keysToRemove):
    for key in keysToRemove:
        dict.pop(key, None) 

# Recursively remove None, from dict key/values.
# NOTE: We DON'T RECURSE KEYS.
# When "beImmutable=False", may modify "data".
def _StripNones_FromDict(data, beImmutable):
    keysToRemove = []
    newItems = []
    for item in data.iteritems():
        key = item[0]
        if None in item:
            # Either key or value is None.
            keysToRemove.append( key )
        else:
            # The value might change when stripped.
            oldValue = item[1]
            newValue = StripNones(oldValue, beImmutable)
            if newValue is not oldValue:
                newItems.append( (key, newValue) )

    somethingChanged = (len(keysToRemove) > 0) or (len(newItems) > 0)
    if beImmutable and somethingChanged:
        # Avoid modifying the original.
        data = copy.copy(data)

    if len(keysToRemove) > 0:
        # if not beImmutable, MODIFYING ORIGINAL "data".
        RemoveKeys(data, keysToRemove)

    if len(newItems) > 0:
        # if not beImmutable, MODIFYING ORIGINAL "data".
        data.update( newItems )

    return data



# ---------- TESTING ----------
# When run this file as a script (instead of importing it):
if (__name__ == "__main__"):
    from collections import OrderedDict

    maxWidth = 100
    indentStr = '. '

    def NewLineAndIndent(indent):
        return '\n' + indentStr*indent
    #print NewLineAndIndent(3)

    # Returns list of strings.
    def HeaderAndItems(value, indent=0):
        if isinstance(value, basestring):
            L = repr(value)
        else:
            if isinstance(value, dict):
                L = [ repr(key) + ': ' + Repr(value[key], indent+1) for key in value ]
            else:
                L = [ Repr(x, indent+1) for x in value ]
            header = type(value).__name__ + ':'
            L.insert(0, header)
        #print L
        return L

    def Repr(value, indent=0):
        result = repr(value)
        if (len(result) > maxWidth) and \
          isinstance(value, collections.Iterable) and \
          not isinstance(value, basestring):
            L = HeaderAndItems(value, indent)
            return NewLineAndIndent(indent + 1).join(L)

        return result

    #print Repr( [11, [221, 222], {'331':331, '332': {'3331':3331} }, 44] )

    def printV(name, value):
        print( str(name) + "= " + Repr(value) )

    print '\n\n\n'
    data1 = ( 501, (None, 999), None, (None), 504 )
    data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
    data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
    data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
    printV( 'ORIGINAL data', data )
    printV( 'StripNones(data)', StripNones(data) )
    print '----- beImmutable = True -----'
    #printV( 'data', data )
    printV( 'data2', data2 )
    #printV( 'data3', data3 )
    print '----- beImmutable = False -----'
    StripNones(data, False)
    #printV( 'data', data )
    printV( 'data2', data2 )
    #printV( 'data3', data3 )
    print


Output:

ORIGINAL data= list:
. [None, 22, (None,), (None, None), None]
. tuple:
. . (None, 202)
. . {32: 302, 33: (501, (None, 999), None, None, 504), None: 301}
. . OrderedDict:
. . . None: 401
. . . 12: 402
. . . 13: None
. . . 14: {'four': 'sixty', 1: 601, 2: None, None: 603}
StripNones(data)= list:
. [22, (), ()]
. tuple:
. . (202,)
. . {32: 302, 33: (501, (999,), 504)}
. . OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})])
----- beImmutable = True -----
data2= {'four': 'sixty', 1: 601, 2: None, None: 603}
----- beImmutable = False -----
data2= {'four': 'sixty', 1: 601}


Key points:


if issubclass(t, basestring): avoids searching inside of strings, as that makes no sense, AFAIK.
if issubclass(t, tuple): converts the result back to a tuple.
For dictionaries, copy.copy(data) is used, to return an object of the same type as the original dictionary.
LIMITATION: Does not attempt to preserve collection/iterator type for types other than: list, tuple, dict (& its subclasses). 
Default usage copies data structures, if a change is needed. Passing in False for beImmutable can result in higher performance when a LOT of data, but will alter the original data, including altering nested pieces of the data -- which might be referenced by variables elsewhere in your code.

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复