Python: How to RECURSIVELY remove None values from a NESTED data structure (lists and dictionaries)?

谎友^ 2020-12-28 19:09

Here is some nested data that includes lists, tuples, and dictionaries:

data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, \         


        
5 answers
  • 2020-12-28 19:32

    If you want a full-featured, yet succinct approach to handling real-world nested data structures like these, and even handle cycles, I recommend looking at the remap utility from the boltons utility package.

    After pip install boltons or copying iterutils.py into your project, just do:

    from collections import OrderedDict
    from boltons.iterutils import remap
    
    data1 = ( 501, (None, 999), None, (None), 504 )
    data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
    data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
    data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
    
    drop_none = lambda path, key, value: key is not None and value is not None
    
    cleaned = remap(data, visit=drop_none)
    
    print(cleaned)
    
    # got:
    [[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]
    

    This page has many more examples, including ones working with much larger objects (from Github's API).

    It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.

  • 2020-12-28 19:42
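    def purify(o):
        # Strings are iterable too; return them unchanged instead of
        # recursing into their characters (which never terminates on Python 3).
        if isinstance(o, (str, bytes)):
            return o
        if hasattr(o, 'items'):
            # Mapping: rebuild with the same type, skipping None keys and values.
            oo = type(o)()
            for k in o:
                if k is not None and o[k] is not None:
                    oo[k] = purify(o[k])
        elif hasattr(o, '__iter__'):
            # Other iterables: collect the non-None items in a list first.
            oo = [ ]
            for it in o:
                if it is not None:
                    oo.append(purify(it))
        else: return o
        # Convert back to the original container type.
        return type(o)(oo)
    
    print(purify(data))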
    

    Gives:

    [[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]
    
  • 2020-12-28 19:47

    If you can assume that the __init__ methods of the various subclasses have the same signature as the typical base class:

    def remove_none(obj):
      if isinstance(obj, (list, tuple, set)):
        return type(obj)(remove_none(x) for x in obj if x is not None)
      elif isinstance(obj, dict):
        return type(obj)((remove_none(k), remove_none(v))
          for k, v in obj.items() if k is not None and v is not None)
      else:
        return obj
    
    from collections import OrderedDict
    data1 = ( 501, (None, 999), None, (None), 504 )
    data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
    data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
    data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
    print(remove_none(data))
    

    Note that this won't work with a defaultdict, for example, since defaultdict takes an additional argument to __init__. Making it work with defaultdict would require another special-case elif (before the one for regular dicts).
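    As a minimal sketch of that special case, the variant below adds a defaultdict branch ahead of the plain dict branch and passes default_factory through to the constructor. It is an illustration rather than part of the original answer, and it assumes that any defaultdict subclass keeps the standard (default_factory, items) constructor signature.

```python
from collections import defaultdict

def remove_none(obj):
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(remove_none(x) for x in obj if x is not None)
    elif isinstance(obj, defaultdict):
        # defaultdict expects default_factory as its first positional argument,
        # so it must be checked before the generic dict branch.
        return type(obj)(
            obj.default_factory,
            ((remove_none(k), remove_none(v))
             for k, v in obj.items() if k is not None and v is not None))
    elif isinstance(obj, dict):
        return type(obj)((remove_none(k), remove_none(v))
                         for k, v in obj.items() if k is not None and v is not None)
    else:
        return obj

dd = defaultdict(list, {1: [None, 2], 2: None, None: 3})
cleaned = remove_none(dd)
print(cleaned)
```

    The result is still a defaultdict with the same default_factory, so missing keys keep producing fresh default values after cleaning.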


    Also note that I've actually constructed new objects. I haven't modified the old ones. It would be possible to modify the old objects if you didn't need to support modifying immutable objects like tuple.
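    The in-place alternative could look roughly like the sketch below. The name remove_none_inplace is hypothetical and not from the original answer. It only handles list and dict; tuples and other containers are left untouched and are not descended into, which is exactly the limitation mentioned above.

```python
def remove_none_inplace(obj):
    """Strip None from lists and dicts in place; other types are left as-is."""
    if isinstance(obj, list):
        # Slice assignment mutates the existing list object.
        obj[:] = [x for x in obj if x is not None]
        for x in obj:
            remove_none_inplace(x)
    elif isinstance(obj, dict):
        # Collect the keys first so the dict is not modified while iterating.
        for key in [k for k, v in obj.items() if k is None or v is None]:
            del obj[key]
        for v in obj.values():
            remove_none_inplace(v)
    return obj

inner = [None, 1, {'x': None, 'y': [None, 2]}]
data = {'a': inner, 'b': None, None: 3, 't': (None, 4)}
result = remove_none_inplace(data)
print(result)
```

    Because the containers are mutated, every other reference to inner or data sees the cleaned contents, while the tuple under 't' still contains its None.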

  • 2020-12-28 19:53
    def stripNone(data):
        if isinstance(data, dict):
            return {k:stripNone(v) for k, v in data.items() if k is not None and v is not None}
        elif isinstance(data, list):
            return [stripNone(item) for item in data if item is not None]
        elif isinstance(data, tuple):
            return tuple(stripNone(item) for item in data if item is not None)
        elif isinstance(data, set):
            return {stripNone(item) for item in data if item is not None}
        else:
            return data
    

    Sample Runs:

    print(stripNone(data1))
    print(stripNone(data2))
    print(stripNone(data3))
    print(stripNone(data))
    
    (501, (999,), 504)
    {'four': 'sixty', 1: 601}
    {12: 402, 14: {'four': 'sixty', 1: 601}}
    [[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, {12: 402, 14: {'four': 'sixty', 1: 601}})]
    
  • 2020-12-28 19:55

    This is my original attempt, before posting the question. Keeping it here, as it may help explain the goal.

    It also has some code that would be useful if one wants to MODIFY an existing LARGE collection, rather than duplicating the data into a NEW collection. (The other answers create new collections.)

    # ---------- StripNones.py Python 2.7 ----------
    
    import collections, copy
    
    # Recursively remove None, from list/tuple elements, and dict key/values.
    # NOTE: Changes type of iterable to list, except for strings and tuples.
    # NOTE: We don't RECURSE KEYS.
    # When "beImmutable=False", may modify "data".
    # Result may have different collection types; similar to "filter()".
    def StripNones(data, beImmutable=True):
        t = type(data)
        if issubclass(t, dict):
            return _StripNones_FromDict(data, beImmutable)
    
        elif issubclass(t, collections.Iterable):
            if issubclass(t, basestring):
                # Don't need to search a string for None.
                return data
    
            # NOTE: Changes type of iterable to list.
            data = [StripNones(x, beImmutable) for x in data if x is not None]
            if issubclass(t, tuple):
                return tuple(data)
    
        return data
    
    # Modifies the dict "d" in place, removing items whose keys are in keysToRemove.
    # (Parameter renamed from "dict" to avoid shadowing the builtin.)
    def RemoveKeys(d, keysToRemove):
        for key in keysToRemove:
            d.pop(key, None)
    
    # Recursively remove None, from dict key/values.
    # NOTE: We DON'T RECURSE KEYS.
    # When "beImmutable=False", may modify "data".
    def _StripNones_FromDict(data, beImmutable):
        keysToRemove = []
        newItems = []
        for item in data.iteritems():
            key = item[0]
            if None in item:
                # Either key or value is None.
                keysToRemove.append( key )
            else:
                # The value might change when stripped.
                oldValue = item[1]
                newValue = StripNones(oldValue, beImmutable)
                if newValue is not oldValue:
                    newItems.append( (key, newValue) )
    
        somethingChanged = (len(keysToRemove) > 0) or (len(newItems) > 0)
        if beImmutable and somethingChanged:
            # Avoid modifying the original.
            data = copy.copy(data)
    
        if len(keysToRemove) > 0:
            # if not beImmutable, MODIFYING ORIGINAL "data".
            RemoveKeys(data, keysToRemove)
    
        if len(newItems) > 0:
            # if not beImmutable, MODIFYING ORIGINAL "data".
            data.update( newItems )
    
        return data
    
    
    
    # ---------- TESTING ----------
    # When run this file as a script (instead of importing it):
    if (__name__ == "__main__"):
        from collections import OrderedDict
    
        maxWidth = 100
        indentStr = '. '
    
        def NewLineAndIndent(indent):
            return '\n' + indentStr*indent
        #print NewLineAndIndent(3)
    
        # Returns list of strings.
        def HeaderAndItems(value, indent=0):
            if isinstance(value, basestring):
                L = repr(value)
            else:
                if isinstance(value, dict):
                    L = [ repr(key) + ': ' + Repr(value[key], indent+1) for key in value ]
                else:
                    L = [ Repr(x, indent+1) for x in value ]
                header = type(value).__name__ + ':'
                L.insert(0, header)
            #print L
            return L
    
        def Repr(value, indent=0):
            result = repr(value)
            if (len(result) > maxWidth) and \
              isinstance(value, collections.Iterable) and \
              not isinstance(value, basestring):
                L = HeaderAndItems(value, indent)
                return NewLineAndIndent(indent + 1).join(L)
    
            return result
    
        #print Repr( [11, [221, 222], {'331':331, '332': {'3331':3331} }, 44] )
    
        def printV(name, value):
            print( str(name) + "= " + Repr(value) )
    
        print '\n\n\n'
        data1 = ( 501, (None, 999), None, (None), 504 )
        data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
        data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
        data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
        printV( 'ORIGINAL data', data )
        printV( 'StripNones(data)', StripNones(data) )
        print '----- beImmutable = True -----'
        #printV( 'data', data )
        printV( 'data2', data2 )
        #printV( 'data3', data3 )
        print '----- beImmutable = False -----'
        StripNones(data, False)
        #printV( 'data', data )
        printV( 'data2', data2 )
        #printV( 'data3', data3 )
        print
    

    Output:

    ORIGINAL data= list:
    . [None, 22, (None,), (None, None), None]
    . tuple:
    . . (None, 202)
    . . {32: 302, 33: (501, (None, 999), None, None, 504), None: 301}
    . . OrderedDict:
    . . . None: 401
    . . . 12: 402
    . . . 13: None
    . . . 14: {'four': 'sixty', 1: 601, 2: None, None: 603}
    StripNones(data)= list:
    . [22, (), ()]
    . tuple:
    . . (202,)
    . . {32: 302, 33: (501, (999,), 504)}
    . . OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})])
    ----- beImmutable = True -----
    data2= {'four': 'sixty', 1: 601, 2: None, None: 603}
    ----- beImmutable = False -----
    data2= {'four': 'sixty', 1: 601}
    

    Key points:

    • if issubclass(t, basestring): avoids searching inside of strings, as that makes no sense, AFAIK.

    • if issubclass(t, tuple): converts the result back to a tuple.

    • For dictionaries, copy.copy(data) is used, to return an object of the same type as the original dictionary.

    • LIMITATION: Does not attempt to preserve collection/iterator type for types other than: list, tuple, dict (& its subclasses).

    • Default usage copies a data structure only if a change is needed. Passing in False for beImmutable can give higher performance when there is a LOT of data, but will alter the original data, including nested pieces of the data -- which might be referenced by variables elsewhere in your code.
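    The trade-off in the last two points can be seen with a small, non-recursive sketch. The helper drop_none_items and its in_place flag are hypothetical, written only for this illustration, and it targets Python 3 rather than the Python 2.7 of the script above. It shows that copy.copy keeps the dictionary subclass, and that in-place modification is visible through every other reference to the same object.

```python
import copy
from collections import OrderedDict

def drop_none_items(d, in_place=False):
    """Remove entries whose key or value is None from a single dict (not recursive)."""
    # A shallow copy returns an object of the same dict subclass as the original.
    target = d if in_place else copy.copy(d)
    for key in [k for k, v in target.items() if k is None or v is None]:
        del target[key]
    return target

shared = OrderedDict([(None, 401), (12, 402), (13, None)])
alias = shared  # a second reference held "elsewhere in your code"

cleaned = drop_none_items(shared)        # copy: original left untouched
size_after_copy = len(shared)

drop_none_items(shared, in_place=True)   # in place: alias changes as well
print(type(cleaned).__name__, dict(cleaned), size_after_copy, dict(alias))
```

    The copying call leaves shared with all three entries, while the in-place call also changes what alias sees, which is the hazard described above for beImmutable=False.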
