Elegant way to remove fields from nested dictionaries

傲寒 2020-12-05 03:04

I had to remove some fields from a dictionary; the keys for those fields are in a list. So I wrote this function:

def delete_keys_from_dict(dict_del, lst_key         


        
9 Answers
  • 2020-12-05 03:16

    I came here looking for a way to remove keys from deeply nested Python 3 dicts, and all the solutions seem somewhat complex.

    Here's a one-liner for removing keys from nested or flat dicts:

    nested_dict = {
        "foo": {
            "bar": {
                "foobar": {},
                "shmoobar": {}
            }
        }
    }
    
    >>> {'foo': {'bar': {'foobar': {}, 'shmoobar': {}}}}
    
    nested_dict.get("foo", {}).get("bar", {}).pop("shmoobar", None)
    
    >>> {'foo': {'bar': {'foobar': {}}}}
    

    I used .get() with an empty dict as the default at every step of the chain, so a missing intermediate key does not raise KeyError. The final pop() gets None as its default for the same reason.
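
    When the path is only known at runtime, the same chain can be wrapped in a small helper. This is a minimal sketch, assuming every intermediate level is a plain dict; pop_nested is a hypothetical name, not a standard-library function.

```python
def pop_nested(data, path, default=None):
    """Remove the key at the end of `path` and return its value.

    `pop_nested` is a hypothetical helper, not part of the standard library.
    `path` must contain at least one key.
    """
    *parents, last = path
    current = data
    for key in parents:
        current = current.get(key)
        if not isinstance(current, dict):
            return default  # the path does not exist, so nothing is removed
    return current.pop(last, default)


nested_dict = {"foo": {"bar": {"foobar": {}, "shmoobar": {}}}}
removed = pop_nested(nested_dict, ["foo", "bar", "shmoobar"])
missing = pop_nested(nested_dict, ["foo", "nope", "shmoobar"])
print(nested_dict)  # {'foo': {'bar': {'foobar': {}}}}
```

    Like the chained version, this removes one key at one known location; it does not search the whole structure for the key.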

  • 2020-12-05 03:20
    def delete_keys_from_dict(d, to_delete):
        if isinstance(to_delete, str):
            to_delete = [to_delete]
        if isinstance(d, dict):
            for single_to_delete in set(to_delete):
                if single_to_delete in d:
                    del d[single_to_delete]
            for k, v in d.items():
                delete_keys_from_dict(v, to_delete)
        elif isinstance(d, list):
            for i in d:
                delete_keys_from_dict(i, to_delete)
        return d
    
    d = {'a': 10, 'b': [{'c': 10, 'd': 10, 'a': 10}, {'a': 10}], 'c': 1 }
    delete_keys_from_dict(d, ['a', 'c']) 
    
    >>> {'b': [{'d': 10}, {}]}
    

    This solution handles both dicts and lists inside a nested dict. The to_delete argument can be a list of strings or a single string.

    Please note that if you remove the only key in a dict, you are left with an empty dict.
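
    If those leftover empty containers are unwanted, a second pass can drop them. The following is only a sketch under that assumption: prune_empty is a hypothetical helper, and unlike the function above it returns a new structure instead of modifying the input.

```python
def prune_empty(node):
    """Recursively drop empty dicts and lists (hypothetical follow-up helper)."""
    if isinstance(node, dict):
        pruned = {k: prune_empty(v) for k, v in node.items()}
        return {k: v for k, v in pruned.items() if v not in ({}, [])}
    if isinstance(node, list):
        pruned = [prune_empty(item) for item in node]
        return [item for item in pruned if item not in ({}, [])]
    return node  # scalars are returned unchanged


cleaned = prune_empty({'b': [{'d': 10}, {}], 'e': {}})
print(cleaned)  # {'b': [{'d': 10}]}
```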

  • 2020-12-05 03:21

    First off, I think your code works and is not inelegant. There's no immediate reason not to use the code you presented.

    There are a few things that could be better though:

    Comparing the type

    Your code contains the line:

    if type(dict_foo[field]) == dict:
    

    That can definitely be improved. Generally (see also PEP 8) you should use isinstance instead of comparing types:

    if isinstance(dict_foo[field], dict):
    

    However, that will also return True if dict_foo[field] is an instance of a dict subclass. If you don't want that, keep the exact type comparison but write it with is instead of ==, i.e. type(dict_foo[field]) is dict. That will also be marginally (and probably unnoticeably) faster.
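
    A short illustration of the difference between the two checks, using OrderedDict from the standard library as an example of a dict subclass:

```python
from collections import OrderedDict

value = OrderedDict(a=1)

print(isinstance(value, dict))  # True: OrderedDict subclasses dict
print(type(value) is dict)      # False: the exact type is OrderedDict
print(type({}) is dict)         # True: a plain dict matches both checks
```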

    If you also want to allow arbitrary dict-like objects, you could go a step further and test whether it's a collections.abc.MutableMapping. That will be True for dict and dict subclasses and for all mutable mappings that explicitly implement that interface without subclassing dict, for example UserDict:

    >>> from collections.abc import MutableMapping  # Python 3.3+
    >>> # from collections import MutableMapping  # Python 2.x; removed from collections in Python 3.10
    >>> from collections import UserDict  # Python 3.x
    >>> # from UserDict import UserDict  # Python 2.x
    >>> isinstance(UserDict(), MutableMapping)
    True
    >>> isinstance(UserDict(), dict)
    False
    

    In-place modification and return value

    Typically, functions either modify a data structure in place or return a new (modified) data structure. To mention a few examples: list.append, dict.clear, and dict.update all modify the data structure in place and return None. That makes it easier to keep track of what a function does. It's not a hard rule, and there are always valid exceptions. Personally, though, I think a function like this doesn't need to be an exception, and I would simply remove the return dict_del line and let it implicitly return None, but YMMV.
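
    The built-in list.sort and sorted show the same convention side by side: one modifies in place and returns None, the other leaves its argument alone and returns a new object.

```python
numbers = [3, 1, 2]

result_in_place = numbers.sort()  # sorts the list itself and returns None
result_new = sorted([3, 1, 2])    # returns a new, sorted list

print(result_in_place)  # None
print(numbers)          # [1, 2, 3]
print(result_new)       # [1, 2, 3]
```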

    Removing the keys from the dictionary

    You copied the dictionary to avoid problems when you remove key-value pairs during the iteration. However, as already mentioned by another answer you could just iterate over the keys that should be removed and try to delete them:

    for key in keys_to_remove:
        try:
            del dictionary[key]  # "dictionary" is the dict being cleaned; avoid shadowing the built-in dict
        except KeyError:
            pass
    

    That has the additional advantage that you don't need to nest two loops (which could be slower, especially if the number of keys that need to be removed is very large).

    If you don't like empty except clauses, you can also use contextlib.suppress (requires Python 3.4+):

    from contextlib import suppress
    
    for key in keys_to_remove:
        with suppress(KeyError):
            del dictionary[key]
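
    A further spelling of the same idea is dict.pop with a default value, which ignores missing keys without any exception handling. A minimal sketch with made-up sample data:

```python
dictionary = {"a": 1, "b": 2, "c": 3}
keys_to_remove = ["a", "c", "missing"]

for key in keys_to_remove:
    # With a default given, pop returns it instead of raising KeyError.
    dictionary.pop(key, None)

print(dictionary)  # {'b': 2}
```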
    

    Variable names

    There are a few variables I would rename because they are just not descriptive or even misleading:

    • delete_keys_from_dict should probably mention the subdict-handling, maybe delete_keys_from_dict_recursive.

    • dict_del sounds like a deleted dict. I tend to prefer names like dictionary or dct because the function name already describes what is done to the dictionary.

    • lst_keys has the same problem. I'd probably use just keys there. If you want to be more specific, something like keys_sequence would make more sense because it accepts any sequence (you just have to be able to iterate over it multiple times), not just lists.

    • dict_foo, just no...

    • field isn't really appropriate either, it's a key.

    Putting it all together:

    As I said before I personally would modify the dictionary in-place and not return the dictionary again. Because of that I present two solutions, one that modifies it in-place but doesn't return anything and one that creates a new dictionary with the keys removed.

    The version that modifies in-place (very much like Ned Batchelder's solution):

    from collections.abc import MutableMapping  # the alias in collections was removed in Python 3.10
    from contextlib import suppress
    
    def delete_keys_from_dict(dictionary, keys):
        for key in keys:
            with suppress(KeyError):
                del dictionary[key]
        for value in dictionary.values():
            if isinstance(value, MutableMapping):
                delete_keys_from_dict(value, keys)
    

    And the solution that returns a new object:

    from collections.abc import MutableMapping  # the alias in collections was removed in Python 3.10
    
    def delete_keys_from_dict(dictionary, keys):
        keys_set = set(keys)  # Just an optimization for the "if key in keys" lookup.
    
        modified_dict = {}
        for key, value in dictionary.items():
            if key not in keys_set:
                if isinstance(value, MutableMapping):
                    modified_dict[key] = delete_keys_from_dict(value, keys_set)
                else:
                    modified_dict[key] = value  # or copy.deepcopy(value) if a copy is desired for non-dicts.
        return modified_dict
    

    However, it only makes copies of the dictionaries; the other values are not copied. You could easily wrap these in copy.deepcopy (I put a comment in the appropriate place in the code) if you want that.
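
    The effect of that choice can be seen in a small experiment. This is a sketch, not the answer's own code: delete_keys_copy and its deep flag are hypothetical names introduced here to toggle the copy.deepcopy call.

```python
import copy
from collections.abc import MutableMapping


def delete_keys_copy(dictionary, keys, deep=False):
    """Return a new dict without `keys`; `deep` is a hypothetical switch."""
    keys_set = set(keys)
    result = {}
    for key, value in dictionary.items():
        if key in keys_set:
            continue
        if isinstance(value, MutableMapping):
            result[key] = delete_keys_copy(value, keys_set, deep)
        else:
            # Without deepcopy, non-dict values are shared with the input.
            result[key] = copy.deepcopy(value) if deep else value
    return result


original = {"keep": [1, 2], "drop": 0, "sub": {"drop": 1, "items": [3]}}
shallow = delete_keys_copy(original, ["drop"])
deep = delete_keys_copy(original, ["drop"], deep=True)

print(shallow)                              # {'keep': [1, 2], 'sub': {'items': [3]}}
print(shallow["keep"] is original["keep"])  # True: the list is shared
print(deep["keep"] is original["keep"])     # False: the list was copied
```

    With the shared list, appending to shallow["keep"] would also change original["keep"]; whether that matters depends on how the result is used.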

  • 2020-12-05 03:22

    Since you already need to loop through every element in the dict, I'd stick with a single loop and just make sure to use a set for looking up the keys to delete:

    def delete_keys_from_dict(dict_del, the_keys):
        """
        Delete the keys present in the_keys from the dictionary.
        Loops recursively over nested dictionaries.
        """
        # make sure the_keys is a set to get O(1) lookups
        if type(the_keys) is not set:
            the_keys = set(the_keys)
        # iterate over a snapshot: deleting from the live dict while
        # iterating over it raises RuntimeError in Python 3
        for k, v in list(dict_del.items()):
            if k in the_keys:
                del dict_del[k]
            if isinstance(v, dict):
                delete_keys_from_dict(v, the_keys)
        return dict_del
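
    One thing to watch for with any single-loop, in-place deletion: Python 3 raises RuntimeError when a dict changes size while it is being iterated, so the loop has to run over a snapshot of the keys or items. A minimal demonstration with throwaway data:

```python
d = {"a": 1, "b": 2}
error = None

try:
    for key in d:    # iterating over the live dict
        del d[key]   # changes its size mid-iteration
except RuntimeError as exc:
    error = exc
    print("live iteration failed:", exc)

d = {"a": 1, "b": 2}
for key in list(d):  # list(d) is a snapshot of the keys
    del d[key]
print(d)  # {}
```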
    
  • 2020-12-05 03:23

    Since the question requested an elegant way, I'll submit my general-purpose solution to wrangling nested structures. First, install the boltons utility package with pip install boltons, then:

    from boltons.iterutils import remap
    
    data = {'one': 'remains', 'this': 'goes', 'of': 'course'}
    bad_keys = set(['this', 'is', 'a', 'list', 'of', 'keys'])
    
    drop_keys = lambda path, key, value: key not in bad_keys
    clean = remap(data, visit=drop_keys)
    print(clean)
    
    # Output:
    {'one': 'remains'}
    

    In short, the remap utility is a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles and special containers.
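
    To make the callback idea concrete without installing anything, here is a rough pure-Python sketch. It only borrows the (path, key, value) predicate shape from the example above; filter_nested is a hypothetical name, it is not how boltons implements remap, and it handles only plain dicts and lists without cycles.

```python
def filter_nested(node, keep, path=()):
    """Rebuild dicts and lists, keeping entries where keep(path, key, value) is true.

    Hypothetical stand-in for illustration; not boltons' remap.
    """
    if isinstance(node, dict):
        return {
            k: filter_nested(v, keep, path + (k,))
            for k, v in node.items()
            if keep(path, k, v)
        }
    if isinstance(node, list):
        # For lists, the "key" passed to the predicate is the index.
        return [
            filter_nested(v, keep, path + (i,))
            for i, v in enumerate(node)
            if keep(path, i, v)
        ]
    return node


bad_keys = {"this", "of"}
data = {"one": "remains", "this": "goes", "sub": [{"of": "course", "ok": 1}]}
clean = filter_nested(data, lambda path, key, value: key not in bad_keys)
print(clean)  # {'one': 'remains', 'sub': [{'ok': 1}]}
```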

    This page has many more examples, including ones working with much larger objects from Github's API.

    It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.

  • 2020-12-05 03:23

    I think the following is more elegant:

    def delete_keys_from_dict(dict_del, lst_keys):
        if not isinstance(dict_del, dict):
            return dict_del
        # lst_keys must be passed along to the recursive call
        return {key: delete_keys_from_dict(value, lst_keys) for key, value in dict_del.items() if key not in lst_keys}
    