Flattening Generic JSON List of Dicts or Lists in Python

孤街浪徒 2021-01-06 18:41

I have a set of arbitrary JSON data that has been parsed in Python into lists of dicts and lists of varying depth. I need to be able to 'flatten' this into a list of dicts.

1 answer
  •  天涯浪人
    2021-01-06 19:04

    Alright. My solution comes with two functions. The first, splitObj, splits an object into its flat data and the sublist or subobject that will later require recursion. The second, flatten, iterates over a list of objects, makes the recursive calls, and reconstructs the final object for each iteration.

    def splitObj(obj, prefix=None):
        '''
        Split the object, returning a 3-tuple: the flat object, the key that
        held the subobjects, and a list of those subobjects. The last two are
        None when the object is already flat.
        '''
        # copy the object, optionally add the prefix before each key
        if prefix is None:
            new = obj.copy()
        else:
            new = {'{}_{}'.format(prefix, k): v for k, v in obj.items()}
    
        # try to find the key holding the subobject or a list of subobjects
        for k, v in new.items():
            # list of subobjects
            if isinstance(v, list):
                # deleting while iterating is safe only because we return right away
                del new[k]
                return new, k, v
            # or just one subobject
            elif isinstance(v, dict):
                del new[k]
                return new, k, [v]
        return new, None, None
    
    def flatten(data, prefix=None):
        '''
        Flatten the data, optionally with each key prefixed.
        '''
        # iterate all items
        for item in data:
            # split the object
            flat, key, subs = splitObj(item, prefix)
    
            # just return fully flat objects
            if key is None:
                yield flat
                continue
    
            # otherwise recursively flatten the subobjects
            for sub in flatten(subs, key):
                sub.update(flat)
                yield sub
    

    Note that this does not produce exactly your desired output, because that output is actually inconsistent: in the second example, where companies are nested inside the industries, the nesting isn’t visible in the output. My version instead generates industry_company_id and industry_company_symbol:

    >>> ex1 = [{u'industry': [{u'id': u'112', u'name': u'A'},
                              {u'id': u'132', u'name': u'B'},
                              {u'id': u'110', u'name': u'C'}],
                u'name': u'materials'},
               {u'industry': {u'id': u'210', u'name': u'A'}, u'name': u'conglomerates'}]
    >>> ex2 = [{u'industry': [{u'id': u'112', u'name': u'A'},
                              {u'id': u'132', u'name': u'B'},
                              {u'company': [{u'id': '500', u'symbol': 'X'},
                                            {u'id': '502', u'symbol': 'Y'},
                                            {u'id': '504', u'symbol': 'Z'}],
                               u'id': u'110',
                               u'name': u'C'}],
                u'name': u'materials'},
               {u'industry': {u'id': u'210', u'name': u'A'}, u'name': u'conglomerates'}]
    
    >>> from pprint import pprint
    >>> pprint(list(flatten(ex1)))
    [{'industry_id': u'112', 'industry_name': u'A', u'name': u'materials'},
     {'industry_id': u'132', 'industry_name': u'B', u'name': u'materials'},
     {'industry_id': u'110', 'industry_name': u'C', u'name': u'materials'},
     {'industry_id': u'210', 'industry_name': u'A', u'name': u'conglomerates'}]
    >>> pprint(list(flatten(ex2)))
    [{'industry_id': u'112', 'industry_name': u'A', u'name': u'materials'},
     {'industry_id': u'132', 'industry_name': u'B', u'name': u'materials'},
     {'industry_company_id': '500',
      'industry_company_symbol': 'X',
      'industry_id': u'110',
      'industry_name': u'C',
      u'name': u'materials'},
     {'industry_company_id': '502',
      'industry_company_symbol': 'Y',
      'industry_id': u'110',
      'industry_name': u'C',
      u'name': u'materials'},
     {'industry_company_id': '504',
      'industry_company_symbol': 'Z',
      'industry_id': u'110',
      'industry_name': u'C',
      u'name': u'materials'},
     {'industry_id': u'210', 'industry_name': u'A', u'name': u'conglomerates'}]
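
One limitation of the approach above: splitObj returns as soon as it finds the first list or dict value, so an object with two nested keys keeps the second one unflattened. Below is a minimal sketch of a variant that expands every nested key. The name flatten_all and the sample data are hypothetical, not part of the answer. It assumes that lists contain only dicts and that sibling nested keys should be combined as a cross product. It keeps the same prefix_key naming scheme.

```python
from itertools import product


def flatten_all(data, prefix=None):
    """Hypothetical variant of flatten that expands every nested key."""
    for item in data:
        flat = {}    # scalar values of this object, with prefixed keys
        nested = []  # one list of already-flattened rows per nested key
        for k, v in item.items():
            key = k if prefix is None else '{}_{}'.format(prefix, k)
            if isinstance(v, dict):
                # a single subobject is treated as a one-element list
                nested.append(list(flatten_all([v], key)))
            elif isinstance(v, list):
                # assumption: every element of the list is a dict
                nested.append(list(flatten_all(v, key)))
            else:
                flat[key] = v
        # combine one row from each nested key; with no nested keys,
        # product() yields a single empty tuple, so the flat object is kept
        for combo in product(*nested):
            row = dict(flat)
            for part in combo:
                row.update(part)
            yield row


# made-up sample with two nested keys on the same object
sample = [{'name': 'materials',
           'industry': [{'id': '112'}, {'id': '132'}],
           'region': {'code': 'EU'}}]
rows = list(flatten_all(sample))
# rows == [{'name': 'materials', 'industry_id': '112', 'region_code': 'EU'},
#          {'name': 'materials', 'industry_id': '132', 'region_code': 'EU'}]
```

As with the original flatten, an object whose nested list is empty produces no rows at all. Whether a cross product is the right way to combine sibling lists depends on the data, so treat this as a starting point.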
    
