Flattening Generic JSON List of Dicts or Lists in Python

只愿长相守 submitted on 2019-11-30 23:23:22

Alright. My solution consists of two functions. The first, splitObj, splits an object into its flat data and the sublist or subobject that will later need the recursion. The second, flatten, iterates over a list of objects, makes the recursive calls, and merges each object's flat fields back into every flattened subobject it yields.

def splitObj(obj, prefix=None):
    '''
    Split the object, returning a 3-tuple with the flat object, optionally
    followed by the key for the subobjects and a list of those subobjects.
    Only the first nested key found is split out, and nested lists are
    assumed to contain dicts.
    '''
    # copy the object, optionally adding the prefix before each key
    if prefix is None:
        new = obj.copy()
    else:
        new = {'{}_{}'.format(prefix, k): v for k, v in obj.items()}

    # find the key holding a subobject or a list of subobjects
    # (iterate over a snapshot so deleting from `new` is safe)
    for k, v in list(new.items()):
        # list of subobjects
        if isinstance(v, list):
            del new[k]
            return new, k, v
        # or just one subobject, wrapped in a list
        elif isinstance(v, dict):
            del new[k]
            return new, k, [v]
    return new, None, None

def flatten(data, prefix=None):
    '''
    Flatten the data, optionally with each key prefixed.
    Yields one flat dict per leaf subobject.
    '''
    # iterate all items
    for item in data:
        # split the object
        flat, key, subs = splitObj(item, prefix)

        # yield fully flat objects as they are; an empty sublist also
        # yields the flat part, so the item is not silently dropped
        if key is None or not subs:
            yield flat
            continue

        # otherwise recursively flatten the subobjects (their keys get
        # prefixed with `key`) and merge this object's flat fields into each
        for sub in flatten(subs, key):
            sub.update(flat)
            yield sub

Note that this does not produce exactly your desired output, because the desired output is itself inconsistent: in the second example, where companies are nested inside the industries, that nesting is not reflected in the output keys. My output instead generates industry_company_id and industry_company_symbol:

>>> ex1 = [{u'industry': [{u'id': u'112', u'name': u'A'},
                          {u'id': u'132', u'name': u'B'},
                          {u'id': u'110', u'name': u'C'}],
            u'name': u'materials'},
           {u'industry': {u'id': u'210', u'name': u'A'}, u'name': u'conglomerates'}]
>>> ex2 = [{u'industry': [{u'id': u'112', u'name': u'A'},
                          {u'id': u'132', u'name': u'B'},
                          {u'company': [{u'id': '500', u'symbol': 'X'},
                                        {u'id': '502', u'symbol': 'Y'},
                                        {u'id': '504', u'symbol': 'Z'}],
                           u'id': u'110',
                           u'name': u'C'}],
            u'name': u'materials'},
           {u'industry': {u'id': u'210', u'name': u'A'}, u'name': u'conglomerates'}]

>>> from pprint import pprint
>>> pprint(list(flatten(ex1)))
[{'industry_id': u'112', 'industry_name': u'A', u'name': u'materials'},
 {'industry_id': u'132', 'industry_name': u'B', u'name': u'materials'},
 {'industry_id': u'110', 'industry_name': u'C', u'name': u'materials'},
 {'industry_id': u'210', 'industry_name': u'A', u'name': u'conglomerates'}]
>>> pprint(list(flatten(ex2)))
[{'industry_id': u'112', 'industry_name': u'A', u'name': u'materials'},
 {'industry_id': u'132', 'industry_name': u'B', u'name': u'materials'},
 {'industry_company_id': '500',
  'industry_company_symbol': 'X',
  'industry_id': u'110',
  'industry_name': u'C',
  u'name': u'materials'},
 {'industry_company_id': '502',
  'industry_company_symbol': 'Y',
  'industry_id': u'110',
  'industry_name': u'C',
  u'name': u'materials'},
 {'industry_company_id': '504',
  'industry_company_symbol': 'Z',
  'industry_id': u'110',
  'industry_name': u'C',
  u'name': u'materials'},
 {'industry_id': u'210', 'industry_name': u'A', u'name': u'conglomerates'}]
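The flattened rows do not all share the same keys (only the rows for industry C carry industry_company_id and industry_company_symbol), so if the goal is a CSV file, the header has to be the union of every row's keys. Below is a minimal sketch using the standard library's csv.DictWriter, assuming Python 3.7+ (where dicts keep insertion order). The rows are copied from the flatten(ex2) output above, and the helper name write_rows is hypothetical:

```python
import csv
import io

def write_rows(rows, out):
    """Write flat dicts as CSV; the header is the union of all keys (hypothetical helper)."""
    # ordered union of keys in first-seen order (relies on Python 3.7+ dict ordering)
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    # restval='' fills the cells for keys a row does not have
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames

# rows as produced by flatten(ex2) in the answer above
rows = [
    {'industry_id': '112', 'industry_name': 'A', 'name': 'materials'},
    {'industry_id': '132', 'industry_name': 'B', 'name': 'materials'},
    {'industry_company_id': '500', 'industry_company_symbol': 'X',
     'industry_id': '110', 'industry_name': 'C', 'name': 'materials'},
    {'industry_company_id': '502', 'industry_company_symbol': 'Y',
     'industry_id': '110', 'industry_name': 'C', 'name': 'materials'},
    {'industry_company_id': '504', 'industry_company_symbol': 'Z',
     'industry_id': '110', 'industry_name': 'C', 'name': 'materials'},
    {'industry_id': '210', 'industry_name': 'A', 'name': 'conglomerates'},
]

buf = io.StringIO()
header = write_rows(rows, buf)
print(buf.getvalue())
```

With a real file, pass `open('out.csv', 'w', newline='')` instead of the StringIO buffer.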