Iteration over the dictionary and extracting values

问题

I have a dictionary (result_dict) as follows.

{'11333216@N05': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 3,
   'iconserver': '2214',
   'id': '11333216@N05',
   'ispro': 0,
   'location': {'_content': ''},
   'mbox_sha1sum': {'_content': '8eb2e248cbad94e2b4a5aae75eb653c7e061a90c'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=11327876'},
   'nsid': '11333216@N05',
   'path_alias': 'kishansamarasinghe',
   'photos': {'count': {'_content': 442},
    'firstdate': {'_content': '1193073180'},
    'firstdatetaken': {'_content': '2000-01-01 00:49:17'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/kishansamarasinghe/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/kishansamarasinghe/'},
   'realname': {'_content': 'Kishan Samarasinghe'},
   'timezone': {'label': 'Sri Jayawardenepura',
    'offset': '+06:00',
    'timezone_id': 'Asia/Colombo'},
   'username': {'_content': 'Three Sixty Five Degrees'}},
  'stat': 'ok'},
 '117692977@N08': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '0',
   'iconfarm': 1,
   'iconserver': '404',
   'id': '117692977@N08',
   'ispro': 0,
   'location': {'_content': 'Almere, The Nederlands'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=117600164'},
   'nsid': '117692977@N08',
   'path_alias': 'meijsvo',
   'photos': {'count': {'_content': 3237},
    'firstdate': {'_content': '1392469161'},
    'firstdatetaken': {'_content': '2013-06-23 14:39:30'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/meijsvo/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/meijsvo/'},
   'realname': {'_content': 'Markéta Eijsvogelová'},
   'timezone': {'label': 'Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna',
    'offset': '+01:00',
    'timezone_id': 'Europe/Amsterdam'},
   'username': {'_content': 'meijsvo'}},
  'stat': 'ok'},
 '21539776@N02': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 0,
   'iconserver': '0',

This contains more than 150 usernames (e.g. 11333216@N05) . I want to extract 'mobileurl' for each user and create a dataframe containing username and mobileurl columns. I couldn't find a way to iterate each user and extract his mobileurl as indexing is impossible. However, I have extract the mobileurl for one of the users as follows.

result_dict['76617062@N08']["person"]["mobileurl"]['_content']

'https://m.flickr.com/photostream.gne?id=76524249'

Would be grateful if someone can help, as I'm a bit new to python.

回答1:

Iterate through the dictionarys list of keys which in this case are the usernames, then use each one to access each top level dict and from there dive through all the other layers to find the exact data you need. The mobileurl in your example.

Once you have these 2 variables, add them to your dataframe.

# Iterate through list of users
for user in result_dict.keys():

    # use each username to find the mobileurl you need within
    mobileurl = result_dict[user]["person"]["mobileurl"]["_content"]

    # Add the variables 'user' and 'mobileurl' to dataframe as you see fit

回答2:

result_dict = {'11333216@N05': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 3,
   'iconserver': '2214',
   'id': '11333216@N05',
   'ispro': 0,
   'location': {'_content': ''},
   'mbox_sha1sum': {'_content': '8eb2e248cbad94e2b4a5aae75eb653c7e061a90c'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=11327876'},
   'nsid': '11333216@N05',
   'path_alias': 'kishansamarasinghe',
   'photos': {'count': {'_content': 442},
    'firstdate': {'_content': '1193073180'},
    'firstdatetaken': {'_content': '2000-01-01 00:49:17'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/kishansamarasinghe/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/kishansamarasinghe/'},
   'realname': {'_content': 'Kishan Samarasinghe'},
   'timezone': {'label': 'Sri Jayawardenepura',
    'offset': '+06:00',
    'timezone_id': 'Asia/Colombo'},
   'username': {'_content': 'Three Sixty Five Degrees'}},
  'stat': 'ok'},
 '117692977@N08': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '0',
   'iconfarm': 1,
   'iconserver': '404',
   'id': '117692977@N08',
   'ispro': 0,
   'location': {'_content': 'Almere, The Nederlands'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=117600164'},
   'nsid': '117692977@N08',
   'path_alias': 'meijsvo',
   'photos': {'count': {'_content': 3237},
    'firstdate': {'_content': '1392469161'},
    'firstdatetaken': {'_content': '2013-06-23 14:39:30'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/meijsvo/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/meijsvo/'},
   'realname': {'_content': 'Markéta Eijsvogelová'},
   'timezone': {'label': 'Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna',
    'offset': '+01:00',
    'timezone_id': 'Europe/Amsterdam'},
   'username': {'_content': 'meijsvo'}},
  'stat': 'ok'},
 '21539776@N02': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 0,
   'iconserver': '0'}
}
}

For your use case better use iteritems() of dictionary:

for key, value in result_dict.iteritems():
    print value.get("person", {}).get("mobileurl", {}).get("_content")

OUTPUT

https://m.flickr.com/photostream.gne?id=117600164
https://m.flickr.com/photostream.gne?id=11327876

回答3:

I think you could also try to do it more of a pandas way instead of pure dictionary iteration. it's not necessarily the fastest but given you are new to python and pandas, I think it's good thing to know that pandas can handle this well.

I am assuming you are using pandas DataFrame, not just dictionary. you could easily achieve the same purpose without converting your json to pandas DataFrame. i.e. other answers will work even if you are not a pandas DataFrame. they are also valid python dictionary syntax.

urls = result_dict[result_dict.index=='person'].apply(lambda x: x['mobileurl']['_content'])

here we have selected all rows that have the index as person and then we tried to apply a function (lambda is the anonymous function we'll be using) to each person. In this case, we are extracting out the urls using the lambda function, then pandas converted the result back to a pandas DataFrame (or Series) for you to use.

normally I would also care about how fast my iteration is.

(following are done in IPython, a nice tool you could use to do many things in python. %%timeit is a magic function provided by IPython for you to calculate the time your codes could take)

%timeit 
urls = result_dict[result_dict.index=='person'].apply(lambda x: x['mobileurl']['_content'])

1000 loops, best of 3: 133 us per loop (us = microsecond, 10e-6)

@SamC provided the fast solution here I can let you know. but like i said, you don't need a DataFrame to use his solution. it'll also work for plain dictionary.

来源：https://stackoverflow.com/questions/47651708/iteration-over-the-dictionary-and-extracting-values

标签

python

json

pandas

dictionary

flickr