I have a set of URLs containing JSON files and an empty pandas DataFrame with columns representing the attributes of the JSON files. Not all JSON files have all the attributes.
Assuming that df is empty and has the same columns as the URL dictionary keys, i.e.
list(df)
#[u'alternate_product_code',
# u'availability',
# u'boz',
# ...
len(df)
#0
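then you can use DataFrame.append. It returns a new DataFrame instead of modifying df in place, so assign the result back to df:
import json
import urllib2  # Python 2; on Python 3 use urllib.request instead
import pandas

for url in links:
    url_data = urllib2.urlopen(str(url)).read()
    url_dict = json.loads(url_data)
    # one single-row Series per attribute, with values stored as strings
    a_dict = {k: pandas.Series([str(v)], index=[0]) for k, v in url_dict.iteritems()}
    new_df = pandas.DataFrame.from_dict(a_dict)
    # append returns a new frame, so rebind df to keep the added row
    df = df.append(new_df, ignore_index=True)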
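On recent pandas releases DataFrame.append is no longer available (it was removed in pandas 2.0), so here is a rough sketch of the same idea that collects the parsed dictionaries first and builds the frame in one step. The JSON strings and the column list are made-up stand-ins for the downloaded files and for your real columns, not data from the question.

```python
import json

import pandas as pd

# Hypothetical stand-ins for the bodies returned by the URLs
payloads = [
    '{"availability": "in stock", "boz": 1}',
    '{"alternate_product_code": "A-1", "availability": "sold out"}',
]

# Assumed column list; use list(df) from the empty frame in practice
columns = ["alternate_product_code", "availability", "boz"]

# Parse each payload and stringify the values, as in the loop above
records = [
    {k: str(v) for k, v in json.loads(text).items()}
    for text in payloads
]

# Build the frame once; reindex keeps only the expected columns and
# fills attributes missing from a file with NaN
result = pd.DataFrame(records).reindex(columns=columns)
print(result)
```

Building the frame once also avoids copying the whole DataFrame on every iteration, which is what repeated appends do.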
Not too sure why your code won't work, but consider the following few edits which should clean things up, should you still want to use it:
for row, url in enumerate(links):
    data = urllib2.urlopen(str(url)).read()
    data_dict = json.loads(data)
    for key, val in data_dict.items():
        # skip attributes that are not columns of df
        if key in df.columns:
            # .loc replaces the deprecated .ix indexer
            df.loc[row, key] = val
I used enumerate to iterate over both the index and the value of the links array, so you don't need a separate index counter (row in your code). I also used the .items dictionary method to iterate over keys and values at once. pandas should fill any cells that are never assigned with NaN.
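To check the row-by-row behaviour without hitting the network, here is a small sketch that replaces the downloads with in-memory dictionaries. The sample dictionaries, the extra "unknown_key" attribute and the column names are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical parsed JSON documents standing in for json.loads(data)
parsed = [
    {"availability": "in stock", "unknown_key": "ignored"},
    {"alternate_product_code": "A-1", "availability": "sold out"},
]

# Empty frame with the expected columns, as described in the question
df = pd.DataFrame(columns=["alternate_product_code", "availability"])

for row, data_dict in enumerate(parsed):
    for key, val in data_dict.items():
        # only fill attributes that are columns of df
        if key in df.columns:
            # assigning to a new row label enlarges the frame
            df.loc[row, key] = val

print(df)
```

Cells that were never assigned, such as alternate_product_code in the first row, come out as NaN, and keys that are not columns are skipped.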