Question
I want to import OData XML data feeds from the Dutch Bureau of Statistics (CBS) into our database. Using lxml and pandas, I thought this would be straightforward. I am using OrderedDict to preserve the order of the columns for readability, but somehow I can't get it right.
from collections import OrderedDict
from lxml import etree
import requests
import pandas as pd
# CBS URLs
base_url = 'http://opendata.cbs.nl/ODataFeed/odata'
datasets = ['/37296ned', '/82245NED']
feed = requests.get(base_url + datasets[1] + '/TypedDataSet')
root = etree.fromstring(feed.content)
# all record entries start at tag m:properties, parse into data dict
data = []
for record in root.iter('{{{}}}properties'.format(root.nsmap['m'])):
    row = OrderedDict()
    for element in record:
        row[element.tag.split('}')[1]] = element.text
    data.append(row)
df = pd.DataFrame.from_dict(data)
df.columns
Inspecting data, each OrderedDict is in the right order. But looking at df.head(), the columns have been sorted alphabetically, with capitals first?
Help, anyone?
Answer 1:
Something in your example seems to be inconsistent, as data is a list and not a dict, but assuming you really have an OrderedDict:
Try specifying the column order explicitly when you create your DataFrame:
# ... all your data collection
df = pd.DataFrame(data, columns=data.keys())
This should give you a DataFrame with the columns ordered exactly as they are in the OrderedDict (via the list generated by data.keys()).
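Since data in the question is a list of OrderedDict rows rather than a single OrderedDict, the same idea can be sketched for that case by taking the column order from the first row. This is only a minimal sketch: the sample rows and their column names below are hypothetical stand-ins, not actual CBS data, and it assumes the first row contains every column you want to keep.

```python
from collections import OrderedDict
import pandas as pd

# Hypothetical sample rows standing in for the parsed feed records
data = [
    OrderedDict([('ID', '0'), ('Perioden', 'p1'), ('Bevolking', '100')]),
    OrderedDict([('ID', '1'), ('Perioden', 'p2'), ('Bevolking', '200')]),
]

# Take the column order from the first row and pass it explicitly,
# so pandas does not choose an ordering on its own
columns = list(data[0].keys())
df = pd.DataFrame(data, columns=columns)

print(list(df.columns))
```

Passing columns to the plain DataFrame constructor both selects and orders the columns, so any key that is missing from the list is left out of the result.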
Answer 2:
The above answer doesn't work for me; it keeps giving me "ValueError: cannot use columns parameter with orient='columns'".
I later found a solution that worked, which is to select the columns in the desired order after building the DataFrame:
df = pd.DataFrame.from_dict(dict_data)[list(dict_data[0].keys())]
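The reordering step above can be tried on a small example. This is a sketch under assumptions: dict_data is taken to be a list of OrderedDict rows, and the rows and column names are made up for illustration. The second row carries an extra key on purpose, to show what selecting by the first row's keys does with it.

```python
from collections import OrderedDict
import pandas as pd

# Hypothetical rows; the second row has an extra key on purpose
dict_data = [
    OrderedDict([('ID', '0'), ('Perioden', 'p1'), ('Bevolking', '100')]),
    OrderedDict([('ID', '1'), ('Perioden', 'p2'), ('Bevolking', '200'), ('Extra', 'x')]),
]

# Build the frame first, then index with the first row's keys
# to put the columns in that row's order
first_row_keys = list(dict_data[0].keys())
df = pd.DataFrame.from_dict(dict_data)[first_row_keys]

print(list(df.columns))
```

Note that keys appearing only in later rows (Extra here) are dropped by the selection, so this approach fits best when every record has the same fields.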
Source: https://stackoverflow.com/questions/33752819/pandas-dataframe-from-dict-not-preserving-order-using-ordereddict