'Expected String or Unicode' when reading JSON with Pandas

筅森魡賤 submitted on 2019-12-05 08:12:10

If you write the JSON string to a file,

content = osm.read()
with open('/tmp/out', 'w') as f:
    f.write(content)

you'll see something like this:

{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2014-07-20T07:52:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
},
...]}

If the JSON string were to be converted to a Python object, it would be a dict whose elements key is a list of dicts. The vast majority of the data is inside this list of dicts.
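
To make that structure concrete, here is a minimal sketch using the standard json module. The sample string is a trimmed-down copy of the response above (one element, copyright field dropped) made up for illustration; it is not the full Overpass output.

```python
import json

# Trimmed-down sample of the response shown above (one element only).
sample = '''
{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {"timestamp_osm_base": "2014-07-20T07:52:02Z"},
  "elements": [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}}
  ]
}
'''

data = json.loads(sample)
top_level_keys = sorted(data)   # the four top-level keys
elements = data['elements']     # a list of dicts, one per OSM element

print(top_level_keys)
print(type(elements).__name__, type(elements[0]).__name__)
```

Only the elements entry carries per-node data; the other three keys describe the response as a whole.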

This JSON string is not directly convertible to a Pandas object. What would be the index, and what would be the columns? Surely you don't want [u'elements', u'version', u'osm3s', u'generator'] to be the columns, since almost all the information is in the elements list-of-dicts.

But if you want the DataFrame to consist of the data only in the elements list-of-dicts, then you'd have to specify that, since Pandas can't make that assumption for you.

Further complicating things is that each dict in elements is a nested dict. Consider the first dict in elements:

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
}

Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns? That seems plausible, except that the tags column would end up being a column of dicts. That's usually not very useful. It would be nicer perhaps if the keys inside the tags dict were made into columns. We can do that, but again we have to code it ourselves since Pandas has no way of knowing that's what we want.
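
The flattening step can be sketched offline on a small sample. The helper name flatten_tags and the second, tag-less element are hypothetical and exist only for illustration; the sketch assumes tag names do not clash with the top-level keys (a tag called "type", for instance, would overwrite the element's own type).

```python
def flatten_tags(element):
    """Return a copy of element with the keys of its 'tags' dict moved to the top level."""
    flat = {key: val for key, val in element.items() if key != 'tags'}
    # Elements without a 'tags' key are passed through unchanged.
    flat.update(element.get('tags', {}))
    return flat


elements = [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
    # Hypothetical element with no tags, to show the fallback.
    {"type": "node", "id": 1, "lat": 51.0, "lon": 13.7},
]

rows = [flatten_tags(e) for e in elements]
print(sorted(rows[0]))
```

Because the helper builds a new dict, the original elements are left untouched. Passing rows to pd.DataFrame would give one column per key, with NaN wherever a row lacks that key.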


import pandas as pd
import requests
# Bottom left corner of the bounding box
minLat = 50.9549
minLon = 13.55232

# Top right corner of the bounding box
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)

osmdata = osm.json()
osmdata = osmdata['elements']
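for dct in osmdata:
    # Promote each tag to a top-level key so it becomes its own column.
    # .items() works on Python 2 and 3; .iteritems() is Python 2 only.
    for key, val in dct['tags'].items():
        dct[key] = val
    del dct['tags']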

osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())

yields

         lat        lon                        name
0  50.984926  13.682178  Niederhäslich Bergmannsweg
1  51.123623  13.782789                Sagarder Weg
2  51.065752  13.895734     Weißig, Einkaufszentrum
3  51.007140  13.698498          Stuttgarter Straße
4  51.010199  13.701411          Heilbronner Straße
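
As an alternative to the hand-written loop, pandas 1.0 and later provide pd.json_normalize, which flattens nested dicts into dotted column names. This is a different technique from the code above, shown here only as a sketch on a small inline sample rather than the live Overpass response.

```python
import pandas as pd

elements = [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
]

# Nested keys become dotted column names such as 'tags.name'.
df = pd.json_normalize(elements)
print(df[['lat', 'lon', 'tags.name']])
```

The tag columns keep a tags. prefix here, so they cannot collide with top-level keys such as type or id.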