Load large JSON file in Python - Error = JSONDecodeError: Extra data

Submitted by 拟墨画扇 on 2019-12-11 04:14:27

Question


I am trying to load in Python the file business.json from the Yelp academic dataset, made available for their academic challenge (see https://www.yelp.com/dataset/documentation/json). My goal is to extract all restaurants and their IDs, then find the one restaurant I am interested in. Once I have that restaurant's ID, I want to load review.json and extract all reviews for that restaurant. Sadly, I am stuck at the initial stage of loading the .json file.

This is what business.json looks like:

{
    // string, 22 character unique string business id
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",

    // string, the business's name
    "name": "Garaje",

    // string, the neighborhood's name
    "neighborhood": "SoMa",

    // string, the full address of the business
    "address": "475 3rd St",

    // string, the city
    "city": "San Francisco",

    // string, 2 character state code, if applicable
    "state": "CA",

    // string, the postal code
    "postal code": "94107",

    // float, latitude
    "latitude": 37.7817529521,

    // float, longitude
    "longitude": -122.39612197,

    // float, star rating, rounded to half-stars
    "stars": 4.5,

    // integer, number of reviews
    "review_count": 1198,

    // integer, 0 or 1 for closed or open, respectively
    "is_open": 1,

    // object, business attributes to values. note: some attribute values might be objects
    "attributes": {
        "RestaurantsTakeOut": true,
        "BusinessParking": {
            "garage": false,
            "street": true,
            "validated": false,
            "lot": false,
            "valet": false
        },
    },

    // an array of strings of business categories
    "categories": [
        "Mexican",
        "Burgers",
        "Gastropubs"
    ],

    // an object of key day to value hours, hours are using a 24hr clock
    "hours": {
        "Monday": "10:00-21:00",
        "Tuesday": "10:00-21:00",
        "Friday": "10:00-21:00",
        "Wednesday": "10:00-21:00",
        "Thursday": "10:00-21:00",
        "Sunday": "11:00-18:00",
        "Saturday": "10:00-21:00"
    }
}

When I try to import business.json with the following code:

import json

jsonBus = json.loads(open('business.json').read())
for item in jsonBus:
    name = item.get("Name")
    businessID = item.get("business_id")

I get the following error:

runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')
Traceback (most recent call last):

  File "<ipython-input-46-68ba9d6458bc>", line 1, in <module>
    runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/Nico/Google Drive/Python/yelp/yelp_academic.py", line 3, in <module>
    jsonBus = json.loads(open('business.json').read())

  File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)

  File "/anaconda3/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)

JSONDecodeError: Extra data

Does anyone know why this error appears?

I am also open to any smarter way to proceed!

Best,

Nico


Answer 1:


If your JSON file is exactly as you posted it, note that it should not contain comments (e.g. // string, 22 character unique string business id), as they are not part of the JSON standard.

Please see a related post here: Can comments be used in JSON?
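
To illustrate the point about comments, here is a minimal sketch using a shortened copy of the annotated sample from the question. The string `commented` is made up for illustration. Whether the real business.json contains such comments is not established in this thread; the sketch only shows that Python's json module rejects them.

```python
import json

# A shortened copy of the annotated sample, including a // comment.
commented = '''{
    // string, the business's name
    "name": "Garaje"
}'''

# The standard json module does not accept comments, so parsing fails.
try:
    json.loads(commented)
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(rejected)
```

If the comments were only part of the documentation page and not of the file itself, this would not be the cause of the "Extra data" error.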




Answer 2:


I think this works. I'm working with the same dataset and had similar errors; I saw a comment here that seems to work.

import json

# Parse the file one line at a time; each line is treated as its own JSON object.
with open('business.json') as f:
    js = [json.loads(line) for line in f]
for item in js:
    name = item.get("name")
    businessID = item.get("business_id")

However, I'm still wondering why json.loads() doesn't work. The file itself looks fine.
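
A likely explanation, assuming the file stores one JSON object per line (which is what the line-by-line fix suggests): json.loads() expects a single JSON value, so it parses the first object and then reports everything after it as "Extra data". The sketch below reproduces this on a made-up two-line sample. `sample` and `iter_json_lines` are hypothetical names, not part of the dataset or of any library.

```python
import json
import os
import tempfile

# Two objects separated by a newline, mimicking a one-object-per-line file.
sample = (
    '{"business_id": "a1", "name": "Garaje"}\n'
    '{"business_id": "b2", "name": "Other"}\n'
)

# Parsing the whole text at once fails right after the first object.
try:
    json.loads(sample)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg


def iter_json_lines(path):
    """Yield one parsed object per non-empty line (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write the sample to a temporary file and read it back line by line.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

ids_by_name = {
    item["name"]: item["business_id"] for item in iter_json_lines(tmp_path)
}
os.remove(tmp_path)

print(error_message)
print(ids_by_name)
```

Because the generator holds only one line in memory at a time, the same approach should also suit the larger review.json: iterate over it and keep only the records whose business_id matches the restaurant you found (assuming each review record carries a business_id field).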



Source: https://stackoverflow.com/questions/46626952/load-large-jcon-file-in-python-error-jsondecodeerror-extra-data
