Question
I am trying to load the file business.json from the Yelp academic dataset, released for their academic challenge (see https://www.yelp.com/dataset/documentation/json), in Python. My goal is to extract all restaurants and their IDs, then find the one restaurant I am interested in. Once I have that restaurant's ID, I want to load review.json and extract all reviews for it. Sadly, I am stuck at the very first step: loading the .json file.
This is what business.json looks like:
{
    // string, 22 character unique string business id
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",
    // string, the business's name
    "name": "Garaje",
    // string, the neighborhood's name
    "neighborhood": "SoMa",
    // string, the full address of the business
    "address": "475 3rd St",
    // string, the city
    "city": "San Francisco",
    // string, 2 character state code, if applicable
    "state": "CA",
    // string, the postal code
    "postal code": "94107",
    // float, latitude
    "latitude": 37.7817529521,
    // float, longitude
    "longitude": -122.39612197,
    // float, star rating, rounded to half-stars
    "stars": 4.5,
    // integer, number of reviews
    "review_count": 1198,
    // integer, 0 or 1 for closed or open, respectively
    "is_open": 1,
    // object, business attributes to values. note: some attribute values might be objects
    "attributes": {
        "RestaurantsTakeOut": true,
        "BusinessParking": {
            "garage": false,
            "street": true,
            "validated": false,
            "lot": false,
            "valet": false
        },
    },
    // an array of strings of business categories
    "categories": [
        "Mexican",
        "Burgers",
        "Gastropubs"
    ],
    // an object of key day to value hours, hours are using a 24hr clock
    "hours": {
        "Monday": "10:00-21:00",
        "Tuesday": "10:00-21:00",
        "Friday": "10:00-21:00",
        "Wednesday": "10:00-21:00",
        "Thursday": "10:00-21:00",
        "Sunday": "11:00-18:00",
        "Saturday": "10:00-21:00"
    }
}
When I try to import business.json with the following code:
import json
jsonBus = json.loads(open('business.json').read())
for item in jsonBus:
    name = item.get("name")  # keys in the file are lowercase ("name", not "Name")
    businessID = item.get("business_id")
I get the following error:
runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')
Traceback (most recent call last):
  File "<ipython-input-46-68ba9d6458bc>", line 1, in <module>
    runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/Nico/Google Drive/Python/yelp/yelp_academic.py", line 3, in <module>
    jsonBus = json.loads(open('business.json').read())
  File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/anaconda3/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data
Does anyone know why this error appears?
I am also open to any smarter way to proceed!
Best,
Nico
Answer 1:
If your JSON file is exactly as you showed it, it must not contain comments (e.g. // string, 22 character unique string business id), because comments are not part of the JSON standard.
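A small check of that claim with Python's standard json module: the parser rejects a document containing a // comment and accepts the same object once the comment is removed. The helper name parses_as_json is hypothetical, for illustration only. Note that a comment at this position fails with a different message than the "Extra data" shown in the traceback above, so comments in the documentation's annotated example may not be what the downloaded file actually contains.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if the standard-library JSON parser accepts text (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Annotated like the documentation: the // line is not valid JSON syntax.
with_comment = """{
    // string, 22 character unique string business id
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg"
}"""

# The same object with the comment removed.
without_comment = '{"business_id": "tnhfDv5Il8EaGSXZGiuQGg"}'

print(parses_as_json(with_comment))     # False
print(parses_as_json(without_comment))  # True
```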
Please see a related post here: Can comments be used in JSON?
Answer 2:
I think this works: I'm working with the same dataset and had similar errors. I saw a comment here suggesting the following, and it seems to work:
import json
# parse the file one line at a time, one JSON object per line
with open('business.json') as f:
    js = [json.loads(line) for line in f]
for item in js:
    name = item.get("name")
    businessID = item.get("business_id")
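Since the files are large, a streaming variant of the same line-by-line idea may help with the original goal (find a restaurant's ID, then its reviews) without keeping every record in memory. This is a sketch under stated assumptions: review.json is assumed to use the same one-object-per-line layout with a "business_id" field on each review, and "categories" is assumed to be either a list (as in the documentation example) or a comma-separated string or null (as in some dataset releases). All function names here are hypothetical.

```python
import json
from typing import Dict, Iterator, List, Optional


def iter_json_lines(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-blank line without reading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. an empty last line
                yield json.loads(line)


def categories_of(business: dict) -> List[str]:
    """Normalise the categories field to a list of strings."""
    cats = business.get("categories")
    if cats is None:
        return []
    if isinstance(cats, str):  # assumption: some releases store "A, B, C"
        return [c.strip() for c in cats.split(",")]
    return list(cats)  # list form, as in the documentation example


def business_ids_by_name(path: str, category: Optional[str] = None) -> Dict[str, List[str]]:
    """Map business name -> list of business_ids (names are not unique)."""
    ids: Dict[str, List[str]] = {}
    for b in iter_json_lines(path):
        if category is None or category in categories_of(b):
            ids.setdefault(b["name"], []).append(b["business_id"])
    return ids


def reviews_for(path: str, business_id: str) -> List[dict]:
    """Stream review.json and keep only the reviews of one business."""
    return [r for r in iter_json_lines(path) if r.get("business_id") == business_id]
```

Usage might look like ids = business_ids_by_name('business.json', category='Restaurants'), then reviews_for('review.json', ids['SomeName'][0]) once the right ID is chosen. The category filter is optional on purpose: the documentation's Garaje example lists only "Mexican", "Burgers" and "Gastropubs", so filtering on "Restaurants" alone could miss businesses whose categories never say it.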
However, I'm still wondering why json.loads() doesn't work. The file itself looks fine.
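A likely explanation, assuming the downloaded business.json holds one complete JSON object per line (the JSON Lines layout that the line-by-line code above relies on): json.loads() expects the entire string to be exactly one JSON value. It decodes the first object successfully, finds more non-whitespace text after it, and raises "Extra data" at the start of the second line. The small demonstration below reproduces that with a made-up two-record string.

```python
import json

# Two made-up records in JSON Lines layout: one complete object per line.
two_lines = (
    '{"business_id": "a", "name": "Garaje"}\n'
    '{"business_id": "b", "name": "Other"}\n'
)

error = None
try:
    json.loads(two_lines)  # expects exactly ONE JSON value in the whole string
except json.JSONDecodeError as err:
    error = err  # the first object parsed fine; more text followed it

print(error.msg, error.lineno, error.colno)  # Extra data 2 1

# Decoding each line on its own succeeds.
records = [json.loads(line) for line in two_lines.splitlines() if line.strip()]
names = [r["name"] for r in records]
print(names)  # ['Garaje', 'Other']
```

So each line of the file is valid JSON, but the file as a whole is not a single JSON document, which is why it "looks fine" yet fails with json.loads().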
Source: https://stackoverflow.com/questions/46626952/load-large-jcon-file-in-python-error-jsondecodeerror-extra-data