Question
I'm trying to read individual values from a JSON feed. Here is an example of the feed data:
{
    "sendtoken": "token1",
    "bytes_transferred": 0,
    "num_retries": 0,
    "timestamp": 1414395374,
    "queue_time": 975,
    "message": "internalerror",
    "id": "mailerX",
    "m0": {
        "binding_group": "domain.com",
        "recipient_domain": "hotmail.com",
        "recipient_local": "destination",
        "sender_domain": "domain.com",
        "binding": "mail.domain.com",
        "message_id": "C1/34-54876-D36FA645",
        "api_credential": "creds",
        "sender_local": "localstring"
    },
    "rejecting_ip": "145.5.5.5",
    "type": "alpha",
    "message_stage": 3
}
{
    "sendtoken": "token2",
    "bytes_transferred": 0,
    "num_retries": 0,
    "timestamp": 1414397568,
    "queue_time": 538,
    "message": "internal error,
    "id": "mailerX",
    "m0": {
        "binding_group": "domain.com",
        "recipient_domain": "hotmail.com",
        "recipient_local": "destination",
        "sender_domain": "domain.com",
        "binding": "mail.domain.com",
        "message_id": "C1/34-54876-D36FA645",
        "api_credential": "creds",
        "sender_local": "localstring"
    },
    "rejecting_ip": "145.5.5.5",
    "type": "alpha",
    "message_stage": 3
}
I can't share the actual URL, but the above are the first 2 of roughly 150 results that are displayed if I run print results before the json.loads() line.
My code:
import urllib2
import json

results = urllib2.urlopen(url).read()
jsondata = json.loads(results)

for row in jsondata:
    print row['sendtoken']
    print row['recipient_domain']
I'd like output like
token1
hotmail.com
for each entry.
I'm getting this error:
ValueError: Extra data: line 2 column 1 - line 133 column 1 (char 583 - 77680)
I'm far from a Python expert, and this is my first time working with JSON. I've spent quite a bit of time searching Google and Stack Overflow, but I can't find a solution that works for my specific data format.
Answer 1:
The problem is that your data don't form a single JSON object, so you can't decode them with one call to json.loads.
First, this appears to be a sequence of JSON objects separated by spaces. Since you won't tell us anything about where the data come from, this is really just an educated guess; hopefully whatever documentation or coworker or whatever told you about this URL told you what the format actually is. But let's assume that my educated guess is correct.
The easiest way to parse a stream of JSON objects in Python is to use the raw_decode method. Something like this:*
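import json

def parse_json_stream(stream):
    decoder = json.JSONDecoder()
    while stream:
        # raw_decode returns the decoded object and the index where it ended
        obj, idx = decoder.raw_decode(stream)
        yield obj
        # drop the consumed text and any whitespace before the next object
        stream = stream[idx:].lstrip()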
However, there's also an error in the second JSON object in the stream. Look at this part:
…
"message": "internal error,
"id": "mailerX",
…
There's a missing " after "internal error. If you fix that, then the function above will iterate over two JSON objects.
Hopefully that error was caused by you trying to manually "copy and paste" data by rewriting it. If it's in your original source data, you've got a much bigger problem; you probably need to write a "broken JSON" parser from scratch that can heuristically guess at what the data were intended to be. Or, of course, get whoever's generating the source to generate it properly.
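For the two fields the question asks about, a usage sketch might look like the following. It assumes the missing quote has been fixed and uses an abbreviated stand-in for the feed; the feed string and the extract_fields helper are made up for illustration. Note that recipient_domain is read from the nested "m0" object, because that is where it sits in the sample data. The extra lstrip() before the loop is an addition here, so that text starting with a newline is tolerated.

```python
import json

def parse_json_stream(stream):
    """Yield each JSON value from a string of whitespace-separated JSON values."""
    decoder = json.JSONDecoder()
    stream = stream.lstrip()  # raw_decode rejects leading whitespace
    while stream:
        obj, idx = decoder.raw_decode(stream)
        yield obj
        stream = stream[idx:].lstrip()

# Hypothetical, shortened stand-in for the real feed (quote already repaired).
feed = '''
{"sendtoken": "token1", "m0": {"recipient_domain": "hotmail.com"}}
{"sendtoken": "token2", "m0": {"recipient_domain": "hotmail.com"}}
'''

def extract_fields(text):
    """Collect (sendtoken, recipient_domain) pairs; the domain lives under "m0"."""
    return [(row["sendtoken"], row["m0"]["recipient_domain"])
            for row in parse_json_stream(text)]

for token, domain in extract_fields(feed):
    print(token)
    print(domain)
```

With the real feed, the string returned by urlopen(url).read() would take the place of feed.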
* In general, it's more efficient to use the second argument to raw_decode to pass a start index, instead of slicing off a copy of the remainder each time. But raw_decode can't handle leading whitespace. It's a little easier to just slice and strip than to write code that skips over whitespace from the given index, but if the memory and performance costs of those copies matter, you should write the more complicated code.
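A rough sketch of that index-based variant, assuming the values are separated only by whitespace. The names iter_json_values and _WS are hypothetical; the regular expression just skips the whitespace characters JSON allows between values, so no copies of the remaining text are made.

```python
import json
import re

# Matches a (possibly empty) run of the whitespace JSON permits between values.
_WS = re.compile(r"[ \t\n\r]*")

def iter_json_values(text):
    """Yield each JSON value in text, tracking an index instead of slicing."""
    decoder = json.JSONDecoder()
    idx = _WS.match(text, 0).end()  # skip any leading whitespace
    while idx < len(text):
        # raw_decode starts at idx and returns the absolute index where it stopped
        obj, idx = decoder.raw_decode(text, idx)
        yield obj
        idx = _WS.match(text, idx).end()  # skip whitespace before the next value

values = list(iter_json_values(' {"a": 1}\n[2, 3]  "x" '))
print(values)
```

Malformed input still raises ValueError from raw_decode, the same as in the slicing version.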
Answer 2:
That's because json.loads (and json.load) does not decode multiple JSON objects. For example, the JSON file you want may be: {"a": 1, "b": 2} whereas the structure of the data your code actually receives is: {"a": 1, "b": 2}{"a": 1, "b": 2}
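To see the failure in isolation, here is a minimal sketch. The tiny object is made up for illustration and is not taken from the feed.

```python
import json

single = '{"a": 1, "b": 2}'
concatenated = single + single  # two objects back to back, as in the feed

# One object decodes without trouble.
decoded = json.loads(single)
print(decoded)

# Two objects in one string make json.loads raise ValueError.
error_message = None
try:
    json.loads(concatenated)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The message begins with "Extra data", which is the same error reported in the question.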
Source: https://stackoverflow.com/questions/26620714/json-loads-valueerror-extra-data-in-python