If the file contains one large JSON document (a single array or object), then per the JSON spec you must parse the entire document before you can access any of its components.
If, for instance, the file is an array of objects, [ {...}, {...} ],
then newline-delimited JSON is far more efficient: you only have to keep one object in memory at a time, and the parser can begin processing as soon as it has read one line.
If you need to keep track of some of the objects for later use during parsing, I suggest creating a dict
to hold those running values as you iterate through the file.
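To make the contrast concrete, here is a minimal sketch (the file name `events.ndjson` is just a placeholder): each line of a newline-delimited file parses independently, so you never hold more than one record in memory.

```python
import json

# Write a small newline-delimited JSON file (placeholder name)
with open('events.ndjson', 'w') as f:
    for i in range(3):
        f.write(json.dumps({'id': i}) + '\n')

# Stream it back: only one object is ever in memory at a time
count = 0
with open('events.ndjson') as f:
    for line in f:
        obj = json.loads(line)  # parses just this one line
        count += 1

print(count)
```

With a single top-level array you would instead need `json.load(f)`, which reads and parses the whole file before returning anything.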
Say you have newline-delimited JSON like this:
{"timestamp": 1549480267882, "sensor_val": 1.6103881016325283}
{"timestamp": 1549480267883, "sensor_val": 9.281329310309406}
{"timestamp": 1549480267883, "sensor_val": 9.357327083443344}
{"timestamp": 1549480267883, "sensor_val": 6.297722749124474}
{"timestamp": 1549480267883, "sensor_val": 3.566667175421604}
{"timestamp": 1549480267883, "sensor_val": 3.4251473635178655}
{"timestamp": 1549480267884, "sensor_val": 7.487766674770563}
{"timestamp": 1549480267884, "sensor_val": 8.701853236245032}
{"timestamp": 1549480267884, "sensor_val": 1.4070662393018396}
{"timestamp": 1549480267884, "sensor_val": 3.6524325449499995}
{"timestamp": 1549480455646, "sensor_val": 6.244199614422415}
{"timestamp": 1549480455646, "sensor_val": 5.126780276231609}
{"timestamp": 1549480455646, "sensor_val": 9.413894020722314}
{"timestamp": 1549480455646, "sensor_val": 7.091154829208067}
{"timestamp": 1549480455647, "sensor_val": 8.806417239029447}
{"timestamp": 1549480455647, "sensor_val": 0.9789474417767674}
{"timestamp": 1549480455647, "sensor_val": 1.6466189633300243}
You can process it with:
import json
from collections import deque

# RingBuffer adapted from https://www.daniweb.com/programming/software-development/threads/42429/limit-size-of-a-list
class RingBuffer(deque):
    def __init__(self, size):
        deque.__init__(self)
        self.size = size

    def full_append(self, item):
        deque.append(self, item)
        # buffer is full: pop the oldest (left-most) item
        self.popleft()

    def append(self, item):
        deque.append(self, item)
        # max size reached: append becomes full_append
        if len(self) == self.size:
            self.append = self.full_append

    def get(self):
        """Return the buffered items (newest last) as a list."""
        return list(self)

def proc_data():
    # Keep whatever running state you need in memory
    # as you iterate through the objects
    metrics = {
        'latest_timestamp': 0,
        'last_3_samples': RingBuffer(3)
    }
    with open('test.json', 'r') as infile:
        for line in infile:
            # Parse each line as its own JSON document
            record = json.loads(line)
            # Update your running metrics
            metrics['last_3_samples'].append(record['sensor_val'])
            if record['timestamp'] > metrics['latest_timestamp']:
                metrics['latest_timestamp'] = record['timestamp']
    return metrics

print(proc_data())
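As a side note, the standard library can handle the ring-buffer part for you: `deque` accepts a `maxlen` argument and silently discards the oldest item once it is full, so the hand-rolled class above could be replaced by a plain `deque(maxlen=3)`. A minimal sketch:

```python
from collections import deque

# A deque with maxlen keeps only the newest N items automatically
last_3 = deque(maxlen=3)
for val in [1.6, 9.2, 9.3, 6.2, 3.5]:
    last_3.append(val)

print(list(last_3))  # [9.3, 6.2, 3.5]
```

The custom class is still useful if you want the `get()` method or the explicit size bookkeeping, but for just "last N values" the `maxlen` form is shorter and harder to get wrong.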