Suppose I need to have a database file consisting of a list of dictionaries:
file:
[
{\"name\":\"Joe\",\"data\":[1,2,3,4,5]},
{ ...
If you are looking to avoid actually loading the file, json is not really the right approach. You could use a memory-mapped file instead -- a numpy memmap array can open the file and build an array "on-disk" without reading its contents into memory.
Create a memory-mapped array of dicts:
>>> import numpy as np
>>> a = np.memmap('mydict.dat', dtype=object, mode='w+', shape=(4,))
>>> a[0] = {'name':"Joe", 'data':[1,2,3,4]}
>>> a[1] = {'name':"Guido", 'data':[1,3,3,5]}
>>> a[2] = {'name':"Fernando", 'data':[4,2,6,9]}
>>> a[3] = {'name':"Jill", 'data':[9,1,9,0]}
>>> a.flush()   # write the pending changes through to the file on disk
>>> del a       # drop the array; the data stays in 'mydict.dat'
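Note that a memmap's shape is fixed when you create it, so you need to know the number of records up front. The same pattern scales to a larger collection -- a rough sketch, where the file name 'bigdict.dat' and the record count are just placeholders:
>>> n_records = 1000   # placeholder: the size of the collection, chosen up front
>>> big = np.memmap('bigdict.dat', dtype=object, mode='w+', shape=(n_records,))
>>> for i in range(n_records):
...     big[i] = {'name': 'user%d' % i, 'data': [i, i+1, i+2]}
...
>>> big.flush()
>>> del big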
Now read the original array back, without loading the file:
>>> a = np.memmap('mydict.dat', dtype=object, mode='r')
Converting to a list (below) does pull the contents into memory, but that's not required -- you can work with the array on-disk without loading it.
>>> a.tolist()
[{'data': [1, 2, 3, 4], 'name': 'Joe'}, {'data': [1, 3, 3, 5], 'name': 'Guido'}, {'data': [4, 2, 6, 9], 'name': 'Fernando'}, {'data': [9, 1, 9, 0], 'name': 'Jill'}]
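If you only need one record, you can index the memmap directly instead of converting the whole thing with tolist() -- continuing from the re-opened array above:
>>> a[0]['name']   # pulls a single record instead of building a full list
'Joe'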
Creating a memory-mapped array that indexes the file takes a negligible amount of time, regardless of the size of the file (e.g. 100 GB), because nothing is read until you actually access the data.
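If you want to check that on your own machine, you can time the open itself -- a quick sketch with timeit (the result is the total time in seconds for 100 opens, and will of course vary by system):
>>> import timeit
>>> timeit.timeit("np.memmap('mydict.dat', dtype=object, mode='r')",
...               setup="import numpy as np", number=100)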