NumPy arrays with SQLite

后端 未结 4 627
孤城傲影
孤城傲影 2021-02-04 19:39

The most common SQLite interface I\'ve seen in Python is sqlite3, but is there anything that works well with NumPy arrays or recarrays? By that I mean one that reco

相关标签:
4条回答
  • 2021-02-04 19:50

    Doug's suggestion with redis is quite good, but I think his code is a bit complicated and, as a result, rather slow. For my purposes, I had to serialize+write and then grab+deserialize a square matrix of about a million floats in less than a tenth of a second, so I did this:

    For writing:

    snapshot = np.random.randn(1024,1024)
    serialized = snapshot.tobytes()
    rs.set('snapshot_key', serialized)
    

    Then for reads:

    s = rs.get('snapshot_key')
    deserialized = np.frombuffer(s).astype(np.float32)
    rank = np.sqrt(deserialized.size).astype(int)
    snap = deserialized(rank, rank)
    

    You can do some basic performance testing with ipython using %time, but neither the tobytes or frombuffer take more than a few milliseconds.

    0 讨论(0)
  • 2021-02-04 19:54

    This looks a bit older but is there any reason you cannot just do a fetchall() instead of iterating and then just initializing numpy on declaration?

    0 讨论(0)
  • 2021-02-04 19:58

    why not give redis a try?

    Drivers for your two platforms of interest are available--python (redis, via package index]2), and R (rredis, CRAN).

    The genius of redis is not that it will magically recognize the NumPy data type and allow you to insert and extract multi-dimensional NumPy arrays as if they were native redis datatypes, rather its genius is in the remarkable ease with which you can create such an interface with just a few lines of code.

    There are (at least) several tutorials on redis in python; the one on the DeGizmo blog is particularly good.

    import numpy as NP
    
    # create some data
    A = NP.random.randint(0, 10, 40).reshape(8, 5)
    
    # a couple of utility functions to (i) manipulate NumPy arrays prior to insertion 
    # into redis db for more compact storage & 
    # (ii) to restore the original NumPy data types upon retrieval from redis db
    fnx2 = lambda v : map(int, list(v))
    fnx = lambda v : ''.join(map(str, v))
    
    # start the redis server (e.g. from a bash prompt)
    $> cd /usr/local/bin      # default install directory for 'nix
    $> redis-server           # starts the redis server
    
    # start the redis client:
    from redis import Redis
    r0 = Redis(db=0, port=6379, host='localhost')       # same as: r0 = Redis()
    
    # to insert items using redis 'string' datatype, call 'set' on the database, r0, and
    # just pass in a key, and the item to insert
    r0.set('k1', A[0,:])
    
    # row-wise insertion the 2D array into redis, iterate over the array:
    for c in range(A.shape[0]):
        r0.set( "k{0}".format(c), fnx(A[c,:]) )
    
    # or to insert all rows at once
    # use 'mset' ('multi set') and pass in a key-value mapping: 
    x = dict([sublist for sublist in enumerate(A.tolist())])
    r0.mset(x1)
    
    # to retrieve a row, pass its key to 'get'
    >>> r0.get('k0')
      '63295'
    
    # retrieve the entire array from redis:
    kx = r0.keys('*')           # returns all keys in redis database, r0
    
    for key in kx :
        r0.get(key)
    
    # to retrieve it in original form:
    A = []
    for key in kx:
        A.append(fnx2(r0.get("{0}".format(key))))
    
    >>> A = NP.array(A)
    >>> A
      array([[ 6.,  2.,  3.,  3.,  9.],
             [ 4.,  9.,  6.,  2.,  3.],
             [ 3.,  7.,  9.,  5.,  0.],
             [ 5.,  2.,  6.,  3.,  4.],
             [ 7.,  1.,  5.,  0.,  2.],
             [ 8.,  6.,  1.,  5.,  8.],
             [ 1.,  7.,  6.,  4.,  9.],
             [ 6.,  4.,  1.,  3.,  6.]])
    
    0 讨论(0)
  • 2021-02-04 20:11

    I found at least three Python packages to interface SQLite and NumPy:

    • esutil.sqlite_util, which is part of esutil
    • hydroclimpy.io.sqlite
    • my own RecSQL (available from github.com/orbeckst/RecSQL)

    Each of these packages has to deal with the problem that SQLite (by default) only understands standard Python types and not the NumPy data types such as numpy.int64.

    RecSQL 0.7.8+ works for me (most of the time) but I consider it a pretty bad hack and glancing over the code, esutil.sqlite_util appears to be more mature.

    0 讨论(0)
提交回复
热议问题