How to format in numpy savetxt such that zeros are saved only as “0”

难免孤独 2020-12-19 10:14

I am saving a numpy sparse array (converted to dense) to a CSV file. The result is a 3 GB file. The problem is that 95% of the cells are written as 0.0000, because I used fmt='%5.4f'. How can I format the output so that zeros are saved as just "0"?

3 answers
  • 2020-12-19 10:52

    It would be much better to save only the nonzero entries of your sparse matrix (m in the example below). You can do that with:

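    import numpy as np

    fname = 'row_col_data.txt'
    m = m.tocoo()  # COO format exposes .row, .col and .data
    # one "row col value" line per stored entry
    a = np.vstack((m.row, m.col, m.data)).T
    header = '{0}, {1}'.format(*m.shape)  # keep the shape so the matrix can be rebuilt
    np.savetxt(fname, a, header=header, fmt=('%d', '%d', '%5.4f'))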
    

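    The sparse matrix can then be rebuilt with:

    from scipy.sparse import coo_matrix
    row, col, data = np.loadtxt(fname, skiprows=1, unpack=True)
    with open(fname) as fh:  # first line is the header, e.g. "# 6, 5"
        shape = tuple(int(n) for n in fh.readline()[1:].split(','))
    m = coo_matrix((data, (row.astype(int), col.astype(int))), shape=shape)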
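
To check that this triplet format survives a round trip, here is a minimal self-contained sketch. The helper names save_coo_txt and load_coo_txt are hypothetical (they are not part of numpy or scipy), the '%.6g' value format is an arbitrary choice, and the sketch assumes the matrix has at least one nonzero entry.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import coo_matrix


def save_coo_txt(fname, m, fmt=('%d', '%d', '%.6g')):
    """Write one 'row col value' line per stored entry; the shape goes in the header."""
    m = coo_matrix(m)
    triplets = np.column_stack((m.row, m.col, m.data))
    np.savetxt(fname, triplets, header='{0}, {1}'.format(*m.shape), fmt=fmt)


def load_coo_txt(fname):
    """Rebuild a matrix written by save_coo_txt (hypothetical helper)."""
    with open(fname) as fh:
        # savetxt prefixes the header with '# ', so drop the leading '#'
        shape = tuple(int(n) for n in fh.readline()[1:].split(','))
    # ndmin=2 keeps the result 2-D even when only one entry was stored
    triplets = np.loadtxt(fname, skiprows=1, ndmin=2)
    row = triplets[:, 0].astype(int)
    col = triplets[:, 1].astype(int)
    return coo_matrix((triplets[:, 2], (row, col)), shape=shape)


# Small test matrix with three nonzero entries
dense = np.zeros((6, 5))
dense[0, 4] = 1.2345
dense[1, 1] = 9.87654321
dense[2, 1] = 3.14159265

path = os.path.join(tempfile.mkdtemp(), 'row_col_data.txt')
save_coo_txt(path, dense)
restored = load_coo_txt(path)
print(restored.toarray())
```

The file then holds one header line plus one line per nonzero entry, so its size grows with the number of nonzeros rather than with the full matrix size. Values are only restored to the precision of the chosen format.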
    
  • 2020-12-19 10:54

    Another simple option that may fit your requirements is the 'g' specifier. If you care more about significant digits than about a fixed number of decimal places, and don't mind the output switching between scientific and fixed-point notation, it does the trick well. For example:

    np.savetxt("foo.csv", arrayDense, fmt='%5.4g', delimiter=',') 
    

    If arrayDense is this:

    matrix([[ -5.54900000e-01,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   3.43560000e-08,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   3.43422000e+01]])
    

    Your way would yield:

    -0.5549,0.0000,0.0000
    0.0000,0.0000,0.0000
    0.0000,0.0000,34.3422
    

    The above would yield instead:

    -0.5549,    0,    0
        0,3.436e-08,    0
        0,    0,34.34
    

    This approach is also more flexible. Note that with 'g' instead of 'f', small values are not flushed to zero (3.4356e-08 is written as 3.436e-08 instead of 0.0000). How many digits survive still depends on the precision you choose.
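
One detail worth noting: the width 5 in '%5.4g' pads short fields with spaces, so a zero is written as four spaces followed by "0". If the goal is a bare "0", dropping the width should be enough. The sketch below compares the two formats on the same matrix; the helper to_csv_text is a hypothetical name introduced only for this illustration.

```python
import io

import numpy as np

arr = np.array([[-0.5549, 0.0, 0.0],
                [0.0, 3.4356e-08, 0.0],
                [0.0, 0.0, 34.3422]])


def to_csv_text(a, fmt):
    """Return what np.savetxt would write, as a string (hypothetical helper)."""
    buf = io.StringIO()
    np.savetxt(buf, a, fmt=fmt, delimiter=',')
    return buf.getvalue()


padded = to_csv_text(arr, '%5.4g')   # zeros are padded to width 5
compact = to_csv_text(arr, '%.4g')   # no minimum width, zeros are a bare "0"

print(compact)
# -0.5549,0,0
# 0,3.436e-08,0
# 0,0,34.34
```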

  • 2020-12-19 11:09

    If you look at the source code of np.savetxt, you'll see that, although there is quite a bit of code to handle the arguments and the differences between Python 2 and Python 3, it ultimately comes down to a simple Python loop over the rows, in which each row is formatted and written to the file. So you won't lose any performance if you write your own. For example, here's a pared-down function that writes compact zeros:

    def savetxt_compact(fname, x, fmt="%.6g", delimiter=','):
        with open(fname, 'w') as fh:
            for row in x:
                line = delimiter.join("0" if value == 0 else fmt % value for value in row)
                fh.write(line + '\n')
    

    For example:

    In [70]: x
    Out[70]: 
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
           [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
    
    In [71]: savetxt_compact('foo.csv', x, fmt='%.4f')
    
    In [72]: !cat foo.csv
    0,0,0,0,1.2345
    0,9.8765,0,0,0
    0,3.1416,0,0,0
    0,0,0,0,0
    0,0,0,0,0
    0,0,0,0,0
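
As a quick sanity check, the compact file can still be read back with np.loadtxt, since "0" parses as 0.0. The sketch below repeats savetxt_compact from above so that it runs on its own, and compares the file size with a plain np.savetxt call using '%5.4f'. The temporary file names are arbitrary.

```python
import os
import tempfile

import numpy as np


def savetxt_compact(fname, x, fmt="%.6g", delimiter=','):
    # Same function as above, repeated so this snippet is self-contained
    with open(fname, 'w') as fh:
        for row in x:
            line = delimiter.join("0" if value == 0 else fmt % value for value in row)
            fh.write(line + '\n')


x = np.zeros((6, 5))
x[0, 4] = 1.2345
x[1, 1] = 9.87654321
x[2, 1] = 3.14159265

tmpdir = tempfile.mkdtemp()
compact_path = os.path.join(tmpdir, 'compact.csv')
full_path = os.path.join(tmpdir, 'full.csv')

savetxt_compact(compact_path, x, fmt='%.4f')
np.savetxt(full_path, x, fmt='%5.4f', delimiter=',')

# The compact file loads back like any other CSV
loaded = np.loadtxt(compact_path, delimiter=',')

size_compact = os.path.getsize(compact_path)
size_full = os.path.getsize(full_path)
print(size_compact, size_full)
```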
    

    Then, as long as you are writing your own savetxt function, you might as well make it handle sparse matrices, so you don't have to convert the matrix to a (dense) numpy array before saving it. (I assume the sparse array is implemented using one of the sparse representations from scipy.sparse.) In the following function, the only change is from ... for value in row to ... for value in row.toarray()[0]; row.A[0] is an older shorthand for the same thing.

    def savetxt_sparse_compact(fname, x, fmt="%.6g", delimiter=','):
        with open(fname, 'w') as fh:
            for row in x:
                # each row is a 1 x n sparse matrix; densify only this one row
                line = delimiter.join("0" if value == 0 else fmt % value for value in row.toarray()[0])
                fh.write(line + '\n')
    

    Example:

    In [112]: a
    Out[112]: 
    <6x5 sparse matrix of type '<type 'numpy.float64'>'
        with 3 stored elements in Compressed Sparse Row format>
    
    In [113]: a.A
    Out[113]: 
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
           [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
    
    In [114]: savetxt_sparse_compact('foo.csv', a, fmt='%.4f')
    
    In [115]: !cat foo.csv
    0,0,0,0,1.2345
    0,9.8765,0,0,0
    0,3.1416,0,0,0
    0,0,0,0,0
    0,0,0,0,0
    0,0,0,0,0
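
A possible variant, sketched here under the assumption that the input can be converted to CSR format with no duplicate entries, reads the stored values of each row directly from the CSR arrays (indptr, indices, data) instead of densifying the row. The function name savetxt_csr_compact is hypothetical, and this is not necessarily faster in practice.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix


def savetxt_csr_compact(fname, x, fmt="%.6g", delimiter=','):
    """Write a sparse matrix as text, with every zero written as a bare '0'."""
    x = csr_matrix(x)  # assumption: canonical CSR, no duplicate entries
    n_cols = x.shape[1]
    with open(fname, 'w') as fh:
        for i in range(x.shape[0]):
            fields = ["0"] * n_cols
            start, end = x.indptr[i], x.indptr[i + 1]
            # indices[start:end] are the column positions stored for row i
            for j, value in zip(x.indices[start:end], x.data[start:end]):
                if value != 0:  # explicitly stored zeros stay "0"
                    fields[j] = fmt % value
            fh.write(delimiter.join(fields) + '\n')


dense = np.zeros((6, 5))
dense[0, 4] = 1.2345
dense[1, 1] = 9.87654321
dense[2, 1] = 3.14159265

path = os.path.join(tempfile.mkdtemp(), 'foo.csv')
savetxt_csr_compact(path, csr_matrix(dense), fmt='%.4f')

with open(path) as fh:
    written = fh.read()
print(written)
```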
    