Handling large dense matrices in python

前端 未结 6 1172
[愿得一人]
[愿得一人] 2021-02-04 21:33

Basically, what is the best way to go about storing and using dense matrices in python?

I have a project that generates similarity metrics between every item in an array

6条回答
  •  滥情空心
    2021-02-04 22:02

    If you have N objects, kept in a list L, and you wish to store the similarity betwen each object and each other object, that's O(N**2) similarities. Under the common conditions that similarity(A, B) == similarity(B, A) and similarity(A, A) == 0, all that you need is a triangular array S of similarities. The number of elements in that array will be N*(N-1)//2. You should be able to use an array.array for this purpose. Keeping your similarity as a float will take only 8 bytes. If you can represent your similarity as an integer in range(256), you use an unsigned byte as the array.array element.

    So that's about 8000 * 8000 / 2 * 8 i.e. about 256 MB. Using only a byte for the similarity means only 32 MB. You could avoid the slow S[i*N-i*(i+1)//2+j] index calculation of the triangle thingie by simulating a square array instead using S[i*N+j]`; memory would double to (512 MB for float, 64 MB for byte).

    If the above doesn't suit you, then perhaps you could explain """Each item [in which container?] is a custom class, and stores a pointer to the other class and a number representing it's "closeness" to that class."" and """I don't want to store numbers, I want to store reference to classes""". Even after replacing "class(es)" by "object(s)", I'm struggling to understand what you mean.

提交回复
热议问题