I need to store a 2D matrix containing ZIP codes and the distance in km between each one of them. My client has an application that calculates the distances, which are then stored.
I would have thought that you could calculate the distances on the fly. Presumably someone has already done this, so you simply need to find out what algorithm they used and the input data; e.g. the longitude/latitude of the notional centre of each ZIP code.
EDIT: There are two commonly used algorithms for finding the (approximate) geodesic distance between two points given by longitude/latitude pairs.
The Vincenty formula is based on an ellipsoidal approximation. It is more accurate, but more complicated to implement.
The Haversine formula is based on a spherical approximation. It is less accurate (errors of up to about 0.3%), but simpler to implement.
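For example, a minimal Haversine sketch in Java (my own illustration; it assumes a mean Earth radius of 6,371 km and decimal-degree inputs):

    // Approximate great-circle distance in km between two points given as
    // decimal-degree latitude/longitude, using the Haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        final double R = 6371.0; // mean Earth radius in km (assumed)
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * R * Math.asin(Math.sqrt(a));
    }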
You will simply need more memory. When starting your Java process, kick it off like so:
java -Xmx256M MyClass
The -Xmx flag sets the maximum heap size, so this lets the process use up to 256 MB of memory for the heap. If you still run out, keep raising that number until you hit the physical limit.
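To verify what the JVM actually granted, here is a quick check (my own sketch, using the standard Runtime API):

    // Prints the maximum heap the JVM will use; run with e.g. -Xmx256M.
    public class HeapCheck {
        public static void main(String[] args) {
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
        }
    }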
The above suggestions regarding heap size will be helpful. However, I am not sure if you gave an accurate description of the size of your matrix.
Suppose you have 4 locations. Then you need to assess the distances between A->B, A->C, A->D, B->C, B->D, C->D. This suggests six entries in your HashMap (4 choose 2).
That would lead me to believe the actual optimal size of your HashMap is (952 choose 2)=452,676; NOT 952x952=906,304.
This is all assuming, of course, that you only store one-way relationships (i.e. from A->B, but not from B->A, since that is redundant), which I would recommend since you are already experiencing problems with memory space.
Edit: I should have said that the size of your matrix is not optimal, rather than that your description was not accurate.
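To make the one-way storage concrete, here is a small sketch (my own illustration, not from the original answer) that maps each unordered pair (i, j) of n locations onto a flat array of n*(n-1)/2 entries:

    // Index an unordered pair (i, j), 0 <= i, j < n, i != j, into the upper
    // triangle of a symmetric matrix flattened row by row.
    static int pairIndex(int i, int j, int n) {
        if (i > j) { int t = i; i = j; j = t; } // canonicalise: smaller index first
        return i * n - i * (i + 1) / 2 + (j - i - 1);
    }

For n = 952 this yields indices 0 to 452,675, matching the (952 choose 2) count above.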
Can you simply boost the memory available to the JVM?
java -Xmx512m ...
By default the maximum heap size is 64 MB. If you can raise it, then you can keep the data in-process and maximise performance (i.e. you don't need to calculate on the fly).
Create a new class with two slots for location names. Have it always put the alphabetically first name in the first slot. Give it proper equals and hashCode methods. Give it a compareTo (e.g. order alphabetically by the names). Throw them all in an array. Sort it.
Also, hash1 == hash2 does not imply object1.equals(object2). Don't ever rely on that; it's a hack.
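A minimal sketch of such a key class (the class name and field names are mine, not from the answer):

    // Key class: two location names with the alphabetically first name always
    // stored in the first slot, so (A, B) and (B, A) are the same key.
    public final class LocationPair implements Comparable<LocationPair> {
        private final String first;
        private final String second;

        public LocationPair(String a, String b) {
            if (a.compareTo(b) <= 0) { this.first = a; this.second = b; }
            else { this.first = b; this.second = a; }
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof LocationPair)) return false;
            LocationPair p = (LocationPair) o;
            return first.equals(p.first) && second.equals(p.second);
        }

        @Override public int hashCode() {
            return 31 * first.hashCode() + second.hashCode();
        }

        @Override public int compareTo(LocationPair o) {
            int c = first.compareTo(o.first);
            return c != 0 ? c : second.compareTo(o.second);
        }
    }

After sorting an array of these with java.util.Arrays.sort, Arrays.binarySearch can locate any pair.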
Lately I've dealt with similar requirements for my master's thesis.
I ended up with a Matrix class that uses a double[], not a double[][], in order to avoid the cost of a double dereference (data[i] yields an array, then array[i][j] yields a double) while allowing the VM to allocate a single big, contiguous chunk of memory:
public class Matrix {

    // Row-major storage: element (i, j) lives at data[j + i * columns].
    private final double[] data;
    private final int rows;
    private final int columns;

    public Matrix(int rows, int columns, double[][] initializer) {
        this.rows = rows;
        this.columns = columns;
        this.data = new double[rows * columns];
        // Flatten the 2-D initializer into the 1-D backing array, row by row.
        int k = 0;
        for (int i = 0; i < initializer.length; i++) {
            System.arraycopy(initializer[i], 0, data, k, initializer[i].length);
            k += initializer[i].length;
        }
    }

    public Matrix set(int i, int j, double value) {
        data[j + i * columns] = value;
        return this; // returning this allows chained set() calls
    }

    public double get(int i, int j) {
        return data[j + i * columns];
    }
}
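For illustration, usage might look like this (the variable names, the loadDistances() helper, and the 952-location size are mine, not from the original post):

    double[][] precomputed = loadDistances();             // hypothetical source of the 952x952 table
    Matrix distances = new Matrix(952, 952, precomputed);
    double km = distances.get(12, 744);                   // distance between locations 12 and 744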
This class should use less memory than a HashMap since it uses a primitive array (no boxing needed): it needs only 906,304 * 8 ≈ 7 MB (for doubles) or 906,304 * 4 ≈ 3.5 MB (for floats). My 2 cents.
NB: I've omitted some sanity checks for simplicity's sake.