Java large data structure for storing a matrix

北恋 2021-01-13 02:17

I need to store a 2d matrix containing zip codes and the distance in km between each one of them. My client has an application that calculates the distances which are then s

9 Answers
  • 2021-01-13 02:46

    I would have thought that you could calculate the distances on the fly. Presumably someone has already done this, so you simply need to find out what algorithm they used, and the input data; e.g. longitude/latitude of the notional centres of each ZIP code.

    EDIT: There are two commonly used algorithms for finding the (approximate) geodesic distance between two points given by longitude/latitude pairs.

    • The Vincenty formula is based on an ellipsoid approximation. It is more accurate, but more complicated to implement.

    • The Haversine formula is based on a spherical approximation. It is less accurate (errors of around 0.3%), but simpler to implement.
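
    A minimal sketch of the Haversine variant, assuming each ZIP code has already been mapped to a latitude/longitude pair in decimal degrees. The class name `Haversine`, the method `distanceKm`, and the mean Earth radius of 6371.0088 km are choices made for this illustration and do not come from the original answer.

```java
final class Haversine {
    // Mean Earth radius in kilometres (spherical approximation; an assumption of this sketch).
    private static final double EARTH_RADIUS_KM = 6371.0088;

    private Haversine() {}

    /** Great-circle distance in km between two points given in decimal degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);

        double sinDPhi = Math.sin(dPhi / 2);
        double sinDLambda = Math.sin(dLambda / 2);

        // Haversine term: hav(dPhi) + cos(phi1) * cos(phi2) * hav(dLambda)
        double a = sinDPhi * sinDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinDLambda * sinDLambda;

        // Clamp to 1.0 so rounding cannot push the asin argument out of range.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }
}
```

    With something like this in place, a distance is computed on demand as `Haversine.distanceKm(latA, lonA, latB, lonB)` and only the 952 coordinate pairs need to be kept in memory.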

  • 2021-01-13 02:50

    You will simply need more memory. When starting your Java process, kick it off like so:

    java -Xmx256M MyClass

    The -Xmx option sets the maximum heap size, so this says the process can use up to 256 MB of memory for the heap. If you still run out, keep bumping that number up until you hit the physical limit.
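
    To check that the setting actually took effect, a small sketch like the following prints the heap ceiling the JVM reports. The class name `HeapInfo` is hypothetical; `Runtime.maxMemory()` is the standard call, and the value it returns may differ slightly from the number passed to -Xmx.

```java
final class HeapInfo {
    private HeapInfo() {}

    /** Maximum amount of memory the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

    Run it with the same flag as the real application, for example `java -Xmx256M HeapInfo`.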

  • 2021-01-13 02:56

    The above suggestions regarding heap size will be helpful. However, I am not sure if you gave an accurate description of the size of your matrix.

    Suppose you have 4 locations. Then you need to assess the distances between A->B, A->C, A->D, B->C, B->D, C->D. This suggests six entries in your HashMap (4 choose 2).

    That would lead me to believe the actual optimal size of your HashMap is (952 choose 2)=452,676; NOT 952x952=906,304.

    This is all assuming, of course, that you only store one-way relationships (i.e. from A->B, but not from B->A, since that is redundant), which I would recommend since you are already experiencing problems with memory space.
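
    One way to store only the (n choose 2) one-way entries is a packed upper-triangular array. This is a sketch under the assumption that the locations have already been numbered 0 to n-1; the class name `TriangularDistances` and the use of `float` are illustrative choices, not part of the original question.

```java
final class TriangularDistances {
    private final int n;
    private final float[] data; // n * (n - 1) / 2 slots, one per unordered pair

    TriangularDistances(int n) {
        if (n < 2) {
            throw new IllegalArgumentException("need at least two locations");
        }
        this.n = n;
        this.data = new float[n * (n - 1) / 2];
    }

    /** Number of stored distances, i.e. n choose 2. */
    int size() {
        return data.length;
    }

    /** Distance between i and j in either direction; a location is 0 km from itself. */
    float get(int i, int j) {
        return i == j ? 0.0f : data[index(i, j)];
    }

    void set(int i, int j, float km) {
        data[index(i, j)] = km;
    }

    // Maps an unordered pair (i, j) with i != j to its slot in the packed triangle.
    private int index(int i, int j) {
        if (i == j) {
            throw new IllegalArgumentException("no slot for the diagonal");
        }
        int lo = Math.min(i, j);
        int hi = Math.max(i, j);
        if (lo < 0 || hi >= n) {
            throw new IndexOutOfBoundsException("pair (" + i + ", " + j + ") out of range");
        }
        // Rows 0..lo-1 hold (n-1) + (n-2) + ... + (n-lo) entries in total.
        return lo * (2 * n - lo - 1) / 2 + (hi - lo - 1);
    }
}
```

    For 952 locations this allocates 452,676 floats, roughly 1.8 MB, and `get(a, b)` returns the same value as `get(b, a)`.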

    Edit: Should have said that the size of your matrix is not optimal, rather than saying the description was not accurate.

  • 2021-01-13 02:58

    Can you simply boost the memory available to the JVM?

    java -Xmx512m ...
    

    The default maximum heap size depends on the JVM version and the machine; on older JVMs it was 64 MB. Some more tuning tips here. If you can do this then you can keep the data in-process and maximise the performance (i.e. you don't need to calculate on the fly).

  • 2021-01-13 02:59

    Create a new class with 2 slots for location names. Have it always put the alphabetically first name in the first slot. Give it proper equals and hashCode methods. Give it a compareTo (e.g. order alphabetically by names). Throw them all in an array. Sort it.

    Also, hash1 = hash2 does not imply object1 = object2. Don't ever do this. It's a hack.
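
    The pair class described above could look roughly like this. It is a sketch only: the name `ZipPair` and the choice of `String` for the location names are assumptions made for the example.

```java
import java.util.Objects;

final class ZipPair implements Comparable<ZipPair> {
    private final String first;
    private final String second;

    ZipPair(String a, String b) {
        // Normalise the order so that (a, b) and (b, a) become the same key.
        if (a.compareTo(b) <= 0) {
            first = a;
            second = b;
        } else {
            first = b;
            second = a;
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ZipPair)) {
            return false;
        }
        ZipPair other = (ZipPair) o;
        // Compare the actual names; equal hash codes alone prove nothing.
        return first.equals(other.first) && second.equals(other.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }

    @Override
    public int compareTo(ZipPair other) {
        // Order by the first name, then by the second.
        int c = first.compareTo(other.first);
        return c != 0 ? c : second.compareTo(other.second);
    }
}
```

    Because the class is `Comparable`, an array of pairs can be sorted with `Arrays.sort` and searched with `Arrays.binarySearch`, which matches the "throw them all in an array, sort it" suggestion.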

  • 2021-01-13 03:05

    I recently dealt with similar requirements for my master's thesis.

    I ended up with a Matrix class backed by a single double[] rather than a double[][]. This avoids the cost of a double dereference (data[i] yields an array, and only then data[i][j] yields a double) and lets the VM allocate one big, contiguous chunk of memory:

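    public class Matrix {
    
        // Row-major storage: element (i, j) lives at index i * columns + j.
        private final double[] data;
        private final int rows;
        private final int columns;
    
        public Matrix(int rows, int columns, double[][] initializer) {
            this.rows = rows;
            this.columns = columns;
            this.data = new double[rows * columns];
    
            // Copy each row of the initializer into the flat array, one after another.
            // Each row is assumed to hold exactly `columns` values (not checked here).
            int k = 0;
    
            for (int i = 0; i < initializer.length; i++) {
                System.arraycopy(initializer[i], 0, data, k, initializer[i].length);
                k += initializer[i].length;
            }
        }
    
        // Stores value at row i, column j; returns this so calls can be chained.
        public Matrix set(int i, int j, double value) {
            data[j + i * columns] = value;
            return this;
        }
    
        // Returns the value at row i, column j.
        public double get(int i, int j) {
            return data[j + i * columns];
        }
    }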
    

    This class should use less memory than a HashMap because it stores values in a primitive array (no boxing needed): it needs only 906,304 * 8 bytes, about 7.3 MB, for doubles, or 906,304 * 4 bytes, about 3.6 MB, for floats. My 2 cents.

    NB I've omitted some sanity checks for simplicity's sake
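
    Since the question keys distances by ZIP code rather than by row and column number, the flat-array idea needs a lookup from code to index. The following is one possible sketch, assuming ZIP codes are handled as strings; the class `ZipDistanceTable` and its methods are hypothetical, and it uses `float` to take the smaller of the two footprints mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

final class ZipDistanceTable {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final float[] data; // row-major n x n table of distances in km
    private final int n;

    ZipDistanceTable(String[] zipCodes) {
        n = zipCodes.length;
        for (int i = 0; i < n; i++) {
            // put returns the previous mapping, so non-null means a repeated code.
            if (indexOf.put(zipCodes[i], i) != null) {
                throw new IllegalArgumentException("duplicate ZIP code: " + zipCodes[i]);
            }
        }
        data = new float[n * n];
    }

    void set(String from, String to, float km) {
        data[slot(from, to)] = km;
    }

    float get(String from, String to) {
        return data[slot(from, to)];
    }

    // Position of (from, to) in the flat array.
    private int slot(String from, String to) {
        return index(from) * n + index(to);
    }

    private int index(String zip) {
        Integer i = indexOf.get(zip);
        if (i == null) {
            throw new IllegalArgumentException("unknown ZIP code: " + zip);
        }
        return i;
    }
}
```

    The map holds only one boxed entry per ZIP code (952 here), while the 906,304 distances stay in the primitive array.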
