Fastest way to determine if an integer's square root is an integer

前端 未结 30 1718
心在旅途
心在旅途 2020-11-22 02:17

I\'m looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):

  1. I\'ve done it the ea
30条回答
  •  梦毁少年i
    2020-11-22 02:39

    I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see those old results in the edit history of this answer, but they're not accurate, as I made a mistake, and wasted time analyzing several algorithms which aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently than everyone else:

    // This is faster because a number is divisible by 2^4 or more only 6% of the time
    // and more than that a vanishingly small percentage.
    while((x & 0x3) == 0) x >>= 2;
    // This is effectively the same as the switch-case statement used in the original
    // answer. 
    if((x & 0x7) != 1) return false;
    

    However, this simple line, which most of the time adds one or two very fast instructions, greatly simplifies the switch-case statement into one if statement. However, it can add to the runtime if many of the tested numbers have significant power-of-two factors.

    The algorithms below are as follows:

    • Internet - Kip's posted answer
    • Durron - My modified answer using the one-pass answer as a base
    • DurronTwo - My modified answer using the two-pass answer (by @JohnnyHeggheim), with some other slight modifications.

    Here is a sample runtime if the numbers are generated using Math.abs(java.util.Random.nextLong())

     0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; ?=378.78 ns @ 3 trials
    33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; ?=478.86 ns @ 10 trials
    67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; ?=734.10 ns @ 10 trials
    
    benchmark   us linear runtime
     Internet 39.7 ==============================
       Durron 37.8 ============================
    DurronTwo 36.0 ===========================
    
    vm: java
    trial: 0
    

    And here is a sample runtime if it's run on the first million longs only:

     0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; ?=56939.84 ns @ 10 trials
    33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; ?=50537.62 ns @ 10 trials
    67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; ?=10766.22 ns @ 3 trials
    
    benchmark   ms linear runtime
     Internet 2.93 ===========================
       Durron 2.24 =====================
    DurronTwo 3.16 ==============================
    
    vm: java
    trial: 0
    

    As you can see, DurronTwo does better for large inputs, because it gets to use the magic trick very very often, but gets clobbered compared to the first algorithm and Math.sqrt because the numbers are so much smaller. Meanwhile, the simpler Durron is a huge winner because it never has to divide by 4 many many times in the first million numbers.

    Here's Durron:

    public final static boolean isPerfectSquareDurron(long n) {
        if(n < 0) return false;
        if(n == 0) return true;
    
        long x = n;
        // This is faster because a number is divisible by 16 only 6% of the time
        // and more than that a vanishingly small percentage.
        while((x & 0x3) == 0) x >>= 2;
        // This is effectively the same as the switch-case statement used in the original
        // answer. 
        if((x & 0x7) == 1) {
    
            long sqrt;
            if(x < 410881L)
            {
                int i;
                float x2, y;
    
                x2 = x * 0.5F;
                y  = x;
                i  = Float.floatToRawIntBits(y);
                i  = 0x5f3759df - ( i >> 1 );
                y  = Float.intBitsToFloat(i);
                y  = y * ( 1.5F - ( x2 * y * y ) );
    
                sqrt = (long)(1.0F/y);
            } else {
                sqrt = (long) Math.sqrt(x);
            }
            return sqrt*sqrt == x;
        }
        return false;
    }
    

    And DurronTwo

    public final static boolean isPerfectSquareDurronTwo(long n) {
        if(n < 0) return false;
        // Needed to prevent infinite loop
        if(n == 0) return true;
    
        long x = n;
        while((x & 0x3) == 0) x >>= 2;
        if((x & 0x7) == 1) {
            long sqrt;
            if (x < 41529141369L) {
                int i;
                float x2, y;
    
                x2 = x * 0.5F;
                y = x;
                i = Float.floatToRawIntBits(y);
                //using the magic number from 
                //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
                //since it more accurate
                i = 0x5f375a86 - (i >> 1);
                y = Float.intBitsToFloat(i);
                y = y * (1.5F - (x2 * y * y));
                y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate
                sqrt = (long) ((1.0F/y) + 0.2);
            } else {
                //Carmack hack gives incorrect answer for n >= 41529141369.
                sqrt = (long) Math.sqrt(x);
            }
            return sqrt*sqrt == x;
        }
        return false;
    }
    

    And my benchmark harness: (Requires Google caliper 0.1-rc5)

    public class SquareRootBenchmark {
        public static class Benchmark1 extends SimpleBenchmark {
            private static final int ARRAY_SIZE = 10000;
            long[] trials = new long[ARRAY_SIZE];
    
            @Override
            protected void setUp() throws Exception {
                Random r = new Random();
                for (int i = 0; i < ARRAY_SIZE; i++) {
                    trials[i] = Math.abs(r.nextLong());
                }
            }
    
    
            public int timeInternet(int reps) {
                int trues = 0;
                for(int i = 0; i < reps; i++) {
                    for(int j = 0; j < ARRAY_SIZE; j++) {
                        if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++;
                    }
                }
    
                return trues;   
            }
    
            public int timeDurron(int reps) {
                int trues = 0;
                for(int i = 0; i < reps; i++) {
                    for(int j = 0; j < ARRAY_SIZE; j++) {
                        if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++;
                    }
                }
    
                return trues;   
            }
    
            public int timeDurronTwo(int reps) {
                int trues = 0;
                for(int i = 0; i < reps; i++) {
                    for(int j = 0; j < ARRAY_SIZE; j++) {
                        if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++;
                    }
                }
    
                return trues;   
            }
        }
    
        public static void main(String... args) {
            Runner.main(Benchmark1.class, args);
        }
    }
    

    UPDATE: I've made a new algorithm that is faster in some scenarios, slower in others, I've gotten different benchmarks based on different inputs. If we calculate modulo 0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241, we can eliminate 97.82% of numbers that cannot be squares. This can be (sort of) done in one line, with 5 bitwise operations:

    if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
    

    The resulting index is either 1) the residue, 2) the residue + 0xFFFFFF, or 3) the residue + 0x1FFFFFE. Of course, we need to have a lookup table for residues modulo 0xFFFFFF, which is about a 3mb file (in this case stored as ascii text decimal numbers, not optimal but clearly improvable with a ByteBuffer and so forth. But since that is precalculation it doesn't matter so much. You can find the file here (or generate it yourself):

    public final static boolean isPerfectSquareDurronThree(long n) {
        if(n < 0) return false;
        if(n == 0) return true;
    
        long x = n;
        while((x & 0x3) == 0) x >>= 2;
        if((x & 0x7) == 1) {
            if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
            long sqrt;
            if(x < 410881L)
            {
                int i;
                float x2, y;
    
                x2 = x * 0.5F;
                y  = x;
                i  = Float.floatToRawIntBits(y);
                i  = 0x5f3759df - ( i >> 1 );
                y  = Float.intBitsToFloat(i);
                y  = y * ( 1.5F - ( x2 * y * y ) );
    
                sqrt = (long)(1.0F/y);
            } else {
                sqrt = (long) Math.sqrt(x);
            }
            return sqrt*sqrt == x;
        }
        return false;
    }
    

    I load it into a boolean array like this:

    private static boolean[] goodLookupSquares = null;
    
    public static void initGoodLookupSquares() throws Exception {
        Scanner s = new Scanner(new File("24residues_squares.txt"));
    
        goodLookupSquares = new boolean[0x1FFFFFE];
    
        while(s.hasNextLine()) {
            int residue = Integer.valueOf(s.nextLine());
            goodLookupSquares[residue] = true;
            goodLookupSquares[residue + 0xFFFFFF] = true;
            goodLookupSquares[residue + 0x1FFFFFE] = true;
        }
    
        s.close();
    }
    

    Example runtime. It beat Durron (version one) in every trial I ran.

     0% Scenario{vm=java, trial=0, benchmark=Internet} 40665.77 ns; ?=566.71 ns @ 10 trials
    33% Scenario{vm=java, trial=0, benchmark=Durron} 38397.60 ns; ?=784.30 ns @ 10 trials
    67% Scenario{vm=java, trial=0, benchmark=DurronThree} 36171.46 ns; ?=693.02 ns @ 10 trials
    
      benchmark   us linear runtime
       Internet 40.7 ==============================
         Durron 38.4 ============================
    DurronThree 36.2 ==========================
    
    vm: java
    trial: 0
    

提交回复
热议问题