Fastest way to determine if an integer's square root is an integer

前端 未结 30 1667
心在旅途
心在旅途 2020-11-22 02:17

I\'m looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):

  1. I\'ve done it the ea
相关标签:
30条回答
  • 2020-11-22 02:59

    This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.

    • Mod-256 test
    • Inexact mod-3465 test (avoids integer division at the cost of some false positives)
    • Floating-point square root, round and compare with input value

    I also experimented with these modifications but they did not help performance:

    • Additional mod-255 test
    • Dividing the input value by powers of 4
    • Fast Inverse Square Root (to work for high values of N it needs 3 iterations, enough to make it slower than the hardware square root function.)

    public class SquareTester {
    
        public static boolean isPerfectSquare(long n) {
            if (n < 0) {
                return false;
            } else {
                switch ((byte) n) {
                case -128: case -127: case -124: case -119: case -112:
                case -111: case -103: case  -95: case  -92: case  -87:
                case  -79: case  -71: case  -64: case  -63: case  -60:
                case  -55: case  -47: case  -39: case  -31: case  -28:
                case  -23: case  -15: case   -7: case    0: case    1:
                case    4: case    9: case   16: case   17: case   25:
                case   33: case   36: case   41: case   49: case   57:
                case   64: case   65: case   68: case   73: case   81:
                case   89: case   97: case  100: case  105: case  113:
                case  121:
                    long i = (n * INV3465) >>> 52;
                    if (! good3465[(int) i]) {
                        return false;
                    } else {
                        long r = round(Math.sqrt(n));
                        return r*r == n; 
                    }
                default:
                    return false;
                }
            }
        }
    
        private static int round(double x) {
            return (int) Double.doubleToRawLongBits(x + (double) (1L << 52));
        }
    
        /** 3465<sup>-1</sup> modulo 2<sup>64</sup> */
        private static final long INV3465 = 0x8ffed161732e78b9L;
    
        private static final boolean[] good3465 =
            new boolean[0x1000];
    
        static {
            for (int r = 0; r < 3465; ++ r) {
                int i = (int) ((r * r * INV3465) >>> 52);
                good3465[i] = good3465[i+1] = true;
            }
        }
    
    }
    
    0 讨论(0)
  • 2020-11-22 02:59

    I like the idea to use an almost correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.

    Just replace your:

    if(n < 410881L){...}
    

    code with this one:

    if (n < 11043908100L) {
        //John Carmack hack, converted to Java.
        // See: http://www.codemaestro.com/reviews/9
        int i;
        float x2, y;
    
        x2 = n * 0.5F;
        y = n;
        i = Float.floatToRawIntBits(y);
        //using the magic number from 
        //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
        //since it more accurate
        i = 0x5f375a86 - (i >> 1);
        y = Float.intBitsToFloat(i);
        y = y * (1.5F - (x2 * y * y));
        y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate
    
        sqrt = Math.round(1.0F / y);
    } else {
        //Carmack hack gives incorrect answer for n >= 11043908100.
        sqrt = (long) Math.sqrt(n);
    }
    
    0 讨论(0)
  • 2020-11-22 03:03

    I was thinking about the horrible times I've spent in Numerical Analysis course.

    And then I remember, there was this function circling around the 'net from the Quake Source code:

    float Q_rsqrt( float number )
    {
      long i;
      float x2, y;
      const float threehalfs = 1.5F;
    
      x2 = number * 0.5F;
      y  = number;
      i  = * ( long * ) &y;  // evil floating point bit level hacking
      i  = 0x5f3759df - ( i >> 1 ); // wtf?
      y  = * ( float * ) &i;
      y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
      // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    
      #ifndef Q3_VM
      #ifdef __linux__
        assert( !isnan(y) ); // bk010122 - FPE?
      #endif
      #endif
      return y;
    }
    

    Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).

    It should be usable and might even be faster, it's from one of the phenomenal id software's game!

    It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:

    I originally found it at: http://www.codemaestro.com/reviews/9

    Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

    You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

    • the * (long*) &y is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
    • the 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
    • the * (float*) &i converts the value back to floating point.
    • the y = y * ( threehalfs - ( x2 * y * y ) ) line bascially iterates the value over the function again.

    The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you... then you could add as much iteration as you need.

    This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replace it with a few fixed number of multiplication operations instead.

    0 讨论(0)
  • 2020-11-22 03:05

    I figured out a method that works ~35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don't know how the Java factor will play out.

    My approach is threefold:

    1. First, filter out obvious answers. This includes negative numbers and looking at the last 4 bits. (I found looking at the last six didn't help.) I also answer yes for 0. (In reading the code below, note that my input is int64 x.)
      if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
          return false;
      if( x == 0 )
          return true;
    2. Next, check if it's a square modulo 255 = 3 * 5 * 17. Because that's a product of three distinct primes, only about 1/8 of the residues mod 255 are squares. However, in my experience, calling the modulo operator (%) costs more than the benefit one gets, so I use bit tricks involving 255 = 2^8-1 to compute the residue. (For better or worse, I am not using the trick of reading individual bytes out of a word, only bitwise-and and shifts.)
      int64 y = x;
      y = (y & 4294967295LL) + (y >> 32); 
      y = (y & 65535) + (y >> 16);
      y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
      // At this point, y is between 0 and 511.  More code can reduce it farther.
      
      To actually check if the residue is a square, I look up the answer in a precomputed table.
      if( bad255[y] )
          return false;
      // However, I just use a table of size 512
      
    3. Finally, try to compute the square root using a method similar to Hensel's lemma. (I don't think it's applicable directly, but it works with some modifications.) Before doing that, I divide out all powers of 2 with a binary search:
      if((x & 4294967295LL) == 0)
          x >>= 32;
      if((x & 65535) == 0)
          x >>= 16;
      if((x & 255) == 0)
          x >>= 8;
      if((x & 15) == 0)
          x >>= 4;
      if((x & 3) == 0)
          x >>= 2;
      At this point, for our number to be a square, it must be 1 mod 8.
      if((x & 7) != 1)
          return false;
      The basic structure of Hensel's lemma is the following. (Note: untested code; if it doesn't work, try t=2 or 8.)
      int64 t = 4, r = 1;
      t <<= 1; r += ((x - r * r) & t) >> 1;
      t <<= 1; r += ((x - r * r) & t) >> 1;
      t <<= 1; r += ((x - r * r) & t) >> 1;
      // Repeat until t is 2^33 or so.  Use a loop if you want.
      The idea is that at each iteration, you add one bit onto r, the "current" square root of x; each square root is accurate modulo a larger and larger power of 2, namely t/2. At the end, r and t/2-r will be square roots of x modulo t/2. (Note that if r is a square root of x, then so is -r. This is true even modulo numbers, but beware, modulo some numbers, things can have even more than 2 square roots; notably, this includes powers of 2.) Because our actual square root is less than 2^32, at that point we can actually just check if r or t/2-r are real square roots. In my actual code, I use the following modified loop:
      int64 r, t, z;
      r = start[(x >> 3) & 1023];
      do {
          z = x - r * r;
          if( z == 0 )
              return true;
          if( z < 0 )
              return false;
          t = z & (-z);
          r += (z & t) >> 1;
          if( r > (t >> 1) )
              r = t - r;
      } while( t <= (1LL << 33) );
      The speedup here is obtained in three ways: precomputed start value (equivalent to ~10 iterations of the loop), earlier exit of the loop, and skipping some t values. For the last part, I look at z = r - x * x, and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn't have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.

    Even if this code doesn't work faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.

    typedef signed long long int int64;
    
    int start[1024] =
    {1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
    1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
    129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
    1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
    257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
    1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
    385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
    1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
    513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
    1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
    641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
    1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
    769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
    1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
    897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
    1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
    1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
    959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
    1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
    831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
    1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
    703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
    1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
    575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
    1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
    447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
    1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
    319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
    1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
    191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
    1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
    63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
    2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
    65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
    1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
    193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
    1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
    321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
    1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
    449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
    1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
    577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
    1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
    705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
    1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
    833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
    1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
    961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
    1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
    1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
    895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
    1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
    767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
    1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
    639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
    1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
    511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
    1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
    383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
    1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
    255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
    1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
    127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
    1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181};
    
    bool bad255[512] =
    {0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
     1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
     0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
     1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
     1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
     1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
     1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
     1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
     0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
     1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
     0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
     1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
     1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
     1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
     1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
     1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
     0,0};
    
    inline bool square( int64 x ) {
        // Quickfail
        if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
            return false;
        if( x == 0 )
            return true;
    
        // Check mod 255 = 3 * 5 * 17, for fun
        int64 y = x;
        y = (y & 4294967295LL) + (y >> 32);
        y = (y & 65535) + (y >> 16);
        y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
        if( bad255[y] )
            return false;
    
        // Divide out powers of 4 using binary search
        if((x & 4294967295LL) == 0)
            x >>= 32;
        if((x & 65535) == 0)
            x >>= 16;
        if((x & 255) == 0)
            x >>= 8;
        if((x & 15) == 0)
            x >>= 4;
        if((x & 3) == 0)
            x >>= 2;
    
        if((x & 7) != 1)
            return false;
    
        // Compute sqrt using something like Hensel's lemma
        int64 r, t, z;
        r = start[(x >> 3) & 1023];
        do {
            z = x - r * r;
            if( z == 0 )
                return true;
            if( z < 0 )
                return false;
            t = z & (-z);
            r += (z & t) >> 1;
            if( r > (t  >> 1) )
                r = t - r;
        } while( t <= (1LL << 33) );
    
        return false;
    }
    0 讨论(0)
  • 2020-11-22 03:05

    I want this function to work with all positive 64-bit signed integers

    Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.

    0 讨论(0)
  • 2020-11-22 03:06

    Calculating square roots by Newton's method is horrendously fast ... provided that the starting value is reasonable. However there is no reasonable starting value, and in practice we end with bisection and log(2^64) behaviour.
    To be really fast we need a fast way to get at a reasonable starting value, and that means we need to descend into machine language. If a processor provides an instruction like POPCNT in the Pentium, that counts the leading zeroes we can use that to have a starting value with half the significant bits. With care we can find a a fixed number of Newton steps that will always suffice. (Thus foregoing the need to loop and have very fast execution.)

    A second solution is going via the floating point facility, which may have a fast sqrt calculation (like the i87 coprocessor.) Even an excursion via exp() and log() may be faster than Newton degenerated into a binary search. There is a tricky aspect to this, a processor dependant analysis of what and if refinement afterwards is necessary.

    A third solution solves a slightly different problem, but is well worth mentionning because the situation is described in the question. If you want to calculate a great many square roots for numbers that differ slightly, you can use Newton iteration, if you never reinitialise the starting value, but just leave it where the previous calculation left off. I've used this with success in at least one Euler problem.

    0 讨论(0)
提交回复
热议问题