Simplification Algorithm for Reverse Polish Notation

温柔的废话 2021-01-16 11:34

A couple of days ago I played around with Befunge, which is an esoteric programming language. Befunge uses a LIFO stack to store data. When you write programs the digits fro

4 Answers
  •  一整个雨季
    2021-01-16 12:15

    When only multiplication and addition are considered, it's pretty easy to construct optimal formulas, because the problem has the optimal substructure property. That is, the optimal way to build [num1][num2]op is from num1 and num2 that are both also optimal. If duplication is also considered, that's no longer true.

    The num1 and num2 give rise to overlapping subproblems, so Dynamic Programming is applicable.

    We can simply, for a number i:

    1. For every 1 < j <= sqrt(i) that evenly divides i, try [j][i / j]*
    2. For every 0 < j <= i/2, try [j][i - j]+
    3. Take the best formula found

    That is of course very easy to do bottom-up, just start at i = 0 and work your way up to whatever number you want. Step 2 is a little slow, unfortunately, so after say 100000 it starts to get annoying to wait for it. There might be some trick that I'm not seeing.

    Code in C# (not tested super well, but it seems to work):

    string[] n = new string[10000];
    for (int i = 0; i < 10; i++)
        n[i] = "" + i;
    for (int i = 10; i < n.Length; i++)
    {
        int bestlen = int.MaxValue;
        string best = null;
        // try factors
        int sqrt = (int)Math.Sqrt(i);
        for (int j = 2; j <= sqrt; j++)
        {
            if (i % j == 0)
            {
                int len = n[j].Length + n[i / j].Length + 1;
                if (len < bestlen)
                {
                    bestlen = len;
                    best = n[j] + n[i / j] + "*";
                }
            }
        }
        // try sums; j <= i / 2 so that the middle split (j, i - j) is not missed
        for (int j = 1; j <= i / 2; j++)
        {
            int len = n[j].Length + n[i - j].Length + 1;
            if (len < bestlen)
            {
                bestlen = len;
                best = n[j] + n[i - j] + "+";
            }
        }
        n[i] = best;
    }
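
Since the listing above is described as not tested very well, a small checker can help. The following is a minimal sketch, not part of the original answer: `RpnCheck` and `Evaluate` are hypothetical names, and it assumes formulas consist only of single digits, `+` and `*`, evaluated on a LIFO stack as in Befunge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original answer): evaluates a generated formula.
static class RpnCheck
{
    // Evaluates a postfix formula made of single digits, '+' and '*'.
    public static long Evaluate(string formula)
    {
        var stack = new Stack<long>();
        foreach (char c in formula)
        {
            if (c >= '0' && c <= '9')
            {
                // A digit pushes its own value.
                stack.Push(c - '0');
            }
            else if (c == '+' || c == '*')
            {
                if (stack.Count < 2)
                    throw new FormatException("operator without two operands");
                long b = stack.Pop();
                long a = stack.Pop();
                stack.Push(c == '+' ? a + b : a * b);
            }
            else
            {
                throw new FormatException("unexpected character: " + c);
            }
        }
        // A well-formed formula leaves exactly one value behind.
        if (stack.Count != 1)
            throw new FormatException("formula leaves " + stack.Count + " values on the stack");
        return stack.Pop();
    }
}
```

For instance, `RpnCheck.Evaluate("95558***+")` returns 1009. Once the table `n` is filled, comparing `RpnCheck.Evaluate(n[i])` with `i` for every index is a cheap way to confirm that each stored formula produces its number.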
    

    Here's a trick to speed up the search over sums. Suppose there is an array that contains, for every length, the highest number that can be made with that length. Less obviously, this array also gives a quick lower bound on the length of any number at or above some threshold: scan through the array and note the first position whose value reaches the threshold. Together, these give a quick way to discard huge portions of the search space.

    For example, the biggest number of length 3 is 81 (99*) and the biggest number of length 5 is 729 (999**). Now if we want to know how to get 1009 (prime, so no factors found), first we try the sums where the first part has length 1 (so 1+1008 through 9+1000), finding 9+1000 which is 9 characters long (95558***+).

    The next step, checking the sums where the first part has length 3 or less, can be skipped completely. 1009 - 81 = 928, and 928 (the lowest that the second part of the sum can be if the first part is to be 3 characters or less) is bigger than 729, so numbers of 928 and over must be at least 7 characters long. So if the first part of the sum is 3 characters, the second part must be at least 7 characters, and then there's also a + sign on the end, so the total is at least 11 characters. The best so far was 9, so this step can be skipped.

    The next step, with 5 characters in the first part, can also be skipped, because 1009 - 729 = 280, and to make 280 or higher we need at least 5 characters. 5 + 5 + 1 = 11, bigger than 9, so don't check.

    Instead of checking about 500 sums, we only had to check 9 this way, and the check to make the skipping possible is very quick. This trick is good enough that generating all numbers up to a million only takes 3 seconds on my PC (before, it would take 3 seconds to get to 100000).
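
The bound used in the 1009 example can be sketched on its own. This is an illustration under assumptions, not the answer's code: `SumBound`, `ShortestLengthAtLeast` and `LowerBound` are hypothetical names, and the table is filled by hand with the all-nines products 9, 81, 729 and 6561 for lengths 1, 3, 5 and 7.

```csharp
using System;

// Hypothetical helper names; this only illustrates the pruning bound.
static class SumBound
{
    // biggest[len] = largest number with a formula of exactly len characters (0 = none known).
    // Returns the smallest length whose biggest number reaches threshold,
    // or int.MaxValue when no table entry reaches it.
    public static int ShortestLengthAtLeast(int[] biggest, int threshold)
    {
        for (int len = 1; len < biggest.Length; len += 2)
        {
            if (biggest[len] >= threshold)
                return len;
        }
        return int.MaxValue;
    }

    // Lower bound on the length of target written as [a][b]+
    // when the formula for a has at most firstLen characters.
    public static long LowerBound(int[] biggest, int target, int firstLen)
    {
        // b must be at least this big if a is limited to firstLen characters.
        int threshold = target - biggest[firstLen];
        long secondLen = ShortestLengthAtLeast(biggest, threshold);
        return firstLen + secondLen + 1; // +1 for the trailing '+'
    }
}
```

With `biggest[1] = 9`, `biggest[3] = 81`, `biggest[5] = 729` and `biggest[7] = 6561`, `SumBound.LowerBound(biggest, 1009, 1)` gives 9, while first-part lengths 3 and 5 both give 11, so those two ranges never need to be scanned once a 9-character formula is known. The `long` return type is only there so that an `int.MaxValue` result from the scan cannot overflow.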

    Here's the code:

    string[] n = new string[100000];
    int[] biggest_number_of_length = new int[n.Length];
    for (int i = 0; i < 10; i++)
        n[i] = "" + i;
    biggest_number_of_length[1] = 9;
    for (int i = 10; i < n.Length; i++)
    {
        int bestlen = int.MaxValue;
        string best = null;
        // try factors
        int sqrt = (int)Math.Sqrt(i);
        for (int j = 2; j <= sqrt; j++)
        {
            if (i % j == 0)
            {
                int len = n[j].Length + n[i / j].Length + 1;
                if (len < bestlen)
                {
                    bestlen = len;
                    best = n[j] + n[i / j] + "*";
                }
            }
        }
        // try sums
        for (int x = 1; x < bestlen; x += 2)
        {
            int find = i - biggest_number_of_length[x];
            int min = int.MaxValue;
            // find the shortest number that is >= (i - biggest_number_of_length[x])
            for (int k = 1; k < biggest_number_of_length.Length; k += 2)
            {
                if (biggest_number_of_length[k] >= find)
                {
                    min = k;
                    break;
                }
            }
            // if that number wasn't small enough, it's not worth looking in that range
            if (min + x + 1 < bestlen)
            {
                // the bound allows an improvement: scan second parts j in [find .. i)
                for (int j = find; j < i; j++)
                {
                    int len = n[i - j].Length + n[j].Length + 1;
                    if (len < bestlen)
                    {
                        bestlen = len;
                        best = n[i - j] + n[j] + "+";
                    }
                }
            }
        }
        // found
        n[i] = best;
        biggest_number_of_length[bestlen] = i;
    }
    

    There's still room for improvement. This code will re-check sums that it has already checked. There are simple ways to make it at least not check the same sum twice (by remembering the last find), but that made no significant difference in my tests. It should be possible to find a better upper bound.
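
One way to avoid checking the same sum twice might look like the sketch below. It follows the same search as the listing above but adds a hypothetical `scannedFrom` variable that remembers the lowest `find` already scanned. The names `RpnTable` and `Build` are also assumptions, and, as noted, this is not expected to change the running time much.

```csharp
using System;

// Sketch only: the pruned search plus a "remember the last find" variable.
static class RpnTable
{
    public static string[] Build(int limit)
    {
        if (limit < 10) throw new ArgumentException("limit must be at least 10");
        string[] n = new string[limit];
        int[] biggest = new int[limit]; // biggest[len] = largest number so far with that formula length
        for (int i = 0; i < 10; i++) n[i] = i.ToString();
        biggest[1] = 9;
        for (int i = 10; i < limit; i++)
        {
            int bestlen = int.MaxValue;
            string best = "";
            // try factors
            for (int j = 2; j * j <= i; j++)
            {
                if (i % j != 0) continue;
                int len = n[j].Length + n[i / j].Length + 1;
                if (len < bestlen) { bestlen = len; best = n[j] + n[i / j] + "*"; }
            }
            // try sums; every j in [scannedFrom, i) has already been tried
            int scannedFrom = i;
            for (int x = 1; x < bestlen; x += 2)
            {
                int find = i - biggest[x];
                int min = int.MaxValue;
                // shortest length of any number >= find
                for (int k = 1; k < biggest.Length; k += 2)
                {
                    if (biggest[k] >= find) { min = k; break; }
                }
                // skip the range when even the bound cannot beat the best so far
                if ((long)min + x + 1 >= bestlen) continue;
                for (int j = find; j < scannedFrom; j++)
                {
                    int len = n[i - j].Length + n[j].Length + 1;
                    if (len < bestlen) { bestlen = len; best = n[i - j] + n[j] + "+"; }
                }
                if (find < scannedFrom) scannedFrom = find;
            }
            n[i] = best;
            biggest[bestlen] = i;
        }
        return n;
    }
}
```

For example, `RpnTable.Build(1010)[1009]` should be a 9-character formula, matching the worked example, and `RpnTable.Build(100)[81]` is `99*`.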
