Given N > 0 and M > 0, I want to enumerate all (x, y) pairs such that 1 <= x <= N and 1 <= y <= M in descending order of (x * y).
If you're just looking to save on space while keeping the running time more or less the same, you can rely on the fact that each successively smaller element must be adjacent (in the 2-D grid you alluded to) to one of the elements you've already encountered. (You can prove this by induction; it's not particularly difficult. For the rest of this answer I'll assume that M>=N.)
The basic algorithm looks something like:
Start with a list (Enumerated Points) containing just the maximum element, (M,N), whose product is M*N.
Create a max heap (Candidates), ordered by x*y, containing (M-1,N) and (M,N-1).
Repeat this:
1. Pick the largest value (x,y) in Candidates, remove it, and append it to Enumerated Points.
2. Add (x-1,y) and (x,y-1) to Candidates if they are not there already.
You can repeat this as long as you want more elements in Enumerated Points. The max size of Candidates should be M+N so I think this is O(k log(M+N)) where k is the number of points you want.
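As a rough illustration of the loop above, here is a minimal Python sketch. The function name `descending_pairs`, the parameter names, and the `seen` set used to reject duplicates are assumptions of this sketch, not part of the answer; the set is simply the easiest stand-in for the "if they are not there already" check, and it costs O(k) extra memory.

```python
import heapq
from itertools import islice


def descending_pairs(max_x, max_y):
    """Yield (x, y) with 1 <= x <= max_x, 1 <= y <= max_y, largest x*y first."""
    # heapq is a min-heap, so products are stored negated.
    candidates = [(-max_x * max_y, max_x, max_y)]
    seen = {(max_x, max_y)}  # assumption: a set replaces the duplicate check
    while candidates:
        _, x, y = heapq.heappop(candidates)
        yield x, y
        # The next smaller elements are adjacent to something already emitted.
        for nx, ny in ((x - 1, y), (x, y - 1)):
            if nx >= 1 and ny >= 1 and (nx, ny) not in seen:
                seen.add((nx, ny))
                heapq.heappush(candidates, (-nx * ny, nx, ny))


# Only the first few pairs are computed; the rest of the grid is never visited.
first_five = list(islice(descending_pairs(3000, 2000), 5))
```

Because the function is a generator, you can stop after k pairs, which is the situation the O(k log(M+N)) estimate describes.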
ADDENDUM: Avoiding duplicates is not difficult, but it is worth spelling out. I will assume in this algorithm that you lay your grid out so that numbers go down as you move down and right. Anyway, it goes like this:
At the beginning of the algorithm create an array (Column Size) which has one element for each column. You should make this array contain the number of rows in each column which are part of the list of enumerated points.
After you add a new element and update this array, you will check the size of the column on either side in order to decide if the grid squares to the immediate right and below of this new enumerated point are already in the candidates list.
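Check the size of the column to the left: if it's larger than this one, you don't need to add the element below your new enumerated point, because it is already a candidate.
Check the size of the column to the right: if it's exactly one less than the size of this column, you don't need to add the element to the right of this one, because it was added when the point above it was enumerated. (For a point in the top row there is no point above its right-hand neighbour, so that neighbour still has to be added.)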
To make this obvious, let's look at this partially completed chart for M=4, N=2:
4 3 2 1
* * * 2 |2
* 3 2 1 |1
The elements (4,2), (3,2), (2,2) and (4,1) are already in the list. (The first coordinate runs up to M, the second up to N.) The Column Size array is [2 1 1 0], since this is the number of items in each column that are in the Enumerated Points list. We are about to add (3,1) to the list, which raises the size of its column to 2. Looking at the column to the right, we can conclude that adding (2,1) isn't needed, because the size of the column for x=2 is already 1, one less than the updated size of this column. The reasoning is pretty clear visually: we already added (2,1) to the candidates when we enumerated (2,2).
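The two checks can be sketched in Python as follows. This is one reading of the addendum rather than a definitive implementation: the names `descending_pairs_colsize` and `col_size` are hypothetical, column sizes are compared after the update, and the top-row guard (`y < max_y`) is an assumption added so that the right-hand neighbour of a top-row point is still pushed.

```python
import heapq


def descending_pairs_colsize(max_x, max_y):
    """Enumerate the grid by descending x*y without a 'seen' set."""
    # col_size[x] = number of already enumerated points in column x.
    # Indices 0 and max_x + 1 are padding and stay 0.
    col_size = [0] * (max_x + 2)
    candidates = [(-max_x * max_y, max_x, max_y)]  # negated for a max-heap
    while candidates:
        _, x, y = heapq.heappop(candidates)
        yield x, y
        col_size[x] += 1
        size = col_size[x]

        # Element below, (x, y-1): already a candidate if the column to the
        # left (x+1) is larger, i.e. (x+1, y-1) has been enumerated.
        left_is_larger = x < max_x and col_size[x + 1] > size
        if y > 1 and not left_is_larger:
            heapq.heappush(candidates, (-x * (y - 1), x, y - 1))

        # Element to the right, (x-1, y): already a candidate if the point
        # above it, (x-1, y+1), has been enumerated.
        above_right_done = y < max_y and col_size[x - 1] == size - 1
        if x > 1 and not above_right_done:
            heapq.heappush(candidates, (-(x - 1) * y, x - 1, y))


# The chart above corresponds to max_x=4, max_y=2.
chart_order = list(descending_pairs_colsize(4, 2))
```

Compared with tracking visited points in a set, this keeps the extra state at one integer per column.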
In Haskell it produces the output immediately. Here's an illustration:
-------
-*------
-**------
-***------
-****------
-*****------
-******------
-*******------
Each starred point produces both (x,y) and (y,x). The algorithm "eats into" this thing from the top right corner, comparing the top elements in each column. The length of the frontier is never more than N (we assume N >= M).
enuNM n m | n<m = enuNM m n                    -- make sure N >= M
enuNM n m = let
    b = [ [ (x*y,(x,y)) | y<- [m,m-1..1]] | x<-[n,n-1..m+1]]
    a = [ (x*x,(x,x)) :
          concat [ [(z,(x,y)),(z,(y,x))]       -- two symmetrical pairs,
                 | y<- [x-1,x-2..1]            --  below and above the diagonal
                 , let z=x*y ] | x<-[m,m-1..1]]
  in
    foldi (\(x:xs) ys-> x : merge xs ys) [] (b ++ a)

merge a@(x:xs) b@(y:ys) = if (fst y) > (fst x)
                            then y : merge a ys
                            else x : merge xs b
merge a [] = a
merge [] b = b

foldi f z [] = z
foldi f z (x:xs) = f x (foldi f z (pairs f xs))

pairs f (x:y:t) = f x y : pairs f t
pairs f t = t
foldi builds a skewed, infinitely deepening tree that serves as a heap, joining all the producer streams, one for each x, which are created already sorted in descending order. Since the initial values of the producer streams are guaranteed to be in decreasing order, each initial value can be popped without comparison, allowing the tree to be built lazily.
The code for a produces the pairs above the diagonal line using the corresponding pairs from below the diagonal line (under the assumption N >= M, for each (x,y) where x <= M and y < x, (y,x) is also to be produced).
It should be practically O(1) for each of the first few values produced, since they sit very near the top of the tree of comparisons.
Prelude Main> take 10 $ map snd $ enuNM (2000) (3000)
[(3000,2000),(2999,2000),(3000,1999),(2998,2000),(2999,1999),(3000,1998),(2997,2000),(2998,1999),(2999,1998),(2996,2000)]
(0.01 secs, 1045144 bytes)

Prelude Main> let xs=take 10 $ map (log.fromIntegral.fst) $ enuNM (2000) (3000)
Prelude Main> zipWith (>=) xs (tail xs)
[True,True,True,True,True,True,True,True,True]

Prelude Main> take 10 $ map snd $ enuNM (2*10^8) (3*10^8)
[(300000000,200000000),(299999999,200000000),(300000000,199999999),(299999998,200000000),(299999999,199999999),(300000000,199999998),(299999997,200000000),(299999998,199999999),(299999999,199999998),(299999996,200000000)]
(0.01 secs, 2094232 bytes)
We can assess the empirical run-time complexity:
Prelude Main> take 10 $ drop 50000 $ map (log.fromIntegral.fst) $ enuNM (2*10^8) (3*10^8)
[38.633119670465554,38.633119670465554,38.63311967046555,38.63311967046554,38.63311967046554,38.63311967046553,38.63311967046553,38.633119670465526,38.633119670465526,38.63311967046552]
(0.17 secs, 35425848 bytes)

Prelude Main> take 10 $ drop 100000 $ map (log.fromIntegral.fst) $ enuNM (2*10^8) (3*10^8)
[38.63311913546512,38.633119135465115,38.633119135465115,38.63311913546511,38.63311913546511,38.6331191354651,38.6331191354651,38.633119135465094,38.633119135465094,38.63311913546509]
(0.36 secs, 71346352 bytes)

*Main> let x=it
*Main> zipWith (>=) x (tail x)
[True,True,True,True,True,True,True,True,True]

Prelude Main> logBase 2 (0.36/0.17)
1.082462160191973     -- O(n^1.08) for n=100000 values produced
This can be translated into e.g. Python in a straightforward manner, using generators in place of Haskell streams, as can be seen here.
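A minimal Python sketch of the same idea, one descending stream per x merged lazily, might look like the following. It swaps in the standard library's `heapq.merge` for the `foldi` tree, which is an assumption of this sketch and not a translation of the Haskell code: `heapq.merge` reads the first element of every stream before it yields anything, so start-up cost grows with the number of streams instead of being lazy. The names `column_stream` and `merged_pairs` are hypothetical.

```python
import heapq
from itertools import islice


def column_stream(x, max_y):
    """Generator for a fixed x: (product, (x, y)) in descending product order."""
    for y in range(max_y, 0, -1):
        yield (x * y, (x, y))


def merged_pairs(max_x, max_y):
    """Lazily merge one descending stream per x into a single descending stream."""
    streams = (column_stream(x, max_y) for x in range(max_x, 0, -1))
    # reverse=True tells merge that each input is sorted largest-first.
    return heapq.merge(*streams, key=lambda item: item[0], reverse=True)


# Take only the first few pairs, as in the GHCi session above.
first_three = [pair for _, pair in islice(merged_pairs(3000, 2000), 3)]
```

The symmetry trick used for `a` in the Haskell version is left out here to keep the sketch short.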
This is effectively equivalent to enumerating the prime numbers; the numbers you want are all the numbers that aren't prime (except for all those that have x or y equal to 1).
I'm not sure there's a method of enumerating primes that's going to be quicker than what you're already proposing (at least in terms of algorithmic complexity).
A naive approach that loops from N*M down to 1, searching for the pairs whose product is the current number:
#!/usr/bin/perl
use strict;
use warnings;

my $n = 5;
my $m = 4;
for (my $p = $n * $m; $p > 0; $p--) {
    # Smallest x for which y = p / x can still be <= m.
    my $min_x = int(($p + $m - 1) / $m);
    for my $x ($min_x..$n) {
        if ($p % $x == 0) {
            my $y = $p / $x;
            print("x: $x, y: $y, p: $p\n");
        }
    }
}
For N=M, complexity is O(N^3) but memory usage is O(1).
Update: Note that the complexity is not as bad as it seems because the number of elements to generate is already N^2. For comparison, the generate-all-the-pairs-and-sort approach is O(N^2 log N) with O(N^2) memory usage.
I got it!
Consider the grid as a set of M columns where every column is a stack containing the elements from 1 at the bottom to N at the top. Every column is tagged with its x coordinate.
The elements inside every column stack are ordered by their y value, and therefore also by x*y, since x is the same for all of them.
So, you just have to pick the stack that has the largest x*y value at its top, pop it, and repeat.
In practice you will not need stacks, just the index of the top value, and you can use a priority queue to get the column with the largest x*y value. Then decrement the index and, if it is still greater than 0 (indicating that the stack has not been exhausted), reinsert the column into the queue with its new priority x*y.
The complexity of this algorithm for N=M is O(N^2 log N) and its memory usage is O(N).
Update: Implemented in Perl...
use strict;
use warnings;
use Heap::Simple;

my ($m, $n) = @ARGV;

# The elements in the heap are hashes and the priority is in the slot 'xy':
my $h = Heap::Simple->new(order => '>', elements => [Hash => 'xy']);

# One entry per column x, starting at the top of the column (y = n).
for my $x (1..$m) {
    $h->insert({ x => $x, y => $n, xy => $x * $n });
}

while (defined (my $col = $h->extract_first)) {
    print "x: $col->{x}, y: $col->{y}, xy: $col->{xy}\n";
    # Move down the column and reinsert it unless it is exhausted.
    if (--$col->{y}) {
        $col->{xy} = $col->{x} * $col->{y};
        $h->insert($col);
    }
}
You mention that most of the time you only need the first few terms of the sequence. In that case, even after generating all the pairs, you don't need to sort them all to find those first few terms. You can use a heap bounded to the number of terms you want, say k (in practice a min-heap holding the k largest products seen so far, so the smallest of them can be evicted cheaply). If k << N and k << M, you get the k largest terms in O(n log k) time, which is better than the O(n log n) needed for a full sort.
Here n = N*M.
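For illustration, Python's standard library already provides this bounded-heap selection as `heapq.nlargest`. The sketch below is only an example of the idea, with the function name `top_k_pairs` and its parameters being assumptions: it still visits all n = N*M pairs, but never holds more than k of them at once.

```python
import heapq
from itertools import product


def top_k_pairs(max_x, max_y, k):
    """Return the k pairs with the largest x*y, largest first."""
    # product() generates the pairs lazily; nlargest keeps only k of them.
    all_pairs = product(range(1, max_x + 1), range(1, max_y + 1))
    return heapq.nlargest(k, all_pairs, key=lambda pair: pair[0] * pair[1])


best = top_k_pairs(5, 4, 3)
```

Unlike the frontier-based answers, the running time here still grows with the size of the whole grid, so it only pays off when generating every pair is acceptable.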