Algorithm to identify a unique free polyomino (or polyomino hash)

心在旅途 2021-02-02 16:13

In short: How to hash a free polyomino?

This could be generalized into: How to efficiently hash an arbitrary collection of 2D integer coordinate

6 Answers
  • 2021-02-02 16:27

    You can set up something like a trie to uniquely identify (and not just hash) your polyomino. Take your normalized polyomino and set up a binary tree, where the root branches on whether (0,0) has a set pixel, the next level branches on whether (0,1) has a set pixel, and so on. When you look up a polyomino, simply normalize it and then walk the tree. If you find it in the trie, then you're done. If not, assign that polyomino a unique id (just increment a counter), generate all 8 possible rotations and flips, then add those 8 to the trie.

    On a trie miss, you'll have to generate all the rotations and reflections. But on a trie hit it should cost less (O(k^2) for k-polyominos).

    To make lookups even more efficient, you could use a couple bits at a time and use a wider tree instead of a binary tree.
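
    A minimal Python sketch of this idea, under some assumptions that are not in the answer: cells are (x, y) tuples, the trie is a nested dict, and each k-polyomino is read over a k x k grid so that a walk takes k^2 steps. The names `PolyominoTrie`, `normalize` and `orientations` are made up for illustration.

```python
from itertools import count


def normalize(cells):
    """Translate the cells so that the smallest x and y are both 0."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def orientations(cells):
    """Yield all 8 rotations and reflections, each normalized."""
    current = list(cells)
    for _ in range(4):
        current = [(y, -x) for x, y in current]          # rotate by 90 degrees
        yield normalize(current)
        yield normalize([(-x, y) for x, y in current])   # mirror image


class PolyominoTrie:
    def __init__(self):
        self.root = {}
        self.ids = count()

    def _walk(self, cells, create):
        """Follow one branch per position of the k x k grid."""
        k = len(cells)
        node = self.root
        for x in range(k):
            for y in range(k):
                bit = (x, y) in cells
                if bit not in node:
                    if not create:
                        return None
                    node[bit] = {}
                node = node[bit]
        return node

    def lookup(self, cells):
        cells = normalize(cells)
        node = self._walk(cells, create=False)
        if node is not None and "id" in node:
            return node["id"]                 # trie hit
        new_id = next(self.ids)               # trie miss: register all 8 forms
        for variant in orientations(cells):
            self._walk(variant, create=True)["id"] = new_id
        return new_id
```

    Looking up any rotated, mirrored or translated copy of a shape that has already been seen should then return the id that was assigned on the first miss.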

  • 2021-02-02 16:33

    You can reduce it down to 8 hash operations without the need to flip, rotate, or re-translate.

    Note that this algorithm assumes the polyomino's coordinates are relative to the polyomino itself (for example, to its own bounding box) rather than absolute positions somewhere on a larger board.

    Instead of applying operations that flip, rotate, and translate, simply change the order in which you hash.

    For instance, let us take the F pent above. In the simple example, let us presume the hash operation was something like this:

    int hashPolySingle(Poly p)
        int hash = 0
        for x = 0 to p.width - 1
            for y = 0 to p.height - 1
                // parentheses matter: without them the ?: applies to the whole sum
                hash = hash * 31 + (p.contains(x,y) ? 1 : 0)
        hashPolySingle = hash
    
    int hashPoly(Poly p)
        int hash = hashPolySingle(p)
        p.rotateClockwise() // assume it translates inside
        hash = hash * 31 + hashPolySingle(p)
        // keep rotating for all 4 orientations
        p.flip()
        // hash those 4
    

    Instead of applying the function to all 8 different orientations of the poly, I would apply 8 different hash functions to 1 poly.

    int hashPolySingle(Poly p, bool flip, int corner)
        int hash = 0
        int xstart, xstop, ystart, ystop
        bool yfirst
        switch(corner)
            case 1: xstart = 0
                    xstop = p.width - 1
                    ystart = 0
                    ystop = p.height - 1
                    yfirst = false
                    break
            case 2: xstart = p.width - 1
                    xstop = 0
                    ystart = 0
                    ystop = p.height - 1
                    yfirst = true
                    break
            case 3: xstart = p.width - 1
                    xstop = 0
                    ystart = p.height - 1
                    ystop = 0
                    yfirst = false
                    break
            case 4: xstart = 0
                    xstop = p.width - 1
                    ystart = p.height - 1
                    ystop = 0
                    yfirst = true
                    break
            default: error()
        // a flip swaps which axis is the outer loop; swapping both
        // start/stop pairs would only move us to the opposite corner
        if(flip) yfirst = !yfirst
    
        // loops run from start to stop inclusive, counting down when start > stop
        if(yfirst)
            for y = ystart to ystop
                for x = xstart to xstop
                    hash = hash * 31 + (p.contains(x,y) ? 1 : 0)
        else
            for x = xstart to xstop
                for y = ystart to ystop
                    hash = hash * 31 + (p.contains(x,y) ? 1 : 0)
        hashPolySingle = hash
    

    This is then called in 8 different ways. You could also wrap hashPolySingle in a for loop over the corners and over flip or no flip. It amounts to the same thing.

    int hashPoly(Poly p)
        // approach from each of the 4 corners; the 8 values are combined
        // with min so the result does not depend on which orientation of
        // the poly was passed in (chaining with * 31 would depend on it)
        int hash = hashPolySingle(p, false, 1)
        hash = min(hash, hashPolySingle(p, false, 2))
        hash = min(hash, hashPolySingle(p, false, 3))
        hash = min(hash, hashPolySingle(p, false, 4))
        // flip it
        hash = min(hash, hashPolySingle(p, true, 1))
        hash = min(hash, hashPolySingle(p, true, 2))
        hash = min(hash, hashPolySingle(p, true, 3))
        hash = min(hash, hashPolySingle(p, true, 4))
        hashPoly = hash
    

    In this way, you're implicitly rotating the poly from each direction, but you're not actually performing the rotation and translation. It performs the 8 hashes, which seem to be entirely necessary in order to accurately hash all 8 orientations, but wastes no passes over the poly that are not actually doing hashes. This seems to me to be the most elegant solution.
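
    For illustration, here is a rough Python sketch of the same "8 scan orders over one poly" idea. Several details are my own assumptions rather than part of the answer: cells are (x, y) tuples, the eight scans are described by three booleans instead of a corner number plus a flip flag, the running hash starts at 1 so that leading empty cells still count, and the eight values are combined with `min` so that the result is the same for every orientation.

```python
from itertools import product


def scan_hash(cells, width, height, x_desc, y_desc, y_outer):
    """Hash the bounding box in one of the 8 possible scan orders."""
    xs = range(width - 1, -1, -1) if x_desc else range(width)
    ys = range(height - 1, -1, -1) if y_desc else range(height)
    if y_outer:
        order = ((x, y) for y in ys for x in xs)
    else:
        order = ((x, y) for x in xs for y in ys)
    h = 1  # assumption: a non-zero seed keeps leading empty cells significant
    for cell in order:
        h = (h * 31 + (1 if cell in cells else 0)) & 0xFFFFFFFF
    return h


def free_hash(cells):
    """Orientation-independent hash: the smallest of the 8 scan hashes."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    norm = {(x - min_x, y - min_y) for x, y in cells}
    width = 1 + max(x for x, _ in norm)
    height = 1 + max(y for _, y in norm)
    return min(
        scan_hash(norm, width, height, x_desc, y_desc, y_outer)
        for x_desc, y_desc, y_outer in product((False, True), repeat=3)
    )
```

    Rotating or mirroring the input only permutes the eight scan orders, so the minimum stays the same. Like any fixed-width hash, distinct polyominoes can still collide.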

    Note that there may be a better hashPolySingle() algorithm to use. Mine uses a Cartesian exhaustion algorithm that is on the order of O(n^2). Its worst case scenario is an L shape, which would cause there to be an N/2 * (N-1)/2 sized square for only N elements, or an efficiency of 1:(N-1)/4, compared to an I shape which would be 1:1. It may also be that the inherent invariant imposed by the architecture would actually make it less efficient than the naive algorithm.

    My suspicion is that the above concern can be alleviated by simulating the Cartesian exhaustion: convert the set of nodes into a bidirectional graph that can be traversed, so that the nodes are hit in the same order as in my much more naive hashing algorithm while the empty spaces are ignored. This would bring the algorithm down to O(n), as the graph should be constructible in O(n) time. Because I haven't done this, I can't say for sure, which is why I say it's only a suspicion, but there should be a way to do it.

  • 2021-02-02 16:36

    Well, I came up with a completely different approach. (Also thanks to corsiKa for some helpful insights!) Rather than hashing / encoding the squares, encode the path around them. The path consists of a sequence of 'turns' (including no turn) to perform before drawing each unit segment. I think an algorithm for getting the path from the coordinates of the squares is outside the scope of this question.

    This does something very important: it destroys all location and orientation information, which we don't need. It is also very easy to get the path of the flipped object: you do so by simply reversing the order of the elements. Storage is compact because each element requires only 2 bits.

    It does introduce one additional constraint: the polyomino must not have fully enclosed holes. (Formally, it must be simply connected.) Most discussions of polyominos consider a hole to exist even if it is sealed only by two touching corners, as this prevents tiling with any other non-trivial polyomino. Tracing the edges is not hindered by touching corners (as in the single heptomino with a hole), but it cannot leap from one outer loop to an inner one as in the complete ring-shaped octomino:

    [Image: ring-shaped octomino]

    It also produces one additional challenge: finding the minimum ordering of the encoded path loop. This is because any rotation of the path (in the sense of string rotation) is a valid encoding. To always get the same encoding we have to find the minimal (or maximal) rotation of the path instructions. Thankfully this problem has already been solved: see for example http://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation.

    Example:

    If we arbitrarily assign the following values to the move operations:

    • No turn: 1
    • Turn right: 2
    • Turn left: 3

    Here is the F pentomino traced clockwise:

    [Image: F pentomino traced clockwise]

    An arbitrary initial encoding for the F pentomino is (starting at the bottom right corner):

    2,2,3,1,2,2,3,2,2,3,2,1
    

    The resulting minimum rotation of the encoding is

    1,2,2,3,1,2,2,3,2,2,3,2
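
    As a small Python sketch of this step (the function names are made up, and a plain O(n^2) comparison of all rotations stands in for the faster algorithms on the linked page): the least rotation gives the encoding of one traced outline, and taking the smaller of the path and its reversal also covers the flipped shape, as described earlier.

```python
def least_rotation(turns):
    """Lexicographically smallest rotation of a cyclic turn sequence."""
    turns = tuple(turns)
    return min(turns[i:] + turns[:i] for i in range(len(turns)))


def canonical_path(turns):
    """Smallest encoding over the outline and its mirror image (reversed path)."""
    turns = tuple(turns)
    return min(least_rotation(turns), least_rotation(turns[::-1]))


f_path = (2, 2, 3, 1, 2, 2, 3, 2, 2, 3, 2, 1)
print(least_rotation(f_path))  # (1, 2, 2, 3, 1, 2, 2, 3, 2, 2, 3, 2)
```

    For paths this short the quadratic version is probably fast enough; a linear-time method only starts to matter for much longer outlines.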
    

    With 12 elements, this loop can be packed into 24 bits if two bits are used per instruction, or only 19 bits if instructions are encoded as powers of three. Even with the 2-bit element encoding, it easily fits in a single unsigned 32-bit integer, 0x6B6BAE:

       1- 2- 2- 3- 1- 2- 2- 3- 2- 2- 3- 2
    = 01-10-10-11-01-10-10-11-10-10-11-10
    = 00000000011010110110101110101110
    = 0x006B6BAE
    

    The base-3 encoding with the start of the loop in the most significant powers of 3 is 0x5795F:

        1*3^11 + 2*3^10 + 2*3^9 + 3*3^8 + 1*3^7 + 2*3^6 
      + 2*3^5  + 3*3^4  + 2*3^3 + 2*3^2 + 3*3^1 + 2*3^0
    = 0x0005795F
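
    Both packings can be checked with a few lines of Python. This is only a sketch with made-up function names; it keeps the move values 1, 2 and 3 exactly as assigned above, so the base-3 variant uses the digits 1 to 3 rather than 0 to 2.

```python
def pack_2bit(turns):
    """Pack the moves two bits at a time, first move in the highest bits."""
    value = 0
    for t in turns:
        value = (value << 2) | t
    return value


def pack_base3(turns):
    """Pack the moves as coefficients of powers of 3, first move highest."""
    value = 0
    for t in turns:
        value = value * 3 + t
    return value


f_min = (1, 2, 2, 3, 1, 2, 2, 3, 2, 2, 3, 2)
print(hex(pack_2bit(f_min)))   # 0x6b6bae
print(hex(pack_base3(f_min)))  # 0x5795f
```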
    

    The maximum number of vertexes in the path around a polyomino of order n is 2n + 2. For 2-bit encoding the number of bits is twice the number of moves, so the maximum bits needed is 4n + 4. For base-3 encoding it's:

    bits = ⌈(2n + 2) · log2(3)⌉

    Where the "gallows" is the ceiling function. Accordingly any polyomino up to order 9 can be encoded in a single 32 bit integer. Knowing this you can choose your platform-specific data structure accordingly for the fastest hash comparison given the maximum order of the polyominos you'll be hashing.
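
    The bit budgets can be tabulated with a short Python sketch. It assumes the base-3 bound is the ceiling of (2n + 2) * log2(3), which is my reading of the formula and agrees with the order-9 claim above; the function names are made up.

```python
import math


def max_bits_2bit(n):
    """Two bits per move, at most 2n + 2 moves."""
    return 2 * (2 * n + 2)


def max_bits_base3(n):
    """Three possible moves per step, at most 2n + 2 steps."""
    return math.ceil((2 * n + 2) * math.log2(3))


for n in (5, 9, 10):
    print(n, max_bits_2bit(n), max_bits_base3(n))
```

    Under this assumption order 9 needs at most 32 bits in base 3 and order 10 needs 35, so a 64-bit integer would be the next natural container.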

  • 2021-02-02 16:41

    A valid hash function, if you're really afraid of hash collisions, is to use x + order * y as the hash of a coordinate and then loop through all the coordinates of a piece, adding (order ^ i) * hash(coord[i]) to the piece hash, where ^ means "to the power of". That way, you can guarantee you won't get any hash collisions. This presumes the coordinates are normalized and always listed in the same order.
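
    A hedged Python sketch of this kind of exact key follows. Two things here are my own assumptions, not the answer's: the cells are normalized and sorted first, and the multiplier is order * order instead of order, because each per-cell value can be as large as order^2 - 1 and a positional encoding is only obviously unique when every digit is smaller than the base. The function name is made up.

```python
def exact_key(cells):
    """Collision-free integer key for a fixed (not free) polyomino of a given order."""
    order = len(cells)
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    # per-cell value x + order * y, taken on normalized coordinates
    values = sorted((x - min_x) + order * (y - min_y) for x, y in cells)
    base = order * order  # assumption: larger than any per-cell value
    key = 0
    for i, value in enumerate(values):
        key += base ** i * value
    return key


print(exact_key([(0, 0), (1, 0)]))  # horizontal domino -> 4
print(exact_key([(0, 0), (0, 1)]))  # vertical domino -> 8
```

    The key grows quickly with the order, so it is an exact identifier rather than a fixed-width hash, and it still distinguishes rotations and flips of the same free polyomino.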

  • 2021-02-02 16:47

    Here's my DFS (depth first search) explained:

    Start with the top-most cell (left-most as a tiebreaker). Mark it as visited. Every time you visit a cell, check all four directions for unvisited neighbors. Always check the four directions in this order: up, left, down, right.

    Example


    In this example, up and left fail, but down succeeds. So far our output is 001, and we recursively search the "down" cell.

    We mark our new current cell as visited (and we'll finish searching the original cell when we finish searching this cell). Here, up=0, left=1.

    We search the left-most cell and there are no unvisited neighbors (up=0, left=0, down=0, right=0). Our total output so far is 001010000.


    We continue our search of the second cell. down=0, right=1. We search the cell to the right.


    up=0, left=0, down=1. Search the down cell: all 0s. Total output so far is 001010000010010000. Then, we return from the down cell...


    right=0, return. return. (Now, we are at the starting cell.) right=0. Done!

    So, the total output is 20 (N*4) bits: 00101000001001000000.
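
    The walkthrough above can be reproduced with a short recursive Python sketch. The example shape is not given in the text (the images are missing), so the five cells below are inferred from the trace and should be treated as an assumption; cells are (row, column) pairs with row 0 at the top, and `dfs_encode` is a made-up name.

```python
def dfs_encode(cells):
    """Emit one bit per direction checked: up, left, down, right."""
    cells = set(cells)
    start = min(cells)            # top-most row first, then left-most column
    visited = {start}
    bits = []

    def visit(cell):
        row, col = cell
        for d_row, d_col in ((-1, 0), (0, -1), (1, 0), (0, 1)):
            neighbor = (row + d_row, col + d_col)
            if neighbor in cells and neighbor not in visited:
                bits.append("1")
                visited.add(neighbor)
                visit(neighbor)   # finish this branch before the next direction
            else:
                bits.append("0")

    visit(start)
    return "".join(bits)


# Shape inferred from the trace:  .#.
#                                 ###
#                                 ..#
example = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 2)]
print(dfs_encode(example))  # 00101000001001000000
```

    This is the plain 4-bits-per-cell version, without any bit-saving shortcuts.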

    Encoding improvement

    But, we can save some bits.

    The last visited cell will always encode 0000 for its four directions. So, don't encode the last visited cell to save 4 bits.

    Another improvement: if you reached a cell by moving left, don't check that cell's right side. So, we only need 3 bits per cell, except 4 bits for the first cell, and 0 for the last cell.

    The first cell will never have an up, or left neighbor, so omit these bits. So the first cell takes 2 bits.

    So, with these improvements, we use only N*3-4 bits (e.g. 5 cells -> 11 bits; 9 cells -> 23 bits).

    If you really want, you can compact a little more by noting that exactly N-1 bits will be "1".

    Caveat

    Yes, you'll need to encode all 8 rotations/flips of the polyomino and choose the least to get a canonical encoding.

    I suspect this will still be faster than the outline approach. Also, holes in the polyomino shouldn't be a problem.

  • 2021-02-02 16:50

    I worked on the same problem recently. I solved it fairly simply: (1) generate a unique ID for a polyomino, such that identical polys have the same UID. For example, find the bounding box, normalize the corner of the bounding box, and collect the set of non-empty cells. (2) Generate all possible orientations by rotating (and flipping, if appropriate) the polyomino, and look for duplicates.

    The advantage of this brute approach, other than its simplicity, is that it still works if the polys are distinguishable in some other way, for example if some of them are colored or numbered.
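
    A minimal Python sketch of this brute-force approach, assuming cells are (x, y) tuples and using the smallest of the eight normalized forms as the UID (choosing the minimum is my assumption; any consistent choice among the eight would do). The function names are made up.

```python
def normalized(cells):
    """Shift the bounding-box corner to (0, 0) and sort the cells."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return tuple(sorted((x - min_x, y - min_y) for x, y in cells))


def canonical_form(cells, allow_flip=True):
    """Smallest normalized form over the rotations (and flips, if allowed)."""
    forms = []
    current = list(cells)
    for _ in range(4):
        current = [(y, -x) for x, y in current]                 # rotate by 90 degrees
        forms.append(normalized(current))
        if allow_flip:
            forms.append(normalized([(-x, y) for x, y in current]))  # mirror image
    return min(forms)
```

    Two polyominoes are the same free polyomino when their canonical forms are equal, and the resulting tuple can be used directly as a dictionary or set key. Passing allow_flip=False treats mirror images as different pieces.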
