largest possible rectangle of letters

后端 未结 3 997
故里飘歌
故里飘歌 2021-02-01 10:04

Write a program to find the largest possible rectangle of letters such that every row forms a word (left to right) and every column forms a word (top to bottom).

3条回答
  •  小蘑菇
    小蘑菇 (楼主)
    2021-02-01 10:30

    Let D be the dictionary. Fix m and n. We can formulate the problem of finding an m × n rectangle as a constraint satisfaction problem (CSP).

    xi,1…xi,n ∈ D    ∀i ∈ {1, …, m}
    x1,j…xm,j ∈ D    ∀j ∈ {1, …, n}
    xi,j ∈ {A, …, Z}    ∀i ∈ {1, …, m}, ∀j ∈ {1, …, n}

    A very common approach for solving CSPs is to combine backtracking with constraint propagation. Loosely speaking, backtracking means we pick a variable, guess its value, and recursively solve the subproblem with one fewer variable, and constraint propagation means trying to reduce the number of possibilities for each variable (possibly to zero, which means there's no solution).

    As an example, we might start a 3 × 3 grid by choosing x1,1 = Q.

    Q??
    ???
    ???
    

    With an English dictionary, the only possibility for x1,2 and x2,1 is U (in before Scrabble “words”).

    The art of solving CSPs is balancing between backtracking and constraint propagation. If we don't propagate constraints at all, then we're just using brute force. If we propagate constraints perfectly, then we don't need to backtrack, but a propagation algorithm that solves an NP-hard problem by itself is probably rather expensive to run.

    In this problem, working with a large dictionary at each backtracking node will get expensive unless we have good data structure support. I'll outline an approach that uses a trie or a DAWG quickly to compute the set of letters via which a prefix extends to a complete word.

    At each backtracking node, the set of variables we have assigned is a Young tableau. In other words, no variable is assigned until the variables above it and to the left have been assigned. In the diagram below, . denotes an assigned variable and * and ? denote unassigned variables.

    .....*
    ...*??
    ...*??
    ..*???
    *?????
    

    The variables marked * are candidates for the next to be assigned a value. The advantage of having multiple choices rather choosing a fixed variable each time is that some variable orderings can be much better than others.

    For each *, make two lookups into the trie/DAWG, one for the horizontal and one for the vertical, and compute the intersection of the sets of letters that can come next. One classic strategy is to choose the variable with the fewest possibilities in the hope that we reach a contradiction faster. I like this strategy because it prunes naturally when there's a variable with zero possibilities and propagates naturally when there's a variable with one. By caching results, we can make evaluating each node very fast.

提交回复
热议问题