Algorithm to divide text into 3 evenly-sized groups

小蘑菇 2021-01-15 04:07

I would like to create an algorithm that divides text into 3 evenly sized groups (based on text length). Since this will be used for line breaks, the order of

4 answers
  •  执念已碎
    2021-01-15 04:24

    The "minimum raggedness" dynamic program, also from the Wikipedia article on word wrap, can be adapted to your needs. Set LineWidth = len(text)/n - 1 and ignore the comment about infinite penalties for exceeding the line width; use the definition of c(i, j) as is with P = 2.
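
    For reference, the cost and recurrence that the code in this answer computes can be written out roughly as follows. This is a sketch read off the code rather than a quotation of the Wikipedia article; the symbols w, L, T and B are my own shorthand, with T the width of the whole text when words are joined by single spaces and B_l(j) the minimum total cost of placing the first j words on l lines.

```latex
\begin{aligned}
w(i,j) &= \sum_{k=i}^{j-1} \operatorname{len}(\mathrm{words}[k]) + \max(j-i-1,\,0) \\
c(i,j) &= \bigl(L - w(i,j)\bigr)^{2}, \qquad L = \frac{T-(n-1)}{n} \\
B_0(0) &= 0, \qquad B_0(j) = \infty \quad (j > 0) \\
B_l(j) &= \min_{0 \le k \le j} \bigl[\, B_{l-1}(k) + c(k,j) \,\bigr], \qquad 1 \le l \le n
\end{aligned}
```

    The answer is recovered by backtracking from B_n(#words) through the minimizing k at each level, which gives the start of each line.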


    Code. I took the liberty of modifying the DP always to return exactly n lines, at the cost of increasing the running time from O(#words ** 2) to O(#words ** 2 * n).
    
    def minragged(text, n=3):
        """
        >>> minragged('Just testing to see how this works.')
        ['Just testing', 'to see how', 'this works.']
        >>> minragged('Just testing to see how this works.', 10)
        ['', '', 'Just', 'testing', 'to', 'see', 'how', 'this', 'works.', '']
        """
        words = text.split()
        cumwordwidth = [0]
        # cumwordwidth[-1] is the last element
        for word in words:
            cumwordwidth.append(cumwordwidth[-1] + len(word))
        totalwidth = cumwordwidth[-1] + len(words) - 1  # len(words) - 1 spaces
        linewidth = float(totalwidth - (n - 1)) / float(n)  # n - 1 line breaks
        def cost(i, j):
            """
            cost of a line words[i], ..., words[j - 1] (words[i:j])
            """
            actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i])
            return (linewidth - float(actuallinewidth)) ** 2
        # best[l][k][0] is the min total cost for words 0, ..., k - 1 on l lines
        # best[l][k][1] is a minimizing index for the start of the last line
        best = [[(0.0, None)] + [(float('inf'), None)] * len(words)]
        # range(upper) is the interval 0, 1, ..., upper - 1
        # (Python 3; on Python 2 the original xrange behaves the same way)
        for l in range(1, n + 1):
            best.append([])
            for j in range(len(words) + 1):
                best[l].append(min((best[l - 1][k][0] + cost(k, j), k) for k in range(j + 1)))
        lines = []
        b = len(words)
        # range(upper, 0, -1) is the interval upper, upper - 1, ..., 1
        for l in range(n, 0, -1):
            a = best[l][b][1]
            lines.append(' '.join(words[a:b]))
            b = a
        lines.reverse()
        return lines
    
    if __name__ == '__main__':
        import doctest
        doctest.testmod()
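
    As a sanity check on the DP, a brute-force version that tries every placement of the line breaks can be handy for short inputs. The sketch below is not part of the original answer: `brute_force_split` is a hypothetical helper name, and it assumes the same squared-deviation cost and the same target width as `minragged`. It is exponential in n, so it is only meant for comparing results on small examples.

```python
from itertools import combinations_with_replacement


def brute_force_split(text, n=3):
    """Try every way to cut the words into n consecutive (possibly empty) lines."""
    words = text.split()
    total = len(" ".join(words))        # width of the text with single spaces
    target = (total - (n - 1)) / n      # same target line width as the DP version

    def cost(line_words):
        # squared deviation of this line's width from the target width
        return (target - len(" ".join(line_words))) ** 2

    best_cost, best_lines = float("inf"), None
    # choose n - 1 non-decreasing cut positions; repeated cuts give empty lines
    for cuts in combinations_with_replacement(range(len(words) + 1), n - 1):
        bounds = (0,) + cuts + (len(words),)
        lines = [words[a:b] for a, b in zip(bounds, bounds[1:])]
        total_cost = sum(cost(line) for line in lines)
        if total_cost < best_cost:
            best_cost, best_lines = total_cost, lines
    return [" ".join(line) for line in best_lines]


if __name__ == "__main__":
    print(brute_force_split("Just testing to see how this works."))
```

    On the first doctest sentence this prints ['Just testing', 'to see how', 'this works.'], the same split as the DP. When several splits tie on cost, the two functions may break the tie differently.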
    
