Word wrap to X lines instead of maximum width (Least raggedness)

Posted by a 夏天 on 2020-01-09 09:55:07

Question


Does anyone know a good algorithm to word wrap an input string to a specified number of lines rather than to a set width? Basically, I want to achieve the minimum width for X lines.

e.g. "I would like to be wrapped into two lines"
goes to
"I would like to be
wrapped into two lines"

"I would like to be wrapped into three lines"
goes to
"I would like to
be wrapped into
three lines"

New lines should be inserted as required. I can find other word wrap questions, but they all have a known width and insert as many lines as needed to fit that width. I am after the opposite.

Answers in a .NET language are preferable, but any language would be helpful. Obviously, if there is a framework way to do this that I am not aware of, let me know.

Edit: I have since found the question "Algorithm to divide text into 3 evenly-sized groups". I think its accepted answer is the solution to my problem, but I am having difficulty understanding it. Is there any chance someone could convert it to C# or VB.NET?


Answer 1:


Here is the accepted solution from "Algorithm to divide text into 3 evenly-sized groups", converted to C#:

// Requires: using System; using System.Collections.Generic;
static List<string> Minragged(string text, int n = 3)
{
    var words = text.Split();

    var cumwordwidth = new List<int>();
    cumwordwidth.Add(0);

    foreach (var word in words)
        cumwordwidth.Add(cumwordwidth[cumwordwidth.Count - 1] + word.Length);

    var totalwidth = cumwordwidth[cumwordwidth.Count - 1] + words.Length - 1;

    var linewidth = (double)(totalwidth - (n - 1)) / n;

    var cost = new Func<int, int, double>((i, j) =>
    {
        var actuallinewidth = Math.Max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]);
        return (linewidth - actuallinewidth) * (linewidth - actuallinewidth);
    });

    var best = new List<List<Tuple<double, int>>>();

    var tmp = new List<Tuple<double, int>>();
    best.Add(tmp);
    tmp.Add(new Tuple<double, int>(0.0f, -1));
    foreach (var word in words)
        tmp.Add(new Tuple<double, int>(double.MaxValue, -1));

    for (int l = 1; l < n + 1; ++l)
    {
        tmp = new List<Tuple<double, int>>();
        best.Add(tmp);
        for (int j = 0; j < words.Length + 1; ++j)
        {
            var min = new Tuple<double, int>(best[l - 1][0].Item1 + cost(0, j), 0);
            for (int k = 0; k < j + 1; ++k)
            {
                var loc = best[l - 1][k].Item1 + cost(k, j);
                if (loc < min.Item1 || (loc == min.Item1 && k < min.Item2))
                    min = new Tuple<double, int>(loc, k);
            }
            tmp.Add(min);
        }
    }

    var lines = new List<string>();
    var b = words.Length;

    for (int l = n; l > 0; --l)
    {
        var a = best[l][b].Item2;
        lines.Add(string.Join(" ", words, a, b - a));
        b = a;
    }

    lines.Reverse();
    return lines;
}



Answer 2:


One way of solving this problem is dynamic programming, cf. the minimum raggedness algorithm. I used some of the information you added when you edited your post: "Algorithm to divide text into 3 evenly-sized groups".


Notations:

Let your text be document = "word1 word2 .... wordp"

n = number of lines required

LineWidth = len(document) / n


Cost function:

First you need to define a cost function for having word[i] to word[j] on the same line. You can take the same one as on Wikipedia, with p = 2 for example.

It represents the distance between the target length of a line and its actual length.

The total cost of the optimal solution can then be defined with a recursion relation over the lines, which is what the Python code further down computes.
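Read off the Python code later in this answer, the two formulas can be written roughly as follows. The symbols c, F_k, L and w_m are notation assumed here rather than taken from the original post: w_m is the length of word m, L is LineWidth, words are numbered from 0, and a line holds words i, ..., j - 1.

```latex
% cost of putting words i .. j-1 on one line (the code uses p = 2)
c(i, j) = \Bigl( L - \bigl( \textstyle\sum_{m=i}^{j-1} w_m + \max(j - i - 1,\, 0) \bigr) \Bigr)^{p}

% optimal total cost of placing the first j words on lines 0 .. k
F_0(j) = c(0, j), \qquad
F_k(j) = \min_{0 \le i \le j} \bigl[ F_{k-1}(i) + c(i, j) \bigr] \quad (k \ge 1)
```

With these definitions the cost of the best layout is F_{n-1}(number of words), and the minimising i at each stage gives the start of the corresponding line.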


Solving the problem:

You can solve this problem using dynamic programming. I took the code from the link you gave and changed it a bit so you can see what the program is doing.

  1. At stage k you add words to line k.
  2. Then you look at the optimal cost of having words i to j on line k.
  3. Once you've gone from line 1 to n, you take the smallest cost in the last step and you have your optimal result:

Here is the result from the code:

D=minragged('Just testing to see how this works.')

number of words: 7
------------------------------------
stage : 0
------------------------------------
word i to j in line 0       TotalCost (f(j))
------------------------------------
i= 0 j= 0           121.0
i= 0 j= 1           49.0
i= 0 j= 2           1.0
i= 0 j= 3           16.0
i= 0 j= 4           64.0
i= 0 j= 5           144.0
i= 0 j= 6           289.0
i= 0 j= 7           576.0
------------------------------------
stage : 1
------------------------------------
word i to j in line 1       TotalCost (f(j))
------------------------------------
i= 0 j= 0           242.0
i= 0 j= 1           170.0
i= 0 j= 2           122.0
i= 0 j= 3           137.0
i= 0 j= 4           185.0
i= 0 j= 5           265.0
i= 0 j= 6           410.0
i= 0 j= 7           697.0
i= 1 j= 2           65.0
i= 1 j= 3           50.0
i= 1 j= 4           58.0
i= 1 j= 5           98.0
i= 1 j= 6           193.0
i= 1 j= 7           410.0
i= 2 j= 4           26.0
i= 2 j= 5           2.0
i= 2 j= 6           17.0
i= 2 j= 7           122.0
i= 3 j= 7           80.0
------------------------------------
stage : 2
------------------------------------
word i to j in line 2       TotalCost (f(j))
------------------------------------
i= 0 j= 7           818.0
i= 1 j= 7           531.0
i= 2 j= 7           186.0
i= 3 j= 7           114.0
i= 4 j= 7           42.0
i= 5 j= 7           2.0
reversing list
------------------------------------
Just testing        12
to see how      10
this works.         11
  • Therefore the best choice is to put words 5 to 7 on the last line (cf. stage 2),
  • then words 2 to 5 on the second line (cf. stage 1),
  • then words 0 to 2 on the first line (cf. stage 0).

Reverse this and you get:

Just testing          12
to see how          10
this works.          11

Here is the code that prints the reasoning (in Python, sorry, I don't use C#, but someone has already translated the algorithm to C#):

def minragged(text, n=3):
    P = 2  # exponent of the cost function
    words = text.split()
    cumwordwidth = [0]
    # cumwordwidth[-1] is the last element
    for word in words:
        cumwordwidth.append(cumwordwidth[-1] + len(word))
    totalwidth = cumwordwidth[-1] + len(words) - 1  # len(words) - 1 spaces
    linewidth = float(totalwidth - (n - 1)) / float(n)  # n - 1 line breaks

    print("number of words:", len(words))

    def cost(i, j):
        """
        cost of a line words[i], ..., words[j - 1] (words[i:j])
        """
        actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i])
        return (linewidth - float(actuallinewidth)) ** P

    # Print the reasoning, then rebuild the lines by walking the stages backwards.
    F = {}  # total cost function: F[stage][j] = [best cost, start index i of the last line]

    for stage in range(n):
        print("------------------------------------")
        print("stage :", stage)
        print("------------------------------------")
        print("word i to j in line", stage, "\t\tTotalCost (f(j))")
        print("------------------------------------")

        if stage == 0:
            F[stage] = []
            i = 0
            for j in range(i, len(words) + 1):
                print("i=", i, "j=", j, "\t\t\t", cost(i, j))
                F[stage].append([cost(i, j), 0])
        elif stage == (n - 1):
            # last line: it has to end at the last word
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            for i in range(len(words) + 1):
                j = len(words)
                if F[stage - 1][i][0] + cost(i, j) < F[stage][j][0]:  # calculating min cost (cf f formula)
                    F[stage][j][0] = F[stage - 1][i][0] + cost(i, j)
                    F[stage][j][1] = i
                    print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])
        else:
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            for i in range(len(words) + 1):
                for j in range(i, len(words) + 1):
                    if F[stage - 1][i][0] + cost(i, j) < F[stage][j][0]:
                        F[stage][j][0] = F[stage - 1][i][0] + cost(i, j)
                        F[stage][j][1] = i
                        print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])

    print('reversing list')
    print("------------------------------------")
    listWords = []
    a = len(words)
    for k in range(n - 1, 0, -1):  # reverse loop from n-1 to 1
        listWords.append(' '.join(words[F[k][a][1]:a]))
        a = F[k][a][1]
    listWords.append(' '.join(words[0:a]))
    listWords.reverse()

    for line in listWords:
        print(line, '\t\t', len(line))

    return listWords
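
For readers who only want the result, the same dynamic programme can be sketched without the tracing output. This is a minimal Python 3 sketch, not the original author's code: the name min_ragged and the table layout are assumptions made for illustration, and ties are broken towards the smaller start index, as in the C# version above.

```python
def min_ragged(text, n=3):
    """Split text into n lines, minimising the squared deviation from the ideal width."""
    words = text.split()
    cum = [0]                                  # cum[i] = total letters in words[:i]
    for word in words:
        cum.append(cum[-1] + len(word))
    total = cum[-1] + len(words) - 1           # letters plus single spaces
    target = (total - (n - 1)) / n             # n - 1 spaces turn into line breaks

    def cost(i, j):
        width = max(j - i - 1, 0) + cum[j] - cum[i]   # width of words[i:j]
        return (target - width) ** 2

    inf = float("inf")
    m = len(words)
    # best[l][j] = (cheapest cost of putting words[:j] on l lines, start of the last line)
    best = [[(0.0, -1)] + [(inf, -1)] * m]
    for l in range(1, n + 1):
        row = []
        for j in range(m + 1):
            # tuples compare by cost first, then by the smaller start index k
            row.append(min((best[l - 1][k][0] + cost(k, j), k) for k in range(j + 1)))
        best.append(row)

    # Walk the table backwards to recover the line boundaries.
    lines, b = [], m
    for l in range(n, 0, -1):
        a = best[l][b][1]
        lines.append(" ".join(words[a:b]))
        b = a
    lines.reverse()
    return lines


print(min_ragged("Just testing to see how this works."))
```

Note that this criterion evens out the line lengths rather than minimising the longest line, so on some inputs it can pick a layout whose widest line is longer than the minimum possible width.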



Answer 3:


There was a discussion about this exact problem (though it was phrased in a different way) at http://www.perlmonks.org/?node_id=180276.

In the end the best solution was to do a binary search through all possible widths to find the smallest width that wound up with no more than the desired number of columns. If there are n items and the average width is m, then you'll need O(log(n) + log(m)) passes to find the right width, each of which takes O(n) time, for O(n * (log(n) + log(m))). This is probably fast enough with no more need to be clever.

If you wish to be clever, you can create an array of word counts, and cumulative lengths of the words. Then use binary searches on this data structure to figure out where the line breaks are. Creating this data structure is O(n), and it makes all of the passes to figure out the right width be O(log(n) * (log(n) + log(m))) which for reasonable lengths of words is dominated by your first O(n) pass.
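
The cumulative-length idea can be sketched as follows, assuming integer word widths and a single-character separator. The function names lines_needed and min_width are hypothetical, and this is one possible reading of the description rather than the code from the linked discussion.

```python
from bisect import bisect_right
from itertools import accumulate


def lines_needed(prefix, width, limit):
    """Count greedy lines at this width, giving up once `limit` is exceeded."""
    n_words = len(prefix) - 1
    start, count = 0, 0
    while start < n_words:
        # Largest end such that words[start:end] fits:
        # prefix[end] - prefix[start] - 1 <= width
        end = bisect_right(prefix, prefix[start] + width + 1) - 1
        if end == start:            # a single word is wider than `width`
            return limit + 1
        start = end
        count += 1
        if count > limit:
            return count
    return count


def min_width(words, n_lines):
    """Smallest width at which greedy wrapping needs at most n_lines lines."""
    if not words:
        return 0
    # prefix[i] = length of words[:i], counting one separator after every word
    prefix = [0] + list(accumulate(len(w) + 1 for w in words))
    lo = max(len(w) for w in words)   # the longest word has to fit
    hi = prefix[-1] - 1               # everything on one line
    while lo < hi:
        mid = (lo + hi) // 2
        if lines_needed(prefix, mid, n_lines) <= n_lines:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_width("I would like to be wrapped into three lines".split(), 3))
```

Each call to lines_needed does one binary search per line instead of visiting every word, which is where the saving over a plain O(n) pass comes from.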

If the widths of words can be floating point, you'll need to do something more clever with the binary searches, but you are unlikely to need that particular optimization.




Answer 4:


btilly has the right answer here, but just for fun I decided to code up a solution in Python:

def wrap_min_width(words, n):
    # Greedily wrap words into lines of at most n characters.
    r, l = [], ""
    for w in words:
        # +1 accounts for the space that joins w to a non-empty line
        if l and len(l) + 1 + len(w) > n:
            r, l = r + [l], ""
        l += (" " if l else "") + w
    return r + [l]

def min_lines(phrase, lines):
    # Binary search for the smallest width that needs at most `lines` lines.
    words = phrase.split(" ")
    # lo: the longest word must fit; hi: the whole phrase on a single line
    lo, hi = max(len(w) for w in words), len(" ".join(words))
    while lo < hi:
        mid = lo + (hi - lo) // 2
        v = wrap_min_width(words, mid)
        if len(v) > lines:
            lo = mid + 1
        else:
            hi = mid
    return lo, "\n".join(wrap_min_width(words, lo))

Now this still may not be exactly what you want: if the words can be wrapped into fewer than n lines at the same line width, it returns the wrapping with the smallest number of lines instead. (Of course you can always add extra empty lines, but that is a bit silly.) If I run it on your test case, here is what I get:

Case: "I would like to be wrapped into three lines", 3 lines

Result: 15 chars/line

I would like to
be wrapped into
three lines




Answer 5:


I just thought of an approach:
Write a function that accepts two parameters: 1. the string, 2. the number of lines.

Get the length of the string (String.Length if using C#) and divide it by the number of lines (let's say the result is n).

Now loop over the characters of the string (using string[i]) and insert a line break ("\r\n") after every nth character.

To avoid breaking inside a word, keep a temporary string in the loop that holds the current word and is reset to null whenever a blank character is read.
If the nth character is reached while the temporary string is not null, insert the line break after that word instead.
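
A rough sketch of this heuristic, working word by word rather than character by character, could look like the following. The function name wrap_by_target is hypothetical, and the approach is only approximate: it does not guarantee the minimum width, and it can return fewer lines than requested.

```python
def wrap_by_target(text, n_lines):
    """Break after the word in which each multiple of the target length falls."""
    words = text.split()
    target = len(" ".join(words)) / n_lines    # ideal number of characters per line
    lines, current, consumed = [], [], 0
    for word in words:
        current.append(word)
        consumed += len(word) + 1              # the word plus the space after it
        # Break once the text consumed so far passes the next multiple of target,
        # but keep the final line open so that the remaining words end up on it.
        if consumed >= target * (len(lines) + 1) and len(lines) < n_lines - 1:
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return lines


print("\n".join(wrap_by_target("I would like to be wrapped into three lines", 3)))
```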




Answer 6:


I'll assume you're trying to minimize the maximum width of a string with n breaks. This can be done in O(words(str)*n) space, and O(words(str)^2*n) time for the plain recurrence below, using dynamic programming or recursion with memoization.

The recurrence would look like this, where the string has been split into words:

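from functools import lru_cache

INFINITY = float("inf")

def wordwrap(words, n):
    """Smallest possible maximum line width when words are split using n breaks."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def solve(start, breaks):
        remaining = words[start:]
        if breaks > 0 and len(remaining) == 0:
            return INFINITY  # we haven't chopped enough lines

        if breaks == 0:
            return len(" ".join(remaining))  # rest of the string

        best = INFINITY
        for i in range(start + 1, len(words) + 1):
            # split here: words[start:i] is this line, the rest gets breaks - 1
            head = len(" ".join(words[start:i]))
            best = min(max(solve(i, breaks - 1), head), best)

        return best

    return solve(0, n)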



Answer 7:


I converted the C# accepted answer to JavaScript for something I was working on. Posting it here might save someone a few minutes of doing it themselves.

function WrapTextWithLimit(text, n) {
    var words = text.toString().split(' ');
    var cumwordwidth = [0];
    words.forEach(function(word) {
        cumwordwidth.push(cumwordwidth[cumwordwidth.length - 1] + word.length);
    });
    var totalwidth = cumwordwidth[cumwordwidth.length - 1] + words.length - 1;
    var linewidth = (totalwidth - (n - 1.0)) / n;
    var cost = function(i, j) {
        var actuallinewidth = Math.max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]);
        return (linewidth - actuallinewidth) * (linewidth - actuallinewidth);
    };
    var best = [];
    var tmp = [];
    best.push(tmp);
    tmp.push([0.0, -1]);
    words.forEach(function(word) {
        tmp.push([Number.MAX_VALUE, -1]);
    });
    for (var l = 1; l < n + 1; ++l)
    {
        tmp = [];
        best.push(tmp);
        for (var j = 0; j < words.length + 1; ++j)
        {
            var min = [best[l - 1][0][0] + cost(0, j), 0];
            for (var k = 0; k < j + 1; ++k)
            {
                var loc = best[l - 1][k][0] + cost(k, j);
                if (loc < min[0] || (loc === min[0] && k < min[1])) {
                    min = [loc, k];
                }
            }
            tmp.push(min);
        }
    }
    var lines = [];
    var b = words.length;
    for (var p = n; p > 0; --p) {
        var a = best[p][b][1];
        lines.push(words.slice(a, b).join(' '));
        b = a;
    }
    lines.reverse();
    return lines;
}



Answer 8:


This solution improves on Mikola's.

It's better because:

  1. It doesn't use strings. You don't need to build and concatenate strings; you only need an array of their lengths. This makes it faster, and the method works with any kind of "element" as long as you know the widths.
  2. The wrap_min_width function did some unnecessary processing: it kept going even after it had passed the point of failure, and it built the strings needlessly.
  3. The "separator width" is now an adjustable parameter.
  4. It calculates the minimum width, which is really what you want.
  5. Some bugs are fixed.

This is written in JavaScript:

 // For testing calcMinWidth

var formatString = function (str, nLines) {

    var words = str.split(" ");
    var elWidths = words.map(function (s, i) {
        return s.length;
    });

    var width = calcMinWidth(elWidths, 1, nLines, 0.1);

    var format = function (width)
    {
        var lines = [];
        var curLine = null;
        var curLineLength = 0;

        for (var i = 0; i < words.length; ++i) {
            var word = words[i];
            var elWidth = elWidths[i];

            // A word appended to a non-empty line also needs one separator.
            if (i > 0 && curLineLength + 1 + elWidth > width)
            {
                lines.push(curLine.join(" "));
                curLine = [word];
                curLineLength = elWidth;
                continue;
            }

            if (i === 0)
                curLine = [word];
            else
            {
                curLineLength += 1;
                curLine.push(word);
            }

            curLineLength += elWidth;
        }

        if (curLine !== null)
            lines.push(curLine.join(" "));

        return lines.join("\n");
    };

    return format(width);
};

var calcMinWidth = function (elWidths, separatorWidth, lines, tolerance)
{
    var testFit = function (width)
    {
        var nCurLine = 1;
        var curLineLength = 0;

        for (var i = 0; i < elWidths.length; ++i) {
            var elWidth = elWidths[i];

            // An element wider than the line can never fit.
            if (elWidth > width)
                return false;

            // Count the separator when checking whether the element still fits.
            if (i > 0 && curLineLength + separatorWidth + elWidth > width)
            {
                if (++nCurLine > lines)
                    return false;

                curLineLength = elWidth;
                continue;
            }

            if (i > 0)
                curLineLength += separatorWidth;

            curLineLength += elWidth;
        }

        return true;
    };


    var hi = 0;
    var lo = null;

    for (var i = 0; i < elWidths.length; ++i) {
        var elWidth = elWidths[i];

        if (i > 0)
            hi += separatorWidth;

        hi += elWidth;

        if (lo === null || elWidth > lo)
            lo = elWidth;
    }

    if (lo === null)
        lo = 0;

    while (hi - lo > tolerance)
    {
        var guess = (hi + lo) / 2;

        if (testFit(guess))
            hi = guess;
        else
            lo = guess;
    }

    return hi;
};


Source: https://stackoverflow.com/questions/6426017/word-wrap-to-x-lines-instead-of-maximum-width-least-raggedness
