I need to figure out an algorithm that will calculate the optimized size of the column widths given the following:
I encountered a problem similar to this using ITextPDF with a 14 column table. The data was variable, where some columns could wrap and others couldn't.
My solution was to find the longest word in each column by using split(" "). This reduces the odds of a word, date, or number getting cut in half. Here is the code. Sorry, I don't have time to edit this into a more general format; hopefully it will help someone anyway.
    // Needs java.util.ArrayList plus PdfPRow and PdfPCell
    // (com.itextpdf.text.pdf in iText 5).

    // Stores the length of the longest word found in each of the 14 columns.
    // Java zero-initializes int arrays, so no explicit reset loop is needed.
    int[] maxStringLengthPerColumn = new int[14];

    // For each row in the table...
    ArrayList<PdfPRow> rows = table.getRows();
    for (int a = 0; a < rows.size(); a++) {
        // ...and for each cell in the row
        PdfPCell[] cellsInRow = rows.get(a).getCells();
        for (int b = 0; b < cellsInRow.length; b++) {
            // Split the cell contents at " " and find the longest word in the cell
            String[] splitCell = cellsInRow[b].getPhrase().getContent().split(" ");
            int largestStringSize = 0;
            for (int c = 0; c < splitCell.length; c++) {
                if (splitCell[c].length() > largestStringSize) {
                    largestStringSize = splitCell[c].length();
                }
            }
            // I found that adding 4 to the value worked; change this number to fine-tune.
            // Compare the padded value, so that a slightly longer word in a later
            // row is not missed because the stored maximum already includes padding.
            int paddedSize = largestStringSize + 4;
            if (paddedSize > maxStringLengthPerColumn[b]) {
                maxStringLengthPerColumn[b] = paddedSize;
            }
        }
    }

    /* The PDF library can set widths from just an array of relative values; you
       may need to convert these values to something else depending on the
       application. For example, if you have a total width of 800 pixels, the
       width of col1 would be
       maxStringLengthPerColumn[0] / (sum of maxStringLengthPerColumn[0..13]) * 800 */
    table.setWidths(maxStringLengthPerColumn);
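If your target does not accept relative widths, the conversion described in the comment above could look roughly like this. It is only a sketch: the class name `WidthScaler` and the method `toAbsoluteWidths` are made up for illustration, and the equal split for an all-zero input is my own assumption.

```java
import java.util.Arrays;

final class WidthScaler {
    /**
     * Scales relative column weights (e.g. longest-word lengths) so that
     * the resulting widths add up to totalWidth.
     */
    static float[] toAbsoluteWidths(int[] relative, float totalWidth) {
        long sum = 0;
        for (int r : relative) {
            sum += r;
        }
        float[] widths = new float[relative.length];
        if (sum == 0) {
            // Assumption: with no information at all, split the space equally.
            Arrays.fill(widths, totalWidth / relative.length);
            return widths;
        }
        for (int i = 0; i < relative.length; i++) {
            // width_i = relative_i / (sum of all relative values) * totalWidth
            widths[i] = totalWidth * relative[i] / sum;
        }
        return widths;
    }
}
```

For example, `WidthScaler.toAbsoluteWidths(maxStringLengthPerColumn, 800f)` would give pixel widths for an 800-pixel table.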
An easy solution is to assign attributes to your columns; for example, your Notes column could be flexible. Then you could calculate the maximum width of each column over all rows, set that width for all non-flexible columns, and then distribute the remaining space evenly (or possibly weighted by their maximum widths) among the flexible columns.
But you could also try to find out the attributes with some simple conditions:
Then proceed as described above: calculate the widths of all non-flexible columns. Check whether there is enough space; if not, make the wrappable columns flexible, too. Then calculate the widths of the flexible columns, weighted by their maximum widths.
A possible pseudocode algorithm is below. It makes liberal use of various heuristics, so you should probably take it with a grain of salt. You can adjust these conditions according to your use case, but it will be difficult to cater for all possible cases.
    function layout(table[], width, gutter, col[])
        var maxw[col.length]    # max. text width over all rows
        var maxl[col.length]    # max. width of longest word
        var flex[col.length]    # is column flexible?
        var wrap[col.length]    # can column be wrapped?
        var colw[col.length]    # final width of columns

        foreach row in table:
            for i = 0 to col.length:
                cell = row[i]
                maxw[i] = max(maxw[i], textwidth(cell))
                if cell.find(" "):
                    wrap[i] = true
                    maxl[i] = max(maxl[i], wordwidth(cell))

        var left = width - (col.length - 1) * gutter
        var avg = left / col.length
        var nflex = 0

        # determine whether columns should be flexible and assign
        # width of non-flexible cells
        for i = 0 to col.length:
            flex[i] = (maxw[i] > 2 * avg)       # ???
            if flex[i]:
                nflex++
            else:
                colw[i] = maxw[i]
                left -= colw[i]

        # if there is not enough space, make columns that could
        # be word-wrapped flexible, too
        if left < nflex * avg:
            for i = 0 to col.length:
                if !flex[i] and wrap[i]:
                    left += colw[i]             # give the column's width back
                    colw[i] = 0
                    flex[i] = true
                    nflex += 1

        # Calculate weights for flexible columns. The max width
        # is capped at the page width to treat columns that have to
        # be wrapped more or less equal
        var tot = 0
        for i = 0 to col.length:
            if flex[i]:
                maxw[i] = min(maxw[i], width)   # ???
                tot += maxw[i]

        # Now assign the actual width for flexible columns. Make
        # sure that it is at least as long as the longest word length
        for i = 0 to col.length:
            if flex[i]:
                colw[i] = left * maxw[i] / tot
                colw[i] = max(colw[i], maxl[i])
                left -= colw[i]
                tot -= maxw[i]                  # keep weights in step with the remaining space

        return colw
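For anyone who wants to try this out, here is a rough Java translation of the pseudocode above. It is a sketch under some assumptions of mine: text width is approximated by character count (swap in real font metrics), every row has at least `cols` non-null cells, and the names `FlexLayout`, `textWidth` and `longestWordWidth` are made up for illustration. The `2 * avg` threshold is the same heuristic marked `???` above.

```java
final class FlexLayout {
    // Assumption: one character is one unit wide; replace with real font metrics.
    static double textWidth(String s) {
        return s.length();
    }

    static double longestWordWidth(String s) {
        double best = 0;
        for (String word : s.split(" ")) {
            best = Math.max(best, textWidth(word));
        }
        return best;
    }

    static double[] layout(String[][] table, double width, double gutter, int cols) {
        double[] maxw = new double[cols];   // max. text width over all rows
        double[] maxl = new double[cols];   // max. width of longest word
        double[] colw = new double[cols];   // final column widths
        boolean[] flex = new boolean[cols]; // is column flexible?
        boolean[] wrap = new boolean[cols]; // can column be wrapped?

        for (String[] row : table) {
            for (int i = 0; i < cols; i++) {
                maxw[i] = Math.max(maxw[i], textWidth(row[i]));
                if (row[i].contains(" ")) {
                    wrap[i] = true;
                    maxl[i] = Math.max(maxl[i], longestWordWidth(row[i]));
                }
            }
        }

        double left = width - (cols - 1) * gutter;
        double avg = left / cols;
        int nflex = 0;

        // Wide columns become flexible; the others get their natural width.
        for (int i = 0; i < cols; i++) {
            flex[i] = maxw[i] > 2 * avg;
            if (flex[i]) {
                nflex++;
            } else {
                colw[i] = maxw[i];
                left -= colw[i];
            }
        }

        // Not enough room: make wrappable columns flexible as well.
        if (left < nflex * avg) {
            for (int i = 0; i < cols; i++) {
                if (!flex[i] && wrap[i]) {
                    left += colw[i];
                    colw[i] = 0;
                    flex[i] = true;
                    nflex++;
                }
            }
        }

        // Weights for flexible columns, capped at the page width.
        double tot = 0;
        for (int i = 0; i < cols; i++) {
            if (flex[i]) {
                maxw[i] = Math.min(maxw[i], width);
                tot += maxw[i];
            }
        }

        // Share out the remaining space, but never go below the longest word.
        for (int i = 0; i < cols; i++) {
            if (flex[i]) {
                double share = tot > 0 ? left * maxw[i] / tot : 0;
                colw[i] = Math.max(share, maxl[i]);
                left -= colw[i];
                tot -= maxw[i];
            }
        }
        return colw;
    }
}
```

Note that if the longest words alone do not fit, the widths can still add up to more than the available width; this sketch does not try to handle that case.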
The W3C publishes algorithms for stuff like this in its CSS 3 Tables Algorithms.
A simpler algorithm, which I have used successfully and which is quite trivial to implement, can be found in the HTML 4.01 specification:
The minimum and maximum cell widths are then used to determine the corresponding minimum and maximum widths for the columns. These in turn, are used to find the minimum and maximum width for the table. Note that cells can contain nested tables, but this doesn't complicate the code significantly. The next step is to assign column widths according to the available space (i.e., the space between the current left and right margins).
For cells that span multiple columns, a simple approach consists of apportioning the min/max widths evenly to each of the constituent columns. A slightly more complex approach is to use the min/max widths of unspanned cells to weight how spanned widths are apportioned. Experiments suggest that a blend of the two approaches gives good results for a wide range of tables.
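The "simple approach" for spanned cells could be sketched like this. This is only one reading of "apportioning evenly": each spanned column is raised to at least an equal share of the cell's minimum and maximum width. The names `SpanApportion` and `apportionEvenly` are hypothetical, and the weighted and blended variants mentioned in the quote are not shown.

```java
final class SpanApportion {
    /**
     * Raises the min/max widths of the columns covered by a spanning cell so
     * that each column carries at least an equal share of the cell's widths.
     * The arrays colMin and colMax are updated in place.
     */
    static void apportionEvenly(double[] colMin, double[] colMax,
                                int start, int span,
                                double cellMin, double cellMax) {
        for (int i = start; i < start + span; i++) {
            colMin[i] = Math.max(colMin[i], cellMin / span);
            // Keep max >= min for every column.
            colMax[i] = Math.max(colMax[i], Math.max(colMin[i], cellMax / span));
        }
    }
}
```

A cell with minimum width 60 and maximum width 100 that spans two columns would then push each of those columns to at least 30 (min) and 50 (max).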
The table borders and intercell margins need to be included in assigning column widths. There are three cases:
- The minimum table width is equal to or wider than the available space. In this case, assign the minimum widths and allow the user to scroll horizontally. For conversion to braille, it will be necessary to replace the cells by references to notes containing their full content. By convention these appear before the table.
- The maximum table width fits within the available space. In this case, set the columns to their maximum widths.
- The maximum width of the table is greater than the available space, but the minimum table width is smaller. In this case, find the difference between the available space and the minimum table width, let's call it W. Let's also call D the difference between maximum and minimum width of the table.
For each column, let d be the difference between maximum and minimum width of that column. Now set the column's width to the minimum width plus d times W over D. This makes columns with large differences between minimum and maximum widths wider than columns with smaller differences.
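The three cases above translate almost directly into code. Here is a minimal sketch, assuming you have already computed the per-column minimum and maximum widths and that `available` already excludes borders and intercell margins. The names `AutoLayout` and `assignWidths` are mine, not from the specification.

```java
final class AutoLayout {
    private static double sum(double[] values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Assigns column widths from per-column min/max widths and the available space. */
    static double[] assignWidths(double[] min, double[] max, double available) {
        double minTotal = sum(min);
        double maxTotal = sum(max);

        // Case 1: even the minimum widths do not fit; use them and let the table overflow.
        if (minTotal >= available) {
            return min.clone();
        }
        // Case 2: everything fits at maximum width.
        if (maxTotal <= available) {
            return max.clone();
        }
        // Case 3: width = min + d * W / D, so columns with a larger
        // max-min difference receive more of the spare space.
        double w = available - minTotal;  // W: spare space above the minimum
        double d = maxTotal - minTotal;   // D: total max-min difference (> 0 here)
        double[] widths = new double[min.length];
        for (int i = 0; i < min.length; i++) {
            widths[i] = min[i] + (max[i] - min[i]) * w / d;
        }
        return widths;
    }
}
```

With minimum widths 10, 20, 30, maximum widths 20, 60, 120 and 130 units available, W is 70 and D is 140, so the columns come out as 15, 40 and 75, which add up to exactly 130.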