I am looking for an algorithm which can split an image into smaller images, with some constraints. One constraint is to use the least amount of "whitespace" meaning empty
I'd go with the same algorithm as ravloony, but with a slight yet important modification: a "crop" operation that finds the minimal and maximal columns and rows that aren't completely empty and discards everything outside them.
In practice, the crop operation would take an X*Y region as input and output 4 integers: the coordinates of the smallest rectangle that contains all the used pixels of the region. This can also be used to detect and discard empty regions.
....................
.xxxxxxxxxxx........ xxxxxxxxxxx.......
...xxxx...xxxxxx.... ..xxxx...xxxxxx...
.............xxxxx.. ............xxxxx.
...............xxx.. => ..............xxx. (first crop)
...............xxx.. ..............xxx.
.................... ..................
..xxxxxx............ .xxxxxx...........
.....xxxxxxxxxxx.... ....xxxxxxxxxxx...
.........xxxxxxxxxx. ........xxxxxxxxxx
....................
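A minimal sketch of the crop operation described above, under some assumptions that are not in the original: the image is a list of equal-length strings, `x` marks a used pixel, and the function name `crop` and its return convention (inclusive coordinates, `None` for an empty region) are only illustrative.

```python
def crop(rows, used="x"):
    """Return (left, top, right, bottom), all inclusive, of the smallest
    rectangle containing every used pixel, or None if the region is empty."""
    used_rows = [y for y, row in enumerate(rows) if used in row]
    if not used_rows:
        return None  # nothing to keep: the whole region can be discarded
    left = min(row.index(used) for row in rows if used in row)
    right = max(row.rindex(used) for row in rows if used in row)
    return left, used_rows[0], right, used_rows[-1]


# The 20x11 example image from the diagram above.
image = [
    "....................",
    ".xxxxxxxxxxx........",
    "...xxxx...xxxxxx....",
    ".............xxxxx..",
    "...............xxx..",
    "...............xxx..",
    "....................",
    "..xxxxxx............",
    ".....xxxxxxxxxxx....",
    ".........xxxxxxxxxx.",
    "....................",
]

print(crop(image))  # (1, 1, 18, 9), i.e. an 18x9 rectangle
```

On real pixel data the same idea applies with a per-pixel "is used" test (for example, alpha greater than zero) in place of the string search.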
Now divide the image into NxN parts (using N=4 here) and use the crop operation on each of the parts:
xxxxx|xxxxx|x....|
..xxx|x...x|xxxxx|
---------------------
     |     |  xxx|xx
     |     |  ..x|xx
---------------------
     |     |    x|xx
     |     |     |
---------------------
 xxxx|xx...|     |
 ...x|xxxxx|xxxxx|
     |...xx|xxxxx|xxx
For this example, we get 10+10+10+6+4+1+2+8+15+10+3=79 pixels instead of 21*11=231, which is only 34.2%. Note that this happens to be the same amount as with your handcrafted 4-part segmentation (30+15+14+20=79)!
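The split-then-crop step can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of strings with `x` as a used pixel, the helper names (`crop_region`, `split_and_crop`, `area`) are made up here, and the cut positions are read off the diagram above (column widths 5, 5, 5, 3 and row heights 2, 2, 2, 3) rather than computed by any particular rule.

```python
def crop_region(rows, x0, y0, x1, y1, used="x"):
    """Bounding box (inclusive) of the used pixels inside the half-open
    region [x0, x1) x [y0, y1), or None if that region is empty."""
    hits = [(x, y) for y in range(y0, y1) for x in range(x0, x1)
            if rows[y][x] == used]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return min(xs), min(ys), max(xs), max(ys)


def split_and_crop(rows, col_cuts, row_cuts):
    """Crop every grid cell separately and keep only the non-empty boxes."""
    boxes = []
    for y0, y1 in zip(row_cuts, row_cuts[1:]):
        for x0, x1 in zip(col_cuts, col_cuts[1:]):
            box = crop_region(rows, x0, y0, x1, y1)
            if box is not None:
                boxes.append(box)
    return boxes


def area(box):
    left, top, right, bottom = box
    return (right - left + 1) * (bottom - top + 1)


# The image after the first crop (18x9).
cropped = [
    "xxxxxxxxxxx.......",
    "..xxxx...xxxxxx...",
    "............xxxxx.",
    "..............xxx.",
    "..............xxx.",
    "..................",
    ".xxxxxx...........",
    "....xxxxxxxxxxx...",
    "........xxxxxxxxxx",
]

boxes = split_and_crop(cropped, [0, 5, 10, 15, 18], [0, 2, 4, 6, 9])
print(len(boxes), sum(area(b) for b in boxes))  # 11 79
```

Passing the cut positions explicitly keeps the sketch independent of how the grid is chosen; evenly spaced cuts would work the same way.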
Of course, some additional data is needed to keep track of the position and size of the 16 parts for each image, and the algorithm won't always give the best results, but I think it's a nice compromise between speed and savings, and it is easy to write and maintain.
About the additional data: with images of size 1024x1024 split into 4x4 parts, each part is 256x256, so every rectangle coordinate (0 to 255, relative to its part) fits in one byte and each rectangle can be stored as 4 byte values. The additional data would then be only 16*4 = 64 bytes. Given that, you should perhaps consider increasing your 16-part maximum, unless doing so would heavily slow down some other part, such as the drawing.
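To make the 64-byte figure concrete, here is one possible way to pack the 16 rectangles, sketched with Python's `struct` module. The layout is an assumption, not something the answer prescribes: each rectangle is stored as inclusive (left, top, right, bottom) coordinates relative to its own 256x256 part, and an empty part is marked by the made-up sentinel (1, 1, 0, 0), where left > right.

```python
import struct

EMPTY = (1, 1, 0, 0)  # assumed sentinel: left > right means "empty part"


def pack_rects(rects):
    """Pack per-part rectangles into 4 bytes each.

    Each entry is (left, top, right, bottom) with values 0..255,
    or None for an empty part.
    """
    out = bytearray()
    for rect in rects:
        out += struct.pack("4B", *(rect if rect is not None else EMPTY))
    return bytes(out)


def unpack_rects(data):
    """Inverse of pack_rects: turn the byte string back into a list."""
    rects = []
    for left, top, right, bottom in struct.iter_unpack("4B", data):
        rects.append(None if left > right else (left, top, right, bottom))
    return rects


# 16 parts: two used rectangles, the rest empty.
rects = [(0, 0, 255, 255), None, (10, 20, 30, 40)] + [None] * 13
data = pack_rects(rects)
print(len(data))  # 64
```

With a finer grid the per-image overhead grows linearly (for example, 8x8 parts would need 256 bytes in this layout), which is still small next to the pixel data.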
Worst cases for this algorithm would be parts with some pixels at or near the edges set, like these:
x......x xxxxxxxx xx......
........ ........ x.......
........ ........ ........
x......x ...x.... .......x
Several solutions for these come to mind: