How to remove whitespace from an image in Python?

Question

I have a set of images generated by my code, and I want to remove all of the excess whitespace so that each image is cropped down to just the text it contains. This is the relevant code:

from PIL import Image, ImageFont, ImageDraw, ImageChops
from docx import Document
import textwrap
import re

doc = Document('Patents.docx')
docText = ''.join(paragraph.text for paragraph in doc.paragraphs)


def trim(im, color):
    bg = Image.new(im.mode, im.size, color)
    diff = ImageChops.difference(im, bg)
    diff = ImageChops.add(diff, diff)
    bbox = diff.getbbox()
    if bbox:
        return im.crop(bbox)


for match in find_matches(text=docText, keywords=("responsive", "detecting", "providing")):
    W, H = 300, 300
    body = Image.new('RGB', (W, H), (255, 255, 255))
    border = Image.new('RGB', (W + 4, H + 4), (0, 0, 0))
    border.save('border.png')
    body.save('body.png')
    patent = Image.open('border.png')
    patent.paste(body, (2, 2))
    draw = ImageDraw.Draw(patent)
    font = ImageFont.load_default()

    current_h, pad = 100, 20

    for key in textwrap.wrap(match, width=45):
        line = key.encode('utf-8')
        # (width, height) = font.getsize(line)
        # patent.resize((width, height), resample=0, box=None)
        w, h = draw.textsize(line, font=font)
        draw.text(((W - w) / 2, current_h), line, (0, 0, 0), font=font)
        current_h += h + pad
    for count, matches in enumerate(match):
        patent.save(f'{match}.png')
        patentCrop = trim(patent, 255)
        patentCrop.save(f'{match}_new.png')

Here are 2 of the 4 outputs from my code (each box is its own output):

I would like to keep the border, although I could also leave the border out, crop the image, and add the border afterwards. Either way, I need help removing the whitespace. As my code shows, I'm using a trim function, but it doesn't seem to work. Any solution, whether a fix to my function or an entirely different method, would be much appreciated. The following is what I'm trying to accomplish, again with each box being its own output:

Answer 1:

I think this is what you want - a kind of double-matted surround:

#!/usr/bin/env python3

from PIL import Image, ImageDraw, ImageOps

# Open input image; convert to RGB so the 3-tuple colors and ImageOps.invert() below apply
im = Image.open('zHZB9.png').convert('RGB')

# Get rid of existing black border by flood-filling with white from top-left corner
ImageDraw.floodfill(im, xy=(0, 0), value=(255, 255, 255), thresh=10)

# Get bounding box of text and trim to it
bbox = ImageOps.invert(im).getbbox()
trimmed = im.crop(bbox)

# Add new white border, then new black, then new white border
res = ImageOps.expand(trimmed, border=10, fill=(255, 255, 255))
res = ImageOps.expand(res, border=5, fill=(0, 0, 0))
res = ImageOps.expand(res, border=5, fill=(255, 255, 255))
res.save('result.png')
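
If you control the code that draws the images, the other route mentioned in the question (draw without a border, crop, then add the border) avoids the flood fill altogether. A border that touches all four edges makes any bounding box against a white background span the whole image, which is one reason a trim on the bordered image returns it unchanged. Below is a minimal sketch of that route. The function name trim_to_content and its margin and frame parameters are made up for illustration, and a pasted black block stands in for the rendered text. Note that the background is given as the tuple (255, 255, 255) rather than the bare integer 255, since a bare integer is not a reliable way to say "white" for an RGB image.

```python
from PIL import Image, ImageChops, ImageOps


def trim_to_content(im, background=(255, 255, 255), margin=10, frame=2):
    """Crop to the non-background content, then add a margin and a black frame.

    Hypothetical helper for illustration; not part of Pillow.
    """
    im = im.convert("RGB")
    # The difference against a solid background is zero wherever a pixel matches it
    bg = Image.new("RGB", im.size, background)
    bbox = ImageChops.difference(im, bg).getbbox()
    if bbox is None:
        # Nothing but background, so there is nothing to crop to
        return im
    trimmed = im.crop(bbox)
    # Pad with the background color, then wrap the result in a black frame
    padded = ImageOps.expand(trimmed, border=margin, fill=background)
    return ImageOps.expand(padded, border=frame, fill=(0, 0, 0))


# White canvas with no border yet; the 100x20 black block stands in for the text
canvas = Image.new("RGB", (300, 300), (255, 255, 255))
canvas.paste((0, 0, 0), (100, 120, 200, 140))

result = trim_to_content(canvas)
print(result.size)  # (124, 44): 100x20 content + 10 px margin + 2 px frame per side
```

In the question's loop this would mean drawing the text on the plain white body image and calling the helper once per match, instead of pasting the body onto the black border image first.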

Source: https://stackoverflow.com/questions/65664494/how-to-remove-whitespace-from-an-image-in-python

Tags

python

python-imaging-library

image-resizing