Python: Find Amount of Handwriting in Video

走了就别回头了 2021-02-03 13:38

Do you know of an algorithm that can detect that there is handwriting in an image? I am not interested in knowing what the handwriting says, but only in whether handwriting is present.

4 Answers
  •  庸人自扰
    2021-02-03 14:15

    You can identify the space taken up by handwriting by masking out the pixels that belong to the template, and then doing the same for the difference between later frames and the template. You can use dilation, thresholding, and morphological closing for this.

    Let's start with your template and identify the parts we will mask:

    import cv2
    import numpy as np
    
    template = cv2.imread('template.jpg')
    

    Now, let's dilate the occupied pixels to broaden the zone that we will mask (hide) later:

    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5), np.uint8)
    dilation = cv2.dilate(255 - template, kernel, iterations=5)
    

    Then, we will threshold to turn this into a black and white mask:

    _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
    

    In later frames, we will apply this mask to the picture by turning all of the masked pixels white. For instance:

    import numpy as np
    import cv2

    vidcap = cv2.VideoCapture('0_0.mp4')
    success, image = vidcap.read()
    count = 0
    frames = []

    # Grab the first 500 frames of the video
    while count < 500:
        frames.append(image)
        success, image = vidcap.read()
        count += 1

    # Indices of the pixels covered by the (dilated) template content
    mask = np.where(thresh == 0)

    example = frames[300]
    example[mask] = [255, 255, 255]   # turn every masked pixel white
    cv2.imshow('', example)
    cv2.waitKey(0)
    

    Now, we will create a function that returns the difference between the template and a given picture. We will also use morphological closing to get rid of the leftover isolated pixels that would make the result look noisy.

    def difference_with_mask(image):
        # Same dilation + inverse threshold as we applied to the template
        grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        kernel = np.ones((5, 5), np.uint8)
        dilation = cv2.dilate(255 - grayscale, kernel, iterations=5)
        _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
        # Hide everything that already appears on the template
        thresh[mask] = 255
        # Closing removes the leftover isolated pixels
        closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
        return closing

    cv2.imshow('', difference_with_mask(frames[400]))
    cv2.waitKey(0)
    

    To address the fact that you don't want the hand detected as handwriting, I suggest that instead of running the function on every individual frame as-is, you run it on the 95th percentile, pixel by pixel, of the last 15 sampled frames, sampling one frame every 30 frames. Look at this:

    from collections import deque
    history = deque(maxlen=15)

    results = []
    for ix, frame in enumerate(frames):
        if ix % 30 == 0:
            history.append(frame)   # keep one frame out of every 30
        # 95th percentile, pixel by pixel, over the sampled history
        results.append(np.quantile(history, 0.95, axis=0))
        print(ix)
    

    Now, the example frame becomes this (the hand is removed because it isn't present in most of the 15 sampled frames):

    As you can see, a small part of the handwriting is missing. It will show up later, because of the time-dependent percentile transformation we're doing: in my example with frame 18,400, the text that is missing from the image above is present. Then, you can use the function I gave you, and this will be the result:

    And here we go! Note that this solution, which excludes the hand, takes longer to compute because there are a few extra calculations to do. Processing each image on its own, with no regard for the hand, computes almost instantly, to the extent that you could probably run it on your webcam feed in real time.
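
    As a rough sketch of that faster per-frame variant (my addition, not part of the timing test above; it assumes the mask and difference_with_mask defined earlier, and that the camera frames have the same resolution as the template):

    import cv2

    # Assumes `mask` and `difference_with_mask` from the snippets above are defined,
    # and that the camera resolution matches the template.
    cap = cv2.VideoCapture(0)   # webcam index 0; use a video file path instead if you prefer
    while True:
        success, frame = cap.read()
        if not success:
            break
        cv2.imshow('handwriting', difference_with_mask(frame))
        if cv2.waitKey(1) & 0xFF == ord('q'):   # press q to stop
            break
    cap.release()
    cv2.destroyAllWindows()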

    Final Example:

    Here's the frame 18,400:

    Final image:

    You can play with the kernel size and the number of dilation iterations in the function if you want the mask to wrap more tightly around the text.
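
    For example, here is a rough sketch of a tunable variant (the name difference_with_mask_tight and the smaller kernel below are only illustrative starting values, not something from the code above):

    import cv2
    import numpy as np

    # Hypothetical variant of difference_with_mask with tunable parameters.
    # `mask` is the template mask computed earlier; a smaller kernel and fewer
    # dilation iterations give a tighter mask around the strokes.
    def difference_with_mask_tight(image, kernel_size=3, iterations=2):
        grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        dilation = cv2.dilate(255 - grayscale, kernel, iterations=iterations)
        _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
        thresh[mask] = 255
        return cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)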

    Full code:

    import numpy as np
    import cv2
    from collections import deque

    # Read the first 500 frames of the video
    vidcap = cv2.VideoCapture('0_0.mp4')
    success, image = vidcap.read()
    count = 0
    frames = deque(maxlen=700)

    while count < 500:
        frames.append(image)
        success, image = vidcap.read()
        count += 1

    # Build the mask from the template: dilate its content, then threshold it
    template = cv2.imread('template.jpg')
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5), np.uint8)
    dilation = cv2.dilate(255 - template, kernel, iterations=5)

    cv2.imwrite('dilation.jpg', dilation)
    cv2.imshow('', dilation)
    cv2.waitKey(0)

    _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
    cv2.imwrite('thresh.jpg', thresh)
    cv2.imshow('', thresh)
    cv2.waitKey(0)

    # Indices of the pixels covered by the (dilated) template content
    mask = np.where(thresh == 0)

    example = frames[400]
    cv2.imwrite('original.jpg', example)
    cv2.imshow('', example)
    cv2.waitKey(0)

    # White out everything that is already on the template
    example[mask] = 255
    cv2.imwrite('example_masked.jpg', example)
    cv2.imshow('', example)
    cv2.waitKey(0)

    def difference_with_mask(image):
        grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        kernel = np.ones((5, 5), np.uint8)
        dilation = cv2.dilate(255 - grayscale, kernel, iterations=5)
        _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
        thresh[mask] = 255
        closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
        return closing

    cv2.imshow('', difference_with_mask(frames[400]))
    cv2.waitKey(0)

    masked_example = difference_with_mask(frames[400])
    cv2.imwrite('masked_example.jpg', masked_example)

    # 95th percentile over the last 15 sampled frames (one sample every 30 frames)
    history = deque(maxlen=15)

    results = []
    for ix, frame in enumerate(frames):
        if ix % 30 == 0:
            history.append(frame)
        results.append(np.quantile(history, 0.95, axis=0))
        print(ix)
        if ix > 500:
            break

    cv2.imshow('', frames[400])
    cv2.waitKey(0)

    cv2.imshow('', results[400].astype(np.uint8))
    cv2.imwrite('percentiled_frame.jpg', results[400].astype(np.uint8))
    cv2.waitKey(0)

    cv2.imshow('', difference_with_mask(results[400].astype(np.uint8)))
    cv2.imwrite('final.jpg', difference_with_mask(results[400].astype(np.uint8)))
    cv2.waitKey(0)
    
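    Finally, since the question asks for the amount of handwriting, here is a minimal sketch of one way to quantify it from the difference image (the helper name handwriting_amount and the area-fraction metric are my own additions, assuming the full code above has been run):

    def handwriting_amount(difference):
        # Output of difference_with_mask: handwriting pixels are black (0) on a white (255) background
        ink_pixels = np.count_nonzero(difference == 0)
        return ink_pixels / difference.size   # fraction of the frame covered by handwriting

    # For example, on the percentile-filtered frame:
    final = difference_with_mask(results[400].astype(np.uint8))
    print(handwriting_amount(final))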
