I am working on dynamic gesture recognition. I have two types of inputs: cropped hand images (Input 1) and a set of motion images (Input 2).
Input 1: Very small-siz