I am working on hand localization in RGB images. I have two types of datasets. The first has 1700 images, each containing a single hand. The second dataset has about 28