So I am currently building a model which does a certain type of action recognition, which I am implementing as a two-stage, end-to-end system. The first stage is a pose esti