I am trying to train a lip-reading CNN on facial images. Each image is 64 by 64 pixels which are flattened to a 1D array of size 4096 in a Pandas data frame. I have encoded the