I am trying to train my model on a GPU instead of a CPU on an AWS p2.xlarge instance from my Jupyter Notebook. I am using the tensorflow-gpu backend (only tensorflow-gpu is installed, not tensorflow).
The bottleneck in your case is transferring data to and from the GPU. The best way to speed up your computation (and maximize your GPU usage) is to load as much of your data into memory as it can hold at once. Since you have plenty of memory, you can load all of your data in at once by doing:
model.fit(X_np, y_np, epochs=100, validation_split=0.25, batch_size=X_np.shape[0])
(You should probably also increase the number of epochs when doing this.)
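As a side check, it can be worth confirming that TensorFlow actually sees the GPU before tuning batch sizes. A minimal sketch, assuming a TF 1.x-era tensorflow-gpu install like yours:

    # Sketch: list the devices TensorFlow can use (works on tensorflow-gpu 1.x).
    from tensorflow.python.client import device_lib

    # On a p2.xlarge with drivers and tensorflow-gpu set up correctly,
    # a '/device:GPU:0' entry should appear in this list.
    print(device_lib.list_local_devices())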
Note, however, that there are advantages to minibatching (e.g., better handling of local minima), so you should consider choosing a batch_size somewhere in between.
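For example (the value 256 below is just an illustrative middle ground, not something prescribed):

    # Hypothetical compromise: large enough to keep the GPU busy,
    # small enough to keep the regularizing noise of minibatch gradients.
    model.fit(X_np, y_np, epochs=100, validation_split=0.25, batch_size=256)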