I am trying to wrap my head around how the Transformer architecture works. I think I have a decent top-level understanding of the encoder part, sort of how the Key, Query, and V