Recently I have been studying the Transformer architecture, which contains a mechanism termed Multi-Head Self-Attention. After browsing many blog posts about how it works in detail,