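In a Transformer, the scaled dot-product attention score (before softmax) for Q = (1, 1), K = (3, 4) is Q \cdot K / \sqrt{d_k} = (1 \cdot 3 + 1 \cdot 4) / \sqrt{2} = 7 / \sqrt{2} \approx 4.95, where d_k = 2 is the dimension of the key vector.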
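
The arithmetic can be checked with a minimal Python sketch. The function name `scaled_dot_score` is a hypothetical helper introduced here for illustration; it assumes a single query vector and a single key vector of equal length, and computes only the raw score before any softmax is applied.

```python
import math

def scaled_dot_score(q, k):
    """Scaled dot-product score q.k / sqrt(d_k) for two same-length vectors.

    Hypothetical helper for illustration; d_k is taken as len(k).
    """
    if len(q) != len(k):
        raise ValueError("q and k must have the same dimension")
    d_k = len(k)
    dot = sum(qi * ki for qi, ki in zip(q, k))  # 1*3 + 1*4 = 7 for the example
    return dot / math.sqrt(d_k)

score = scaled_dot_score((1, 1), (3, 4))
print(round(score, 4))  # 4.9497
```

In a full attention layer this score would be one entry of a matrix of query-key scores, and softmax would be applied across the keys for each query.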