I'm looking at the timm implementation of the Vision Transformer, and I see that the author initializes the position embedding with zeros:
self.pos_emb