Feeds
질문
How do I apply a padding mask(B*T) variable into a training of a self-attention transformer decoder, from a 3D Matrix(B*T*C)
While attempting to train my neural network that uses a self attention layer in its transformer block. I have been struggling to...
10개월 전 | 답변 수: 1 | 0
