How do I apply a padding mask (B×T) when training a self-attention transformer decoder whose input is a 3-D matrix (B×T×C)?

2 views (last 30 days)
While attempting to train a neural network that uses a self-attention layer in its transformer block, I have been struggling to implement a padding mask.
First, all-zero vectors (time steps whose channels are entirely zero) were marked in a logical (B×T×1) matrix, and I used the array datastore function to import multiple variables into trainnet(). However, during training this error appears:
Error using trainnet (line 54)
Error during read from datastore.
Caused by:
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
My training set is (B×T×C) with 88 channels, as is the target set, while the padding mask is (B×T×1). Would I have to expand the padding mask in some way to make it consistent with 88 channels, or is there another method of incorporating a padding mask?
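For reference, the mask is built roughly like this (variable names are illustrative):

```matlab
% X is the B-by-T-by-C training array (C = 88 channels).
% A time step counts as padding when all of its channels are zero.
% The result is a logical B-by-T-by-1 mask: true = real data, false = padding.
mask = any(X, 3);
```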

Accepted Answer

Aravind on 1 April 2025
Hi @Sai,
From your question, it seems you have created a transformer network and are attempting to apply a padding mask to the self-attention layer, but are encountering difficulties.
I assume you have used the "selfAttentionLayer" function to create the self-attention layer in the transformer. According to the documentation at https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html#mw_3dcae9f1-aa13-493a-a625-f9900b63288b, to provide a padding input, you must set the "HasPaddingMaskInput" property of the layer to "true." Doing this exposes an additional port named "mask" to which the padding mask should be supplied. While the padding mask can have multiple channels, the software only considers the first channel to indicate the padding values. The padding mask must match the batch (B) and time (T) dimensions of the input.
To address your issue, first set the "HasPaddingMaskInput" parameter of the self-attention layer to "true." Then, introduce an additional input in the transformer network architecture to connect directly to the "mask" port of the self-attention layer. This setup enables the neural network (transformer) to accept two inputs: the sequence input and the padding mask. You can use the "Deep Network Designer App" to modify the neural network architecture easily through a GUI. More information about this app is available at: https://www.mathworks.com/help/deeplearning/ref/deepnetworkdesigner-app.html.
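As a minimal sketch of this wiring, built programmatically rather than in the app (the layer names, head count, and key-channel size below are placeholder assumptions, not values from your post):

```matlab
numChannels = 88;      % matches the B-by-T-by-C data described above
numHeads = 4;          % assumption: choose to suit your model
numKeyChannels = 64;   % assumption

layers = [
    sequenceInputLayer(numChannels, Name="seq_in")
    selfAttentionLayer(numHeads, numKeyChannels, ...
        HasPaddingMaskInput=true, Name="attn")  % exposes the "mask" port
    fullyConnectedLayer(numChannels)];

net = dlnetwork(layers, Initialize=false);

% Add a second network input for the padding mask and connect it
% directly to the self-attention layer's "mask" port.
net = addLayers(net, sequenceInputLayer(1, Name="mask_in"));
net = connectLayers(net, "mask_in", "attn/mask");
net = initialize(net);
```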
Once configured this way, you can use the array datastore to pass both the sequence input and the mask to the transformer, thus preventing the error.
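For example, you can combine one arrayDatastore per variable; the input columns of the combined datastore must appear in the same order as net.InputNames, followed by the targets (variable names below are illustrative):

```matlab
% XTrain, TTrain: B-by-T-by-C numeric arrays; maskTrain: B-by-T-by-1 logical.
dsX    = arrayDatastore(XTrain,    IterationDimension=1);
dsMask = arrayDatastore(maskTrain, IterationDimension=1);
dsT    = arrayDatastore(TTrain,    IterationDimension=1);
dsTrain = combine(dsX, dsMask, dsT);   % columns: inputs..., then targets

options = trainingOptions("adam", ...
    InputDataFormats=["BTC","BTC"], ... % assumption: adjust to your data layout
    TargetDataFormats="BTC");
net = trainnet(dsTrain, net, "mse", options);
```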
I hope this resolves your issue. If you can provide more details about your specific use case, I can offer more targeted advice.

More Answers (0)


Release

R2024b

