Does the selfAttentionLayer also perform softmax and scaling?
The documentation at https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html states:
A self-attention layer computes single-head or multihead self-attention of its input.
The layer:
- Computes the queries, keys, and values from the input
- Computes the scaled dot-product attention across heads using the queries, keys, and values
- Merges the results from the heads
- Performs a linear transformation on the merged result
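For reference, the four steps above match the standard multi-head self-attention formulation from "Attention Is All You Need" (Vaswani et al., 2017). The notation below is an assumption and is not taken from the MATLAB page: X holds one sequence element per row, Q_i, K_i, V_i are the column blocks belonging to head i, d_k is the per-head key dimension, h is the number of heads, and bias terms are omitted. The MATLAB layer may use a transposed (channels-by-time) layout.

```latex
\begin{aligned}
Q &= X W^{Q}, \quad K = X W^{K}, \quad V = X W^{V} && \text{(step 1: queries, keys, values)} \\
\mathrm{head}_i &= \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i, \quad i = 1,\dots,h && \text{(step 2: scaled dot-product attention per head)} \\
H &= \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h) && \text{(step 3: merge heads)} \\
Y &= H W^{O} && \text{(step 4: output linear transformation)}
\end{aligned}
```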
Does the layer also apply softmax to the scaled scores (i.e., after dividing Q*K by sqrt(dim))? My understanding is that both the scaling and the softmax should happen within step 2.
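In the textbook formulation, "scaled dot-product attention" already includes both operations: the scores Q K^T are multiplied by 1/sqrt(d_k), where d_k is the key dimension ("dim" above), and then a softmax is taken over the keys before weighting V. The plain-Python sketch below shows one head with each step separated. It illustrates the standard formulation only, not MATLAB's internal implementation; the helper names are hypothetical and rows are assumed to index sequence positions.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single head.

    q: n_q x d_k, k: n_k x d_k, v: n_k x d_v (hypothetical row-major layout).
    Returns the attended output and the attention weights.
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    # Raw similarity scores between every query and every key: Q K^T
    scores = matmul(q, transpose(k))
    # Scaling: divide the scores by sqrt(d_k)
    scaled = [[s * scale for s in row] for row in scores]
    # Softmax over the key dimension, so each query's weights sum to 1
    weights = [softmax(row) for row in scaled]
    # Weighted sum of the value vectors
    return matmul(weights, v), weights


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    out, w = scaled_dot_product_attention(q, k, v)
    print("weights:", w)
    print("output:", out)
```

Without the 1/sqrt(d_k) factor, large key dimensions push the scores apart and the softmax saturates toward a one-hot distribution; the scaling keeps the softmax in a range where its gradients stay useful.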
Please clarify this for me and for other users.
Thanks.
Accepted Answer
More Answers (1)
xingxingcui
January 11, 2024
Edited: xingxingcui
April 27, 2024
Hi @Chih,
-------------------------Off-topic interlude, 2024-------------------------------
I am currently looking for a job in CV (computer vision) algorithm development, based in Shenzhen, Guangdong, China, or for a remote support position. I would be very grateful if anyone is willing to offer me a job or make a recommendation. My preliminary resume can be found at: https://cuixing158.github.io/about/ . Thank you!
Email: cuixingxing150@gmail.com