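import torch

from attention.SelfAttention import ScaledDotProductAttention

# Dummy batch: 50 sequences, 49 positions each, 512 features per position.
x = torch.randn(50, 49, 512)  # named x to avoid shadowing the built-in input()
# Attention module with 8 heads.
sa = ScaledDotProductAttention(d_model=512, d_k=512, d_v=512, h=8)
# Self-attention: the same tensor is passed as queries, keys, and values.
output = sa(x, x, x)
print(output.shape)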


"""
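Attention Is All You Need (NeurIPS 2017)

Paper: https://arxiv.org/abs/1706.03762

This paper, published by Google at NeurIPS 2017, has been highly influential across CV, NLP,
multimodal learning, and other fields, with more than 22,000 citations at the time of writing.
Self-Attention, as proposed in the Transformer, is one form of attention: it computes weights
between different positions of a feature sequence and uses them to update the features.
The input feature is first projected by FC layers into three features: Q, K, and V.
The dot product of Q and K gives the attention map (scaled and normalized with a softmax in
the paper), and the attention map is then multiplied with V to produce the weighted features.
A final FC layer projects the result into a new feature. (Many excellent explanations of the
Transformer and Self-Attention are available online, so they are not covered in detail here.)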
"""