简体中文 | English
FightingCV code library, containing Attention, Backbone, MLP, Re-parameterization, and Convolution modules
Hello everyone, I'm Xiaoma 🚀🚀🚀
For beginners (like me): when reading papers recently, I noticed a problem. Sometimes the core idea of a paper is very simple, and the core code may be only a dozen lines. But when you open the source code released by the authors, the proposed module is embedded in a classification, detection, or segmentation framework, which makes the code rather redundant. For someone unfamiliar with that particular task framework, it is hard to locate the core code, which makes the paper and the idea behind the network harder to understand.
For intermediate researchers (like you): if you think of basic units such as Conv, FC, and RNN as small Lego bricks, and structures such as Transformer and ResNet as Lego castles that have already been built, then the modules provided by this project are Lego components, each carrying complete semantic meaning. They spare researchers from reinventing the wheel, so they only need to think about how to use these "Lego components" to build more colorful and splendid works.
For experts (maybe like you): my ability is limited, please go easy on the criticism!!!
For all: this project aims to build a code library that a deep learning beginner can understand and that can also serve the research and industrial communities.
Install directly via pip
```bash
pip install fightingcv-attention
```
Or clone this repository
```bash
git clone https://github.com/xmu-xiaoma666/External-Attention-pytorch.git
cd External-Attention-pytorch
```
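If you work from the cloned repository rather than the pip package, the `model` package at the repository root must be importable. A minimal sketch (the path below is a placeholder for your local clone; alternatively, just run your scripts from the repository root):

```python
import sys

# Placeholder path: point this at your local clone of External-Attention-pytorch
sys.path.append("/path/to/External-Attention-pytorch")

from model.attention.SEAttention import SEAttention  # same modules as the pip package, under `model.*`
```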
```python
import torch
from torch import nn
from torch.nn import functional as F

# usage with the pip package
from fightingcv_attention.attention.MobileViTv2Attention import *

if __name__ == '__main__':
    input=torch.randn(50,49,512)
    sa = MobileViTv2Attention(d_model=512)
    output=sa(input)
    print(output.shape)
```
- For the modules bundled in the pip package, see the fightingcv-attention documentation
```python
import torch
from torch import nn
from torch.nn import functional as F

# the only difference from the pip usage: replace `fightingcv_attention` with `model`
from model.attention.MobileViTv2Attention import *

if __name__ == '__main__':
    input=torch.randn(50,49,512)
    sa = MobileViTv2Attention(d_model=512)
    output=sa(input)
    print(output.shape)
```
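As a rough sketch of the "Lego" idea from the introduction, a module from this library can be dropped into an ordinary convolution block. The wrapper class below is illustrative only (not part of the library); the SEAttention import and shapes follow the SE usage example later in this README:

```python
import torch
from torch import nn

from model.attention.SEAttention import SEAttention  # or fightingcv_attention.attention.SEAttention with the pip package


class ConvSEBlock(nn.Module):
    """Toy block: a 3x3 convolution followed by SE channel attention (illustrative only)."""

    def __init__(self, channels: int = 512, reduction: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.se = SEAttention(channel=channels, reduction=reduction)

    def forward(self, x):
        return self.se(self.conv(x))


if __name__ == '__main__':
    x = torch.randn(50, 512, 7, 7)
    block = ConvSEBlock()
    print(block(x).shape)  # expected: torch.Size([50, 512, 7, 7])
```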
- Pytorch implementation of "Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks---arXiv 2021.05.05"
- Pytorch implementation of "Attention Is All You Need---NIPS2017"
- Pytorch implementation of "Squeeze-and-Excitation Networks---CVPR2018"
- Pytorch implementation of "Selective Kernel Networks---CVPR2019"
- Pytorch implementation of "CBAM: Convolutional Block Attention Module---ECCV2018"
- Pytorch implementation of "BAM: Bottleneck Attention Module---BMVC2018"
- Pytorch implementation of "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks---CVPR2020"
- Pytorch implementation of "Dual Attention Network for Scene Segmentation---CVPR2019"
- Pytorch implementation of "EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network---arXiv 2021.05.30"
- Pytorch implementation of "ResT: An Efficient Transformer for Visual Recognition---arXiv 2021.05.28"
- Pytorch implementation of "SA-NET: SHUFFLE ATTENTION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS---ICASSP 2021"
- Pytorch implementation of "MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning---arXiv 2019.11.17"
- Pytorch implementation of "Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks---arXiv 2019.05.23"
- Pytorch implementation of "A2-Nets: Double Attention Networks---NIPS2018"
- Pytorch implementation of "An Attention Free Transformer---ICLR2021 (Apple New Work)"
- Pytorch implementation of "VOLO: Vision Outlooker for Visual Recognition---arXiv 2021.06.24" (paper analysis)
- Pytorch implementation of "Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition---arXiv 2021.06.23" (paper analysis)
- Pytorch implementation of "CoAtNet: Marrying Convolution and Attention for All Data Sizes---arXiv 2021.06.09" (paper analysis)
- Pytorch implementation of "Scaling Local Self-Attention for Parameter Efficient Visual Backbones---CVPR2021 Oral" (paper analysis)
- Pytorch implementation of "Polarized Self-Attention: Towards High-quality Pixel-wise Regression---arXiv 2021.07.02" (paper analysis)
- Pytorch implementation of "Contextual Transformer Networks for Visual Recognition---arXiv 2021.07.26" (paper analysis)
- Pytorch implementation of "Residual Attention: A Simple but Effective Method for Multi-Label Recognition---ICCV2021"
- Pytorch implementation of "S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision---arXiv 2021.08.02" (paper analysis)
- Pytorch implementation of "Global Filter Networks for Image Classification---arXiv 2021.07.01"
- Pytorch implementation of "Rotate to Attend: Convolutional Triplet Attention Module---WACV 2021"
- Pytorch implementation of "Coordinate Attention for Efficient Mobile Network Design---CVPR 2021"
- Pytorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer---ArXiv 2021.10.05"
- Pytorch implementation of "Non-deep Networks---ArXiv 2021.10.20"
- Pytorch implementation of "UFO-ViT: High Performance Linear Vision Transformer without Softmax---ArXiv 2021.09.29"
- Pytorch implementation of "Separable Self-attention for Mobile Vision Transformers---ArXiv 2022.06.06"
- Pytorch implementation of "On the Integration of Self-Attention and Convolution---ArXiv 2022.03.14"
- Pytorch implementation of "CROSSFORMER: A VERSATILE VISION TRANSFORMER HINGING ON CROSS-SCALE ATTENTION---ICLR 2022"
- Pytorch implementation of "Aggregating Global Features into Local Vision Transformer"
- Pytorch implementation of "CCNet: Criss-Cross Attention for Semantic Segmentation"
- Pytorch implementation of "Axial Attention in Multidimensional Transformers"
"Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks"
```python
from model.attention.ExternalAttention import ExternalAttention
import torch
input=torch.randn(50,49,512)
ea = ExternalAttention(d_model=512,S=8)
output=ea(input)
print(output.shape)
```
```python
from model.attention.SelfAttention import ScaledDotProductAttention
import torch
input=torch.randn(50,49,512)
sa = ScaledDotProductAttention(d_model=512, d_k=512, d_v=512, h=8)
output=sa(input,input,input)
print(output.shape)
```
```python
from model.attention.SimplifiedSelfAttention import SimplifiedScaledDotProductAttention
import torch
input=torch.randn(50,49,512)
ssa = SimplifiedScaledDotProductAttention(d_model=512, h=8)
output=ssa(input,input,input)
print(output.shape)
```
"Squeeze-and-Excitation Networks"
```python
from model.attention.SEAttention import SEAttention
import torch
input=torch.randn(50,512,7,7)
se = SEAttention(channel=512,reduction=8)
output=se(input)
print(output.shape)
```
```python
from model.attention.SKAttention import SKAttention
import torch
input=torch.randn(50,512,7,7)
sk = SKAttention(channel=512,reduction=8)
output=sk(input)
print(output.shape)
```
"CBAM: Convolutional Block Attention Module"
```python
from model.attention.CBAM import CBAMBlock
import torch
input=torch.randn(50,512,7,7)
kernel_size=input.shape[2]
cbam = CBAMBlock(channel=512,reduction=16,kernel_size=kernel_size)
output=cbam(input)
print(output.shape)
```
"BAM: Bottleneck Attention Module"
```python
from model.attention.BAM import BAMBlock
import torch
input=torch.randn(50,512,7,7)
bam = BAMBlock(channel=512,reduction=16,dia_val=2)
output=bam(input)
print(output.shape)
```
"ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks"
```python
from model.attention.ECAAttention import ECAAttention
import torch
input=torch.randn(50,512,7,7)
eca = ECAAttention(kernel_size=3)
output=eca(input)
print(output.shape)
```
"Dual Attention Network for Scene Segmentation"
```python
from model.attention.DANet import DAModule
import torch
input=torch.randn(50,512,7,7)
danet=DAModule(d_model=512,kernel_size=3,H=7,W=7)
print(danet(input).shape)
```
"EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network"
```python
from model.attention.PSA import PSA
import torch
input=torch.randn(50,512,7,7)
psa = PSA(channel=512,reduction=8)
output=psa(input)
print(output.shape)
```
"ResT: An Efficient Transformer for Visual Recognition"
```python
from model.attention.EMSA import EMSA
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,64,512)
emsa = EMSA(d_model=512, d_k=512, d_v=512, h=8, H=8, W=8, ratio=2, apply_transform=True)
output=emsa(input,input,input)
print(output.shape)
```
"SA-NET: SHUFFLE ATTENTION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS"
```python
from model.attention.ShuffleAttention import ShuffleAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
se = ShuffleAttention(channel=512,G=8)
output=se(input)
print(output.shape)
```
"MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning"
```python
from model.attention.MUSEAttention import MUSEAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,49,512)
sa = MUSEAttention(d_model=512, d_k=512, d_v=512, h=8)
output=sa(input,input,input)
print(output.shape)
```
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks
```python
from model.attention.SGE import SpatialGroupEnhance
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
sge = SpatialGroupEnhance(groups=8)
output=sge(input)
print(output.shape)
```
A2-Nets: Double Attention Networks
```python
from model.attention.A2Atttention import DoubleAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
a2 = DoubleAttention(512,128,128,True)
output=a2(input)
print(output.shape)
```

```python
from model.attention.AFT import AFT_FULL
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,49,512)
aft_full = AFT_FULL(d_model=512, n=49)
output=aft_full(input)
print(output.shape)
```
"VOLO: Vision Outlooker for Visual Recognition"
```python
from model.attention.OutlookAttention import OutlookAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,28,28,512)
outlook = OutlookAttention(dim=512)
output=outlook(input)
print(output.shape)
```
"Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition"
```python
from model.attention.ViP import WeightedPermuteMLP
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(64,8,8,512)
seg_dim=8
vip=WeightedPermuteMLP(512,seg_dim)
out=vip(input)
print(out.shape)
```
"CoAtNet: Marrying Convolution and Attention for All Data Sizes"
```python
from model.attention.CoAtNet import CoAtNet
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,3,224,224)
coatnet=CoAtNet(in_ch=3,image_size=224)
out=coatnet(input)
print(out.shape)
```
"Scaling Local Self-Attention for Parameter Efficient Visual Backbones"
```python
from model.attention.HaloAttention import HaloAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,512,8,8)
halo = HaloAttention(dim=512, block_size=2, halo_size=1)
output=halo(input)
print(output.shape)
```
"Polarized Self-Attention: Towards High-quality Pixel-wise Regression"
```python
from model.attention.PolarizedSelfAttention import ParallelPolarizedSelfAttention, SequentialPolarizedSelfAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,512,7,7)
psa = SequentialPolarizedSelfAttention(channel=512)
output=psa(input)
print(output.shape)
```
"Contextual Transformer Networks for Visual Recognition---arXiv 2021.07.26"
```python
from model.attention.CoTAttention import CoTAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
cot = CoTAttention(dim=512,kernel_size=3)
output=cot(input)
print(output.shape)
```
Residual Attention: A Simple but Effective Method for Multi-Label Recognition---ICCV2021
```python
from model.attention.ResidualAttention import ResidualAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
resatt = ResidualAttention(channel=512,num_class=1000,la=0.2)
output=resatt(input)
print(output.shape)
```
"S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision---arXiv 2021.08.02"
```python
from model.attention.S2Attention import S2Attention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
s2att = S2Attention(channels=512)
output=s2att(input)
print(output.shape)
```
"Global Filter Networks for Image Classification---arXiv 2021.07.01"
25.3. Usage Code - Implemented by Wenliang Zhao (Author)
```python
from model.attention.gfnet import GFNet
import torch
from torch import nn
from torch.nn import functional as F
x = torch.randn(1, 3, 224, 224)
gfnet = GFNet(embed_dim=384, img_size=224, patch_size=16, num_classes=1000)
out = gfnet(x)
print(out.shape)
```
"Rotate to Attend: Convolutional Triplet Attention Module---WACV 2021"
26.3. Usage Code - Implemented by digantamisra98
```python
from model.attention.TripletAttention import TripletAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
triplet = TripletAttention()
output=triplet(input)
print(output.shape)
```
Coordinate Attention for Efficient Mobile Network Design---CVPR 2021
27.3. Usage Code - Implemented by Andrew-Qibin
```python
from model.attention.CoordAttention import CoordAtt
import torch
from torch import nn
from torch.nn import functional as F
inp=torch.rand([2, 96, 56, 56])
inp_dim, oup_dim = 96, 96
reduction=32
coord_attention = CoordAtt(inp_dim, oup_dim, reduction=reduction)
output=coord_attention(inp)
print(output.shape)
```
"MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer---ArXiv 2021.10.05"
```python
from model.attention.MobileViTAttention import MobileViTAttention
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    m=MobileViTAttention()
    input=torch.randn(1,3,49,49)
    output=m(input)
    print(output.shape)  # output: (1,3,49,49)
```
"Non-deep Networks---ArXiv 2021.10.20"
```python
from model.attention.ParNetAttention import *
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(50,512,7,7)
    pna = ParNetAttention(channel=512)
    output=pna(input)
    print(output.shape)  # 50,512,7,7
```
"UFO-ViT: High Performance Linear Vision Transformer without Softmax---ArXiv 2021.09.29"
```python
from model.attention.UFOAttention import *
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(50,49,512)
    ufo = UFOAttention(d_model=512, d_k=512, d_v=512, h=8)
    output=ufo(input,input,input)
    print(output.shape)  # [50, 49, 512]
```
On the Integration of Self-Attention and Convolution
```python
from model.attention.ACmix import ACmix
import torch
if __name__ == '__main__':
    input=torch.randn(50,256,7,7)
    acmix = ACmix(in_planes=256, out_planes=256)
    output=acmix(input)
    print(output.shape)
```
"Separable Self-attention for Mobile Vision Transformers---ArXiv 2022.06.06"
```python
from model.attention.MobileViTv2Attention import MobileViTv2Attention
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(50,49,512)
    sa = MobileViTv2Attention(d_model=512)
    output=sa(input)
    print(output.shape)
```
Vision Transformer with Deformable Attention---CVPR2022
```python
from model.attention.DAT import DAT
import torch
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = DAT(img_size=224, patch_size=4, num_classes=1000, expansion=4, dim_stem=96, dims=[96, 192, 384, 768], depths=[2, 2, 6, 2], stage_spec=[['L', 'S'], ['L', 'S'], ['L', 'D', 'L', 'D', 'L', 'D'], ['L', 'D']], heads=[3, 6, 12, 24], window_sizes=[7, 7, 7, 7], groups=[-1, -1, 3, 6], use_pes=[False, False, True, True], dwc_pes=[False, False, False, False], strides=[-1, -1, 1, 1], sr_ratios=[-1, -1, -1, -1], offset_range_factor=[-1, -1, 2, 2], no_offs=[False, False, False, False], fixed_pes=[False, False, False, False], use_dwc_mlps=[False, False, False, False], use_conv_patches=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2)
    output=model(input)
    print(output[0].shape)
```
CROSSFORMER: A VERSATILE VISION TRANSFORMER HINGING ON CROSS-SCALE ATTENTION---ICLR 2022
```python
from model.attention.Crossformer import CrossFormer
import torch
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = CrossFormer(img_size=224, patch_size=[4, 8, 16, 32], in_chans=3, num_classes=1000, embed_dim=48, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], group_size=[7, 7, 7, 7], mlp_ratio=4., qkv_bias=True, qk_scale=None, drop_rate=0.0, drop_path_rate=0.1, ape=False, patch_norm=True, use_checkpoint=False, merge_size=[[2, 4], [2, 4], [2, 4]])
    output=model(input)
    print(output.shape)
```
Aggregating Global Features into Local Vision Transformer
```python
from model.attention.MOATransformer import MOATransformer
import torch
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = MOATransformer(img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dim=96, depths=[2, 2, 6], num_heads=[3, 6, 12], window_size=14, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop_rate=0.0, drop_path_rate=0.1, ape=False, patch_norm=True, use_checkpoint=False)
    output=model(input)
    print(output.shape)
```
CCNet: Criss-Cross Attention for Semantic Segmentation
```python
from model.attention.CrissCrossAttention import CrissCrossAttention
import torch
if __name__ == '__main__':
    input=torch.randn(3, 64, 7, 7)
    model = CrissCrossAttention(64)
    outputs = model(input)
    print(outputs.shape)
```
Axial Attention in Multidimensional Transformers
```python
from model.attention.Axial_attention import AxialImageTransformer
import torch
if __name__ == '__main__':
    input=torch.randn(3, 128, 7, 7)
    model = AxialImageTransformer(dim=128, depth=12, reversible=True)
    outputs = model(input)
    print(outputs.shape)
```
- Pytorch implementation of "Deep Residual Learning for Image Recognition---CVPR2016 Best Paper"
- Pytorch implementation of "Aggregated Residual Transformations for Deep Neural Networks---CVPR2017"
- Pytorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer---ArXiv 2021.10.05"
- Pytorch implementation of "Patches Are All You Need?---ICLR2022 (Under Review)"
- Pytorch implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer---ArXiv 2021.06.07"
- Pytorch implementation of "ConTNet: Why not use convolution and transformer at the same time?---ArXiv 2021.04.27"
- Pytorch implementation of "Vision Transformers with Hierarchical Attention---ArXiv 2022.06.15"
- Pytorch implementation of "Co-Scale Conv-Attentional Image Transformers---ArXiv 2021.08.26"
- Pytorch implementation of "Conditional Positional Encodings for Vision Transformers"
- Pytorch implementation of "Rethinking Spatial Dimensions of Vision Transformers---ICCV 2021"
- Pytorch implementation of "CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification---ICCV 2021"
- Pytorch implementation of "Transformer in Transformer---NeurIPS 2021"
- Pytorch implementation of "DeepViT: Towards Deeper Vision Transformer"
- Pytorch implementation of "Incorporating Convolution Designs into Visual Transformers"
- Pytorch implementation of "ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases"
- Pytorch implementation of "Augmenting Convolutional networks with attention-based aggregation"
- Pytorch implementation of "Going deeper with Image Transformers---ICCV 2021 (Oral)"
- Pytorch implementation of "Training data-efficient image transformers & distillation through attention---ICML 2021"
- Pytorch implementation of "LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference"
- Pytorch implementation of "VOLO: Vision Outlooker for Visual Recognition"
- Pytorch implementation of "Container: Context Aggregation Network---NeurIPS 2021"
- Pytorch implementation of "CMT: Convolutional Neural Networks Meet Vision Transformers---CVPR 2022"
- Pytorch implementation of "Vision Transformer with Deformable Attention---CVPR 2022"
- Pytorch implementation of "EfficientFormer: Vision Transformers at MobileNet Speed"
- Pytorch implementation of "ConvNeXtV2: Co-designing and Scaling ConvNets with Masked Autoencoders"
"Deep Residual Learning for Image Recognition---CVPR2016 Best Paper"
```python
from model.backbone.resnet import ResNet50, ResNet101, ResNet152
import torch
if __name__ == '__main__':
    input=torch.randn(50,3,224,224)
    resnet50=ResNet50(1000)
    # resnet101=ResNet101(1000)
    # resnet152=ResNet152(1000)
    out=resnet50(input)
    print(out.shape)
```
"Aggregated Residual Transformations for Deep Neural Networks---CVPR2017"
```python
from model.backbone.resnext import ResNeXt50, ResNeXt101, ResNeXt152
import torch
if __name__ == '__main__':
    input=torch.randn(50,3,224,224)
    resnext50=ResNeXt50(1000)
    # resnext101=ResNeXt101(1000)
    # resnext152=ResNeXt152(1000)
    out=resnext50(input)
    print(out.shape)
```
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer---ArXiv 2021.10.05
```python
from model.backbone.MobileViT import *
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    ### mobilevit_xxs
    mvit_xxs=mobilevit_xxs()
    out=mvit_xxs(input)
    print(out.shape)
    ### mobilevit_xs
    mvit_xs=mobilevit_xs()
    out=mvit_xs(input)
    print(out.shape)
    ### mobilevit_s
    mvit_s=mobilevit_s()
    out=mvit_s(input)
    print(out.shape)
```
Patches Are All You Need?---ICLR2022 (Under Review)
```python
from model.backbone.ConvMixer import *
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    x=torch.randn(1,3,224,224)
    convmixer=ConvMixer(dim=512,depth=12)
    out=convmixer(x)
    print(out.shape)  # [1, 1000]
```
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
```python
from model.backbone.ShuffleTransformer import ShuffleTransformer
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    sft = ShuffleTransformer()
    output=sft(input)
    print(output.shape)
```
ConTNet: Why not use convolution and transformer at the same time?
```python
# build_model is assumed to be exported by model.backbone.ConTNet alongside ConTNet
from model.backbone.ConTNet import ConTNet, build_model
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == "__main__":
    model = build_model(use_avgdown=True, relative=True, qkv_bias=True, pre_norm=True)
    input = torch.randn(1, 3, 224, 224)
    out = model(input)
    print(out.shape)
```
Vision Transformers with Hierarchical Attention
```python
from model.backbone.HATNet import HATNet
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    hat = HATNet(dims=[48, 96, 240, 384], head_dim=48, expansions=[8, 8, 4, 4], grid_sizes=[8, 7, 7, 1], ds_ratios=[8, 4, 2, 1], depths=[2, 2, 6, 3])
    output=hat(input)
    print(output.shape)
```
Co-Scale Conv-Attentional Image Transformers
```python
from model.backbone.CoaT import CoaT
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = CoaT(patch_size=4, embed_dims=[152, 152, 152, 152], serial_depths=[2, 2, 2, 2], parallel_depth=6, num_heads=8, mlp_ratios=[4, 4, 4, 4])
    output=model(input)
    print(output.shape)  # torch.Size([1, 1000])
```
PVT v2: Improved Baselines with Pyramid Vision Transformer
```python
from functools import partial
from model.backbone.PVT import PyramidVisionTransformer
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = PyramidVisionTransformer(patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1])
    output=model(input)
    print(output.shape)
```
Conditional Positional Encodings for Vision Transformers
```python
from functools import partial
from model.backbone.CPVT import CPVTV2
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = CPVTV2(patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1])
    output=model(input)
    print(output.shape)
```
Rethinking Spatial Dimensions of Vision Transformers
```python
from model.backbone.PIT import PoolingTransformer
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = PoolingTransformer(image_size=224, patch_size=14, stride=7, base_dims=[64, 64, 64], depth=[3, 6, 4], heads=[4, 8, 16], mlp_ratio=4)
    output=model(input)
    print(output.shape)
```
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
```python
from functools import partial
from model.backbone.CrossViT import VisionTransformer
import torch
from torch import nn
if __name__ == "__main__":
    input=torch.randn(1,3,224,224)
    model = VisionTransformer(img_size=[240, 224], patch_size=[12, 16], embed_dim=[192, 384], depth=[[1, 4, 0], [1, 4, 0], [1, 4, 0]], num_heads=[6, 6], mlp_ratio=[4, 4, 1], qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6))
    output=model(input)
    print(output.shape)
```

```python
from model.backbone.TnT import TNT
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = TNT(img_size=224, patch_size=16, outer_dim=384, inner_dim=24, depth=12, outer_num_heads=6, inner_num_heads=4, qkv_bias=False, inner_stride=4)
    output=model(input)
    print(output.shape)
```
DeepViT: Towards Deeper Vision Transformer
```python
from functools import partial
from model.backbone.DViT import DeepVisionTransformer
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = DeepVisionTransformer(patch_size=16, embed_dim=384, depth=[False] * 16, apply_transform=[False] * 0 + [True] * 32, num_heads=12, mlp_ratio=3, qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6))
    output=model(input)
    print(output.shape)
```
Incorporating Convolution Designs into Visual Transformers
```python
from functools import partial
# Image2Tokens is assumed to be defined alongside CeIT in model.backbone.CeiT
from model.backbone.CeiT import CeIT, Image2Tokens
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = CeIT(hybrid_backbone=Image2Tokens(), patch_size=4, embed_dim=192, depth=12, num_heads=3, mlp_ratio=4, qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6))
    output=model(input)
    print(output.shape)
```
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
```python
from functools import partial
from model.backbone.ConViT import VisionTransformer
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = VisionTransformer(num_heads=16, norm_layer=partial(nn.LayerNorm, eps=1e-6))
    output=model(input)
    print(output.shape)
```
Going deeper with Image Transformers
```python
from functools import partial
from model.backbone.CaiT import CaiT
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = CaiT(img_size=224, patch_size=16, embed_dim=192, depth=24, num_heads=4, mlp_ratio=4, qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), init_scale=1e-5, depth_token_only=2)
    output=model(input)
    print(output.shape)
```
Augmenting Convolutional networks with attention-based aggregation
```python
from functools import partial
# ConvStem and Conv_blocks_se are assumed to be defined alongside PatchConvnet in the same module
from model.backbone.PatchConvnet import PatchConvnet, ConvStem, Conv_blocks_se
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = PatchConvnet(patch_size=16, embed_dim=384, depth=60, num_heads=1, qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), Patch_layer=ConvStem, Attention_block=Conv_blocks_se, depth_token_only=1, mlp_ratio_clstk=3.0)
    output=model(input)
    print(output.shape)
```
Training data-efficient image transformers & distillation through attention
```python
from functools import partial
from model.backbone.DeiT import DistilledVisionTransformer
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = DistilledVisionTransformer(patch_size=16, embed_dim=384, depth=12, num_heads=6, mlp_ratio=4, qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6))
    output=model(input)
    print(output[0].shape)
```
LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference
```python
from model.backbone.LeViT import *
import torch
from torch import nn
if __name__ == '__main__':
    for name in specification:
        input=torch.randn(1,3,224,224)
        model = globals()[name](fuse=True, pretrained=False)
        model.eval()
        output = model(input)
        print(output.shape)
```
VOLO: Vision Outlooker for Visual Recognition
```python
from model.backbone.VOLO import VOLO
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = VOLO([4, 4, 8, 2], embed_dims=[192, 384, 384, 384], num_heads=[6, 12, 12, 12], mlp_ratios=[3, 3, 3, 3], downsamples=[True, False, False, False], outlook_attention=[True, False, False, False], post_layers=['ca', 'ca'])
    output=model(input)
    print(output[0].shape)
```
Container: Context Aggregation Network
```python
from functools import partial
from model.backbone.Container import VisionTransformer
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = VisionTransformer(img_size=[224, 56, 28, 14], patch_size=[4, 2, 2, 2], embed_dim=[64, 128, 320, 512], depth=[3, 4, 8, 3], num_heads=16, mlp_ratio=[8, 8, 4, 4], qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6))
    output=model(input)
    print(output.shape)
```
CMT: Convolutional Neural Networks Meet Vision Transformers
```python
from model.backbone.CMT import CMT_Tiny
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = CMT_Tiny()
    output=model(input)
    print(output[0].shape)
```
EfficientFormer: Vision Transformers at MobileNet Speed
```python
# EfficientFormer_depth and EfficientFormer_width are assumed to be exported by the same module
from model.backbone.EfficientFormer import EfficientFormer, EfficientFormer_depth, EfficientFormer_width
import torch
from torch import nn
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = EfficientFormer(layers=EfficientFormer_depth['l1'], embed_dims=EfficientFormer_width['l1'], downsamples=[True, True, True, True], vit_num=1)
    output=model(input)
    print(output[0].shape)
```
ConvNeXtV2: Co-designing and Scaling ConvNets with Masked Autoencoders
```python
from model.backbone.convnextv2 import convnextv2_atto
import torch
from torch import nn
if __name__ == "__main__":
    model = convnextv2_atto()
    input = torch.randn(1, 3, 224, 224)
    out = model(input)
    print(out.shape)
```
- Pytorch implementation of "RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition---arXiv 2021.05.05"
- Pytorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision---arXiv 2021.05.17"
- Pytorch implementation of "ResMLP: Feedforward networks for image classification with data-efficient training---arXiv 2021.05.07"
- Pytorch implementation of "Pay Attention to MLPs---arXiv 2021.05.17"
- Pytorch implementation of "Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?---arXiv 2021.09.12"
"RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition"
```python
from model.mlp.repmlp import RepMLP
import torch
from torch import nn

N = 4        # batch size
C = 512      # input dim
O = 1024     # output dim
H = 14       # image height
W = 14       # image width
h = 7        # patch height
w = 7        # patch width
fc1_fc2_reduction = 1   # reduction ratio
fc3_groups = 8          # groups
repconv_kernels = [1, 3, 5, 7]  # kernel list

repmlp = RepMLP(C, O, H, W, h, w, fc1_fc2_reduction, fc3_groups, repconv_kernels=repconv_kernels)
x = torch.randn(N, C, H, W)
repmlp.eval()
for module in repmlp.modules():
    if isinstance(module, nn.BatchNorm2d) or isinstance(module, nn.BatchNorm1d):
        nn.init.uniform_(module.running_mean, 0, 0.1)
        nn.init.uniform_(module.running_var, 0, 0.1)
        nn.init.uniform_(module.weight, 0, 0.1)
        nn.init.uniform_(module.bias, 0, 0.1)

# training-time result
out = repmlp(x)
# inference-time (re-parameterized) result
repmlp.switch_to_deploy()
deployout = repmlp(x)
print(((deployout - out) ** 2).sum())
```
"MLP-Mixer: An all-MLP Architecture for Vision"
```python
from model.mlp.mlp_mixer import MlpMixer
import torch
mlp_mixer=MlpMixer(num_classes=1000,num_blocks=10,patch_size=10,tokens_hidden_dim=32,channels_hidden_dim=1024,tokens_mlp_dim=16,channels_mlp_dim=1024)
input=torch.randn(50,3,40,40)
output=mlp_mixer(input)
print(output.shape)
```
"ResMLP: Feedforward networks for image classification with data-efficient training"
```python
from model.mlp.resmlp import ResMLP
import torch
input=torch.randn(50,3,14,14)
resmlp=ResMLP(dim=128,image_size=14,patch_size=7,class_num=1000)
out=resmlp(input)
print(out.shape)  # the last dimension is class_num
```
```python
from model.mlp.g_mlp import gMLP
import torch
num_tokens=10000
bs=50
len_sen=49
num_layers=6
input=torch.randint(num_tokens,(bs,len_sen))  # bs, len_sen
gmlp = gMLP(num_tokens=num_tokens,len_sen=len_sen,dim=512,d_ff=1024)
output=gmlp(input)
print(output.shape)
```
"Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?"
```python
from model.mlp.sMLP_block import sMLPBlock
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(50,3,224,224)
    smlp=sMLPBlock(h=224,w=224)
    out=smlp(input)
    print(out.shape)
```
"Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition"
```python
# NOTE: the module file name contains a hyphen in the repository; a hyphenated name cannot be
# used directly in an import statement, so adjust the import below to match the actual file name.
from model.mlp.vip-mlp import VisionPermutator
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = VisionPermutator(layers=[4, 3, 8, 3], embed_dims=[384, 384, 384, 384], patch_size=14, transitions=[False, False, False, False], segment_dim=[16, 16, 16, 16], mlp_ratios=[3, 3, 3, 3], mlp_fn=WeightedPermuteMLP)
    output=model(input)
    print(output.shape)
```
- Pytorch implementation of "RepVGG: Making VGG-style ConvNets Great Again---CVPR2021"
- Pytorch implementation of "ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks---ICCV2019"
- Pytorch implementation of "Diverse Branch Block: Building a Convolution as an Inception-like Unit---CVPR2021"
"RepVGG: Making VGG-style ConvNets Great Again"
```python
from model.rep.repvgg import RepBlock
import torch
input=torch.randn(50,512,49,49)
repblock=RepBlock(512,512)
repblock.eval()
out=repblock(input)
repblock._switch_to_deploy()
out2=repblock(input)
print('difference between vgg and repvgg')
print(((out2-out)**2).sum())
```
"ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks"
```python
from model.rep.acnet import ACNet
import torch
from torch import nn
input=torch.randn(50,512,49,49)
acnet=ACNet(512,512)
acnet.eval()
out=acnet(input)
acnet._switch_to_deploy()
out2=acnet(input)
print('difference:')
print(((out2-out)**2).sum())
```
"Diverse Branch Block: Building a Convolution as an Inception-like Unit"
```python
from model.rep.ddb import transI_conv_bn
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,64,7,7)
# conv + bn
conv1=nn.Conv2d(64,64,3,padding=1)
bn1=nn.BatchNorm2d(64)
bn1.eval()
out1=bn1(conv1(input))
# fused conv
conv_fuse=nn.Conv2d(64,64,3,padding=1)
conv_fuse.weight.data,conv_fuse.bias.data=transI_conv_bn(conv1,bn1)
out2=conv_fuse(input)
print("difference:",((out2-out1)**2).sum().item())
```

```python
from model.rep.ddb import transII_conv_branch
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,64,7,7)
# two parallel conv branches
conv1=nn.Conv2d(64,64,3,padding=1)
conv2=nn.Conv2d(64,64,3,padding=1)
out1=conv1(input)+conv2(input)
# fused conv
conv_fuse=nn.Conv2d(64,64,3,padding=1)
conv_fuse.weight.data,conv_fuse.bias.data=transII_conv_branch(conv1,conv2)
out2=conv_fuse(input)
print("difference:",((out2-out1)**2).sum().item())
```

```python
from model.rep.ddb import transIII_conv_sequential
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,64,7,7)
# 1x1 conv followed by 3x3 conv
conv1=nn.Conv2d(64,64,1,padding=0,bias=False)
conv2=nn.Conv2d(64,64,3,padding=1,bias=False)
out1=conv2(conv1(input))
# fused conv
conv_fuse=nn.Conv2d(64,64,3,padding=1,bias=False)
conv_fuse.weight.data=transIII_conv_sequential(conv1,conv2)
out2=conv_fuse(input)
print("difference:",((out2-out1)**2).sum().item())
```

```python
from model.rep.ddb import transIV_conv_concat
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,64,7,7)
# two convs whose outputs are concatenated
conv1=nn.Conv2d(64,32,3,padding=1)
conv2=nn.Conv2d(64,32,3,padding=1)
out1=torch.cat([conv1(input),conv2(input)],dim=1)
# fused conv
conv_fuse=nn.Conv2d(64,64,3,padding=1)
conv_fuse.weight.data,conv_fuse.bias.data=transIV_conv_concat(conv1,conv2)
out2=conv_fuse(input)
print("difference:",((out2-out1)**2).sum().item())
```

```python
from model.rep.ddb import transV_avg
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,64,7,7)
avg=nn.AvgPool2d(kernel_size=3,stride=1)
out1=avg(input)
# average pooling expressed as a convolution
conv=transV_avg(64,3)
out2=conv(input)
print("difference:",((out2-out1)**2).sum().item())
```

```python
from model.rep.ddb import transVI_conv_scale
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,64,7,7)
# 1x1, 1x3 and 3x1 branches
conv1x1=nn.Conv2d(64,64,1)
conv1x3=nn.Conv2d(64,64,(1,3),padding=(0,1))
conv3x1=nn.Conv2d(64,64,(3,1),padding=(1,0))
out1=conv1x1(input)+conv1x3(input)+conv3x1(input)
# fused conv
conv_fuse=nn.Conv2d(64,64,3,padding=1)
conv_fuse.weight.data,conv_fuse.bias.data=transVI_conv_scale(conv1x1,conv1x3,conv3x1)
out2=conv_fuse(input)
print("difference:",((out2-out1)**2).sum().item())
```
- Pytorch implementation of "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications---CVPR2017"
- Pytorch implementation of "Efficientnet: Rethinking model scaling for convolutional neural networks---PMLR2019"
- Pytorch implementation of "Involution: Inverting the Inherence of Convolution for Visual Recognition---CVPR2021"
- Pytorch implementation of "Dynamic Convolution: Attention over Convolution Kernels---CVPR2020 Oral"
- Pytorch implementation of "CondConv: Conditionally Parameterized Convolutions for Efficient Inference---NeurIPS2019"
"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications"
```python
from model.conv.DepthwiseSeparableConvolution import DepthwiseSeparableConvolution
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,3,224,224)
dsconv=DepthwiseSeparableConvolution(3,64)
out=dsconv(input)
print(out.shape)
```
"Efficientnet: Rethinking model scaling for convolutional neural networks"
```python
from model.conv.MBConv import MBConvBlock
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,3,224,224)
mbconv=MBConvBlock(ksize=3,input_filters=3,output_filters=512,image_size=224)
out=mbconv(input)
print(out.shape)
```
"Involution: Inverting the Inherence of Convolution for Visual Recognition"
```python
from model.conv.Involution import Involution
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(1,4,64,64)
involution=Involution(kernel_size=3,in_channel=4,stride=2)
out=involution(input)
print(out.shape)
```
"Dynamic Convolution: Attention over Convolution Kernels"
```python
from model.conv.DynamicConv import *
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(2,32,64,64)
    m=DynamicConv(in_planes=32,out_planes=64,kernel_size=3,stride=1,padding=1,bias=False)
    out=m(input)
    print(out.shape)  # 2,64,64,64
```
"CondConv: Conditionally Parameterized Convolutions for Efficient Inference"
```python
from model.conv.CondConv import *
import torch
from torch import nn
from torch.nn import functional as F
if __name__ == '__main__':
    input=torch.randn(2,32,64,64)
    m=CondConv(in_planes=32,out_planes=64,kernel_size=3,stride=1,padding=1,bias=False)
    out=m(input)
    print(out.shape)
```
🔥🔥🔥 Big news!!! As a supplement to this project, more paper-level analyses are available in the newly open-sourced FightingCV-Paper-Reading project, which collects and organizes paper analyses from major top conferences and journals.
🔥🔥🔥 Big news!!! We recently compiled AI-related video tutorials and must-read papers from around the web in FightingCV-Course.
🔥🔥🔥 Big news!!! A brand-new object detection code library, YOLOAir, was recently open-sourced. It integrates multiple YOLO models, including YOLOv5, YOLOv7, YOLOR, YOLOX, YOLOv4, YOLOv3 and other YOLO variants, as well as a variety of existing attention mechanisms.
🔥🔥🔥 ECCV2022 paper collection: ECCV2022-Paper-List