TRANSFORMER Paper Collection

Posted on 2024-5-22 09:05:56
Link: https://pan.baidu.com/s/118E1B9cnPGUxad7XSKnQBQ?pwd=uyrz


---TRANSFORMER Paper Collection 03 - Conv Transformer
        1-LeViT  a Vision Transformer in ConvNet’s Clothing for Faster Inference
        2-Incorporating Convolution Designs into Visual Transformers
        3-Conformer Local Features Coupling Global Representations for Visual Recognition
        4-Co-Scale Conv-Attentional Image Transformers
        5-Introducing Convolutions to Vision Transformers
        6-MOBILEVIT LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TRANSFORMER
        7-Mobile-Former Bridging MobileNet and Transformer
        8-TinyViT Fast Pretraining Distillation for Small Vision Transformers
        9-ParC-Net Position Aware Circular Convolution with Merits from ConvNets and Transformer
        10-How to Train Vision Transformer on Small-scale Datasets
        11-Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
        12-Inception Transformer
        13-Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
        14-DESIGNING BERT FOR CONVOLUTIONAL NETWORKS SPARSE AND HIERARCHICAL MASKED MODELING
        14-MOAT ALTERNATING MOBILE CONVOLUTION AND ATTENTION BRINGS STRONG VISION MODELS
        15-InternImage Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
        16-PSLT A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift
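The hybrids listed above mostly share one ingredient: convolutional layers that downsample and tokenize the image before (or in between) the attention blocks. As a rough illustration of that pattern, here is a small convolutional stem in PyTorch; the layer count, channel widths, and total stride of 16 are illustrative choices and are not taken from any of the listed papers.

```python
import torch
import torch.nn as nn

class ConvStem(nn.Module):
    """A small convolutional stem that downsamples an image into patch tokens,
    the common ingredient of the conv/Transformer hybrids listed above.
    Architecture details here are illustrative, not from any specific paper."""
    def __init__(self, in_chans=3, embed_dim=384):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 4, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 4), nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim // 2, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 2), nn.GELU(),
            nn.Conv2d(embed_dim // 2, embed_dim, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim), nn.GELU(),
            nn.Conv2d(embed_dim, embed_dim, 3, stride=2, padding=1),  # total stride 16
        )

    def forward(self, x):                       # x: (B, 3, H, W)
        x = self.stem(x)                        # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)     # (B, num_tokens, embed_dim)

if __name__ == "__main__":
    print(ConvStem()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 196, 384])
```

The resulting token sequence is then consumed by ordinary Transformer blocks; many of the listed papers additionally mix convolution into the blocks themselves.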

---TRANSFORMER Paper Collection 04 - Transformer Training
        01-Generative Pretraining from Pixels
        02-Learning Transferable Visual Models From Natural Language Supervision
        03-An Empirical Study of Training Self-Supervised Vision Transformers
        04-Emerging Properties in Self-Supervised Vision Transformers
        05-Efficient Training of Visual Transformers with Small Datasets
        06-Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
        07- Masked Self-Supervised Transformer for Visual Representation
        08-BERT Pre-Training of Image Transformers
        09- IMAGE BERT PRE-TRAINING WITH ONLINE TOKENIZER
        10-Automated Progressive Learning for Efficient Training of Vision Transformers
        11-Masked Autoencoders Are Scalable Vision Learners
        12-a Simple Framework for Masked Image Modeling
        13-Patch-level Representation Learning for Self-supervised Vision Transformers
        14-Towards Liberating Vision Transformers from Pre-training
        15-Attend to Mix for Vision Transformers
        16-A General Framework for Self-supervised Learning in Speech, Vision and Language
        17-Self-supervised Models are Good Teaching Assistants for Vision Transformers
        18-Position Prediction as an Effective Pretraining Strategy
        19-Visual Transformer Meets CutMix for Improved Accuracy, Communication Efficiency, and Data Privacy in Split Learning
        20-Bootstrapped Masked Autoencoders for Vision BERT Pretraining
        21-Rethinking Image Mixing for Data Augmentation in Vision Transformers
        22-Locality Guidance for Improving Vision Transformers on Tiny Datasets
        23-Improving Vision Transformers by Revisiting High-frequency Components
        24-What to Hide from Your Students Attention-Guided Masked Image Modeling
        25-Self-supervision meets Language-Image Pre-training
        26-Multi-choice Discretization for Image BERT Pre-training
        27-Scalable Learning to Optimize A Learned Optimizer Can Train Big Models
        28- TokenMixup Efficient Attention-guided Token-level Data Augmentation for Transformers
        29-Green Hierarchical Vision Transformer for Masked Image Modeling
        30-MIXPRO DATA AUGMENTATION WITH MASKMIX AND PROGRESSIVE ATTENTION LABELING FOR VISION TRANSFORMER
        31-MASKED IMAGE MODELING WITH DENOISING CONTRAST
        32-MASKED FREQUENCY MODELING FOR SELF-SUPERVISED VISUAL PRE-TRAINING
        33-Pre-training Vision Transformers with Sinusoidal Waves
        34-Learning Visual Representations via Language-Guided Sampling
        35-DisCo-CLIP A Distributed Contrastive Loss for Memory Efficient CLIP Training
        36-Masked Self-Distillation Advances Contrastive Language-Image Pretraining
        37-MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
        38-Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
        39-Integrally Pre-Trained Transformer Pyramid Networks
        40-DropKey
        41-One Model for All Patch Size
        42-Image-and-Language Understanding from Pixels Only
        43-Masked Autoencoders Enable Efficient Knowledge Distillers
        44-Hard Patches Mining for Masked Image Modeling
        45-Stare at What You See Masked Image Modeling without Reconstruction
        46-RILS  Masked Visual Reconstruction in Language Semantic Space
        47-Revisiting Multimodal Representation in Contrastive Learning From Patch and Token Embeddings to Finite Discrete Tokens
        48-Reproducible scaling laws for contrastive language-image learning
        49-Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
        50-Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
        51-Stitchable Neural Networks
        52-A Closer Look at Self-Supervised Lightweight Vision Transformers
        53-Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models
        54-Architecture-Agnostic Masked Image Modeling – From ViT back to CNN
        55-Patch-level Contrastive Learning via Positional Query for Visual Pretraining
        56-DreamTeacher  Pretraining Image Backbones with Deep Generative Models
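A large share of the pre-training list above is built around masked image modeling: hide most of the patch tokens and train the model to reconstruct or predict them. Below is a minimal PyTorch sketch of the random-masking step in the spirit of entry 11, "Masked Autoencoders Are Scalable Vision Learners"; the 75% mask ratio and the tensor shapes in the example are illustrative defaults, not lifted from any released code.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Randomly hide a fraction of patch tokens, MAE-style.

    patches: (batch, num_patches, dim) patch embeddings.
    Returns the visible patches, the indices that restore the original order,
    and a binary mask (1 = masked, 0 = visible) in the original order."""
    b, n, d = patches.shape
    num_keep = int(n * (1.0 - mask_ratio))

    # Per-sample random permutation: sort random noise to get shuffle indices.
    noise = torch.rand(b, n, device=patches.device)
    ids_shuffle = torch.argsort(noise, dim=1)        # lowest noise is kept first
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    # Keep the first num_keep tokens of the shuffled sequence.
    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))

    # Binary mask, mapped back to the original patch order.
    mask = torch.ones(b, n, device=patches.device)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return visible, ids_restore, mask

if __name__ == "__main__":
    x = torch.randn(2, 196, 768)            # 14x14 patches from a 224x224 image
    visible, ids_restore, mask = random_masking(x)
    print(visible.shape, mask.sum(dim=1))   # (2, 49, 768); 147 of 196 patches masked
```

An MAE-style encoder then processes only the visible tokens, which is where most of the pre-training speed-up comes from.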

---TRANSFORMER Paper Collection 05 - Transformer Robustness
        01-Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
        02-Are Transformers More Robust Than CNNs
        03-Vision Transformers are Robust Learners
        04-Towards Transferable Adversarial Attacks on Vision Transformers
        05-MIA-Former Efficient and Robust Vision Transformers  via Multi-grained Input Adaptation
        06-PATCH-FOOL  ARE VISION TRANSFORMERS ALWAYS ROBUST AGAINST ADVERSARIAL PERTURBATIONS
        07-Certified Patch Robustness via Smoothed Vision Transformers
        08-Towards Robust Vision Transformer
        09-Visual Attention Emerges from Recurrent Sparse Reconstruction
        10-Understanding The Robustness in Vision Transformers
        11-Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment
        12-Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem
        13-ViP Unified Certified Detection and Recovery for Patch Attack with Vision Transformers
        14-When Adversarial Training Meets Vision Transformers Recipes from Training to Architecture
        15-Optimizing Relevance Maps of Vision Transformers Improves Robustness
        16-CAN CNNS BE MORE ROBUST THAN TRANSFORMERS
        17-DENOISING MASKED AUTOENCODERS HELP ROBUST CLASSIFICATION
        18-Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization
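The robustness papers above compare how Vision Transformers and CNNs behave under adversarial perturbations. As a reference point, here is FGSM, a standard one-step attack that predates these papers and is commonly used in such evaluations; the toy model, data, and the eps = 8/255 budget (for images scaled to [0, 1]) are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, eps=8 / 255):
    """Fast Gradient Sign Method: one gradient-sign step that increases the loss,
    the kind of perturbation the robustness studies above evaluate against."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad = torch.autograd.grad(loss, images)[0]
    adv = images + eps * grad.sign()      # move each pixel in the loss-increasing direction
    return adv.clamp(0.0, 1.0).detach()   # stay in the valid [0, 1] image range

if __name__ == "__main__":
    # Toy classifier and data, purely for illustration.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
    x_adv = fgsm_attack(model, x, y)
    print((x_adv - x).abs().max().item())  # at most eps
```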

---TRANSFORMER Paper Collection 06 - Transformer Model Compression
        01-UNIFIED VISUAL TRANSFORMER COMPRESSION
        2-MiniViT Compressing Vision Transformers with Weight Multiplexing
        3-SPViT Enabling Faster Vision Transformers via Latency-aware Soft Token Pruning
        4-Patch Similarity Aware Data-Free Quantization for Vision Transformers
        5-Q-ViT Accurate and Fully Quantized Low-bit Vision Transformer
        6-VTC-LFC Vision Transformer Compression with Low-Frequency Components
        7-PSAQ-ViT V2 Towards Accurate and General Data-Free Quantization for Vision Transformers
        8-Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
        9-Pushing Binary Vision Transformers Towards Convolutional Models
        11-UPop Unified and Progressive Pruning for Compressing Vision-Language Transformers
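Several entries above study low-bit quantization of Vision Transformers. The sketch below shows the most basic building block such methods start from, symmetric per-tensor fake quantization of a weight matrix; real methods add per-channel scales, activation quantization, and calibration data (or synthesize it, as the data-free papers above do), so treat this only as an illustration.

```python
import torch

def fake_quantize_per_tensor(w: torch.Tensor, num_bits: int = 8):
    """Symmetric per-tensor fake quantization: round weights onto a num_bits integer
    grid and map them back to floats. Real ViT quantization methods build on this
    with per-channel scales, activation quantization, and calibration."""
    qmax = 2 ** (num_bits - 1) - 1                    # 127 for int8
    scale = w.abs().max() / qmax                      # one scale for the whole tensor
    w_int = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return w_int * scale, scale

if __name__ == "__main__":
    w = torch.randn(768, 768)                          # e.g. a ViT linear layer's weights
    w_q, scale = fake_quantize_per_tensor(w)
    print(scale.item(), (w - w_q).abs().max().item())  # rounding error is at most scale / 2
```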

---TRANSFORMER Paper Collection 07 - Others
        01-On Layer Normalization in the Transformer Architecture
        02-UPop Unified and Progressive Pruning for Compressing Vision-Language Transformers
        03-Linear Transformers Are Secretly Fast Weight Programmers
        04-Attention Is All You Need
        05-Transformer with Dual Residual Connections
        06-Universal Language Model Fine-tuning for Text Classification
        07-Pre-training of Deep Bidirectional Transformers for Language Understanding
        08-Improving Language Understanding by Generative Pre-Training
        09-A Survey on Efficient Training of Transformers
        10-FlashAttention Fast and Memory-Efficient Exact Attention with IO-Awareness
        11-Asynchronous Methods for Deep Reinforcement Learning
        12-Harnessing the Power of LLMs in Practice A Survey on ChatGPT and Beyond
        13-Efficient Transformers A Survey
        14-CRAMMING TRAINING A LANGUAGE MODEL ON A SINGLE GPU IN ONE DAY
        15-LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
        16-Scaling Down to Scale Up A Guide to Parameter-Efficient Fine-Tuning
        17-Pythia A Suite for Analyzing Large Language Models Across Training and Scaling
        18-Training Compute-Optimal Large Language Models
        19-Scaling Language Models  Methods, Analysis & Insights from Training Gopher
        20-Constitutional AI Harmlessness from AI Feedback
        20-Training language models to follow instructions with human feedback
        21-SELF-INSTRUCT Aligning Language Models with Self-Generated Instructions
        22-Fine-Tuning Language Models from Human Preferences
        23-Learning to summarize from human feedback
        24-BART Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
        25-Training language models to follow instructions with human feedback
        26-TOKEN MERGING YOUR VIT BUT FASTER
        27-A Fast Post-Training Pruning Framework for Transformers
        28-Swin Transformer Hierarchical Vision Transformer using Shifted Windows
        29-AN IMAGE IS WORTH 16X16 WORDS TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
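Entry 04 above is the original Transformer paper, and its scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, is the operation every other collection in this post builds on. A self-contained PyTorch sketch follows; the multi-head shapes in the example are only for illustration.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V,
    the core operation from "Attention Is All You Need"."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5    # (..., q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

if __name__ == "__main__":
    q = torch.randn(1, 4, 10, 64)   # (batch, heads, seq_len, head_dim), illustrative shapes
    k = torch.randn(1, 4, 10, 64)
    v = torch.randn(1, 4, 10, 64)
    out, attn = scaled_dot_product_attention(q, k, v)
    print(out.shape, attn.shape)    # torch.Size([1, 4, 10, 64]) torch.Size([1, 4, 10, 10])
```

Multi-head attention simply runs this in parallel over several head dimensions and concatenates the results; papers such as FlashAttention (entry 10) compute exactly the same function with a more memory-efficient schedule.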

---TRANSFORMER Paper Collection 1 - General Vision Transformer
        01-AN IMAGE IS WORTH 16X16 WORDS TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
        02- General Perception with Iterative Attention
        03-Rethinking Spatial Dimensions of Vision Transformers
        05-Pyramid Vision Transformer  A Versatile Backbone for Dense Prediction without Convolutions
        06-Rethinking and Improving Relative Position Encoding for Vision Transformer
        07-Going deeper with Image Transformers
        08-Swin Transformer  Hierarchical Vision Transformer using Shifted Windows
        09-Tokens-to-Token ViT Training Vision Transformers from Scratch on ImageNet
        10-DPT Deformable Patch-based Transformer for Visual Recognition
        11-Focal Self-attention for Local-Global Interactions in Vision Transformers
        12-Twins  Revisiting the Design of Spatial Attention in Vision Transformers
        13-Blending Anti-Aliasing into Vision Transformer
        14-Not All Images are Worth 16x16 Words Dynamic Transformers for Efficient Image Recognition
        15-Transformer in Transformer
        16-Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
        17-DeepViT  Towards Deeper Vision Transformer
        18-All Tokens Matter Token Labeling for Training Better Vision Transformers
        19-Less is More Pay Less Attention in Vision Transformers
        20-DYNAMIC TOKEN NORMALIZATION IMPROVES VISION TRANSFORMER
        21-REGIONVIT REGIONAL-TO-LOCAL ATTENTION FOR VISION TRANSFORMERS
        22-CROSSFORMER A VERSATILE VISION TRANSFORMER HINGING ON CROSS-SCALE ATTENTION
        23-CSWin Transformer A General Vision Transformer Backbone with Cross-Shaped Windows
        24-MPViT  Multi-Path Vision Transformer for Dense Prediction
        25-The Principle of Diversity Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
        26-Beyond Fixation  Dynamic Window Visual Transformer
        27-MixFormer  Mixing Features across Windows and Dimensions
        28-Vision Transformer with Deformable Attention
        29-Swin Transformer V2  Scaling Up Capacity and Resolution
        30-MSG-Transformer Exchanging Local Spatial Information by Manipulating Messenger Tokens
        31-Nominate Synergistic Context in Vision Transformer
        32-Shunted Self-Attention via Multi-Scale Token Aggregation
        33-Improved Transformer-in-Transformer Baselines with Pyramid Architecture
        34-Object-aware Mixing Layer for Vision Transformers
        35-Unified Normalization for Accelerating and Stabilizing Transformers
        36-Wave-ViT Unifying Wavelet and Transformers for Visual Representation Learning
        37- Dual Attention Vision Transformers
        38-Multi-Axis Vision Transformer
        39-Learning Varied-Size Window Attention in Vision Transformers
        40-Fast Vision Transformers with HiLo Attention
        41-GPVIT A HIGH RESOLUTION NON-HIERARCHICAL VISION TRANSFORMER WITH GROUP PROPAGATION
        42-CONDITIONAL POSITIONAL ENCODINGS FOR VISION TRANSFORMERS
        43-LIPSFORMER  INTRODUCING LIPSCHITZ CONTINUITY TO VISION TRANSFORMERS
        44-BiFormer Vision Transformer with Bi-Level Routing Attention
        45-Top-Down Visual Attention from Analysis by Synthesis
        46-Visual Dependency Transformers Dependency Tree Emerges from Reversed Attention
        47-ResFormer  Scaling ViTs with Multi-Resolution Training
        48-Vision Transformer with Super Token Sampling
        49-PaCa-ViT Learning Patch-to-Cluster Attention in Vision Transformers
        50-Global Context Vision Transformers
        51-Foundation Transformers
        52-Scale-Aware Modulation Meet Transformer
        53-CrossFormer    A Versatile Vision Transformer Hinging on Cross-scale Attention
        54-Vision Transformer with Quadrangle Attention
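Entry 01 above (the ViT paper) turns an image into a token sequence by cutting it into fixed 16x16 patches and linearly projecting each one. The sketch below shows the usual way this is implemented, a convolution whose kernel and stride equal the patch size; the 224 / 16 / 768 hyperparameters match the common ViT-Base setting but are otherwise arbitrary here.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into fixed-size patches and linearly project each one,
    implemented (as is common) with a strided convolution."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, embed_dim)

if __name__ == "__main__":
    tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
    print(tokens.shape)   # torch.Size([2, 196, 768]), i.e. 14x14 patches of 16x16 pixels
```

A learnable [CLS] token and position embeddings are then added before the standard Transformer encoder, which is the part most of the papers in this collection vary.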

---TRANSFORMER Paper Collection 2 - Efficient Vision Transformer
        1-Training data-efficient image transformers & distillation through attention
        2-ConViT Improving Vision Transformers with Soft Convolutional Inductive Biases
        3-Scalable Vision Transformers with Hierarchical Pooling
        4-CrossViT Cross-Attention Multi-Scale Vision Transformer for Image Classification
        5-Multi-Scale Vision Longformer A New Vision Transformer for High-Resolution Image Encoding
        6-Visformer The Vision-friendly Transformer
        7-Multi-Exit Vision Transformer for Dynamic Inference
        8-Chasing Sparsity in Vision Transformers An End-to-End Exploration
        9-Dynamic Grained Encoder for Vision Transformers
        10-Glance-and-Gaze Vision Transformer
        11-DynamicViT  Efficient Vision Transformers with Dynamic Token Sparsification
        12-ResT  An Efficient Transformer for Visual Recognition
        13-SOFT Softmax-free Transformer with Linear Complexity
        14-Evo-ViT Slow-Fast Token Evolution for Dynamic Vision Transformer
        15-Pale Transformer A General Vision Transformer Backbone with Pale-Shaped Attention
        16-When Shift Operation Meets Vision Transformer An Extremely Simple Alternative to Attention Mechanism
        17-NOT ALL PATCHES ARE WHAT YOU NEED EXPEDITING VISION TRANSFORMERS VIA TOKEN REORGANIZATIONS
        18-QUADTREE ATTENTION FOR VISION TRANSFORMERS
        19-ANTI-OVERSMOOTHING IN DEEP VISION TRANSFORMERS VIA THE FOURIER DOMAIN ANALYSIS - FROM THEORY TO PRACTICE
        20-Learned Queries for Efficient Local Attention
        21-Lite Vision Transformer with Enhanced Self-Attention
        22-A-ViT  Adaptive Tokens for Efficient Vision Transformer
        23-Reversible Vision Transformers
        24-Adaptive Token Sampling For Efficient Vision Transformers
        25-EdgeViTs  Competing Light-weight CNNs on Mobile Devices with Vision Transformers
        26-Sliced Recursive Transformer
        27-Self-slimmed Vision Transformer
        28-M3ViT  Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
        29-ResT V2  Simpler, Faster and Stronger
        30-EfficientFormer  Vision Transformers at MobileNet Speed
        31-GhostNetV2 Enhance Cheap Operation with Long-Range Attention
        32-Peeling the Onion Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
        33-TOKEN MERGING  YOUR VIT BUT FASTER
        34-HiViT  Hierarchical Vision Transformer Meets Masked Image Modeling
        35-Making Vision Transformers Efficient from A Token Sparsification View
        36-SparseViT  Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
        37-Slide-Transformer  Hierarchical Vision Transformer with Local Self-Attention
        38-RIFormer  Keep Your Vision Backbone Effective But Removing Token Mixer
        39-EfficientViT  Memory Efficient Vision Transformer with Cascaded Group Attention
        40-Castling-ViT Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
        41-RGB no more  Minimally-decoded JPEG Vision Transformers
        42-Learned Thresholds Token Merging and Pruning for Vision Transformers
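A recurring idea in the efficiency papers above (e.g. DynamicViT and "Not All Patches Are What You Need") is to drop uninformative patch tokens partway through the network. The sketch below shows one simple variant, keeping the patches that receive the most attention from the [CLS] token; the function name, shapes, and the 0.5 keep ratio are illustrative and not any listed paper's exact recipe.

```python
import torch

def prune_tokens_by_cls_attention(tokens: torch.Tensor,
                                  cls_attn: torch.Tensor,
                                  keep_ratio: float = 0.5):
    """Keep only the patch tokens the [CLS] token attends to most.

    tokens:   (batch, 1 + num_patches, dim), with the [CLS] token first.
    cls_attn: (batch, num_patches), attention from [CLS] to each patch,
              e.g. averaged over heads."""
    b, n_plus_1, d = tokens.shape
    num_keep = max(1, int((n_plus_1 - 1) * keep_ratio))

    # Indices of the highest-scoring patches, per sample.
    topk = cls_attn.topk(num_keep, dim=1).indices                  # (b, num_keep)
    patch_tokens = tokens[:, 1:]                                   # drop [CLS]
    kept = torch.gather(patch_tokens, 1,
                        topk.unsqueeze(-1).expand(-1, -1, d))      # (b, num_keep, d)

    # Re-attach the [CLS] token in front of the surviving patches.
    return torch.cat([tokens[:, :1], kept], dim=1)

if __name__ == "__main__":
    tokens = torch.randn(2, 197, 384)
    cls_attn = torch.rand(2, 196)
    print(prune_tokens_by_cls_attention(tokens, cls_attn).shape)   # (2, 99, 384)
```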


