TRANSFORMER Paper Collection

Posted on 2024-5-22 09:05:56
Link: https://pan.baidu.com/s/118E1B9cnPGUxad7XSKnQBQ?pwd=uyrz


---TRANSFORMER Paper Collection 03 - Conv Transformer
        1-LeViT  a Vision Transformer in ConvNet’s Clothing for Faster Inference
        2-Incorporating Convolution Designs into Visual Transformers
        3-Conformer Local Features Coupling Global Representations for Visual Recognition
        4-Co-Scale Conv-Attentional Image Transformers
        5-Introducing Convolutions to Vision Transformers
        6-MOBILEVIT LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TRANSFORMER
        7-Mobile-Former Bridging MobileNet and Transformer
        8-TinyViT Fast Pretraining Distillation for Small Vision Transformers
        9-ParC-Net Position Aware Circular Convolution with Merits from ConvNets and Transformer
        10-How to Train Vision Transformer on Small-scale Datasets
        11-Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
        12-Inception Transformer
        13-Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
        14-DESIGNING BERT FOR CONVOLUTIONAL NETWORKS SPARSE AND HIERARCHICAL MASKED MODELING
        14-MOAT ALTERNATING MOBILE CONVOLUTION AND ATTENTION BRINGS STRONG VISION MODELS
        15-InternImage Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
        16-PSLT A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift
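The hybrids listed above mostly share one ingredient: convolutional layers that downsample and tokenize the image before (or in between) the attention blocks. As a rough illustration of that pattern, here is a small convolutional stem in PyTorch; the layer count, channel widths, and total stride of 16 are illustrative choices and are not taken from any of the listed papers.

```python
import torch
import torch.nn as nn

class ConvStem(nn.Module):
    """A small convolutional stem that downsamples an image into patch tokens,
    the common ingredient of the conv/Transformer hybrids listed above.
    Architecture details here are illustrative, not from any specific paper."""
    def __init__(self, in_chans=3, embed_dim=384):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 4, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 4), nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim // 2, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 2), nn.GELU(),
            nn.Conv2d(embed_dim // 2, embed_dim, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim), nn.GELU(),
            nn.Conv2d(embed_dim, embed_dim, 3, stride=2, padding=1),  # total stride 16
        )

    def forward(self, x):                       # x: (B, 3, H, W)
        x = self.stem(x)                        # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)     # (B, num_tokens, embed_dim)

if __name__ == "__main__":
    print(ConvStem()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 196, 384])
```

The resulting token sequence is then consumed by ordinary Transformer blocks; many of the listed papers additionally mix convolution into the blocks themselves.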

---TRANSFORMER Paper Collection 04 - Transformer Training
        01-Generative Pretraining from Pixels
        02-Learning Transferable Visual Models From Natural Language Supervision
        03-An Empirical Study of Training Self-Supervised Vision Transformers
        04-Emerging Properties in Self-Supervised Vision Transformers
        05-Efficient Training of Visual Transformers with Small Datasets
        06-Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
        07- Masked Self-Supervised Transformer for Visual Representation
        08-BERT Pre-Training of Image Transformers
        09- IMAGE BERT PRE-TRAINING WITH ONLINE TOKENIZER
        10-Automated Progressive Learning for Efficient Training of Vision Transformers
        11-Masked Autoencoders Are Scalable Vision Learners
        12-a Simple Framework for Masked Image Modeling
        13-Patch-level Representation Learning for Self-supervised Vision Transformers
        14-Towards Liberating Vision Transformers from Pre-training
        15-Attend to Mix for Vision Transformers
        16-A General Framework for Self-supervised Learning in Speech, Vision and Language
        17-Self-supervised Models are Good Teaching Assistants for Vision Transformers
        18-Position Prediction as an Effective Pretraining Strategy
        19-Visual Transformer Meets CutMix for Improved Accuracy, Communication Efficiency, and Data Privacy in Split Learning
        20-Bootstrapped Masked Autoencoders for Vision BERT Pretraining
        21-Rethinking Image Mixing for Data Augmentation in Vision Transformers
        22-Locality Guidance for Improving Vision Transformers on Tiny Datasets
        23-Improving Vision Transformers by Revisiting High-frequency Components
        24-What to Hide from Your Students Attention-Guided Masked Image Modeling
        25-Self-supervision meets Language-Image Pre-training
        26-Multi-choice Discretization for Image BERT Pre-training
        27-Scalable Learning to Optimize A Learned Optimizer Can Train Big Models
        28- TokenMixup Efficient Attention-guided Token-level Data Augmentation for Transformers
        29-Green Hierarchical Vision Transformer for Masked Image Modeling
        30-MIXPRO DATA AUGMENTATION WITH MASKMIX AND PROGRESSIVE ATTENTION LABELING FOR VISION TRANSFORMER
        31-MASKED IMAGE MODELING WITH DENOISING CONTRAST
        32-MASKED FREQUENCY MODELING FOR SELF-SUPERVISED VISUAL PRE-TRAINING
        33-Pre-training Vision Transformers with Sinusoidal Waves
        34-Learning Visual Representations via Language-Guided Sampling
        35-DisCo-CLIP A Distributed Contrastive Loss for Memory Efficient CLIP Training
        36-Masked Self-Distillation Advances Contrastive Language-Image Pretraining
        37-MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
        38-Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
        39-Integrally Pre-Trained Transformer Pyramid Networks
        40-DropKey
        41-One Model for All Patch Size
        42-Image-and-Language Understanding from Pixels Only
        43-Masked Autoencoders Enable Efficient Knowledge Distillers
        44-Hard Patches Mining for Masked Image Modeling
        45-Stare at What You See Masked Image Modeling without Reconstruction
        46-RILS  Masked Visual Reconstruction in Language Semantic Space
        47-Revisiting Multimodal Representation in Contrastive Learning From Patch and Token Embeddings to Finite Discrete Tokens
        48-Reproducible scaling laws for contrastive language-image learning
        49-Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
        50-Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
        51-Stitchable Neural Networks
        52-A Closer Look at Self-Supervised Lightweight Vision Transformers
        53-Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models
        54-Architecture-Agnostic Masked Image Modeling – From ViT back to CNN
        55-Patch-level Contrastive Learning via Positional Query for Visual Pretraining
        56-DreamTeacher  Pretraining Image Backbones with Deep Generative Models
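A large share of the pre-training list above is built around masked image modeling: hide most of the patch tokens and train the model to reconstruct or predict them. Below is a minimal PyTorch sketch of the random-masking step in the spirit of entry 11, "Masked Autoencoders Are Scalable Vision Learners"; the 75% mask ratio and the tensor shapes in the example are illustrative defaults, not lifted from any released code.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Randomly hide a fraction of patch tokens, MAE-style.

    patches: (batch, num_patches, dim) patch embeddings.
    Returns the visible patches, the indices that restore the original order,
    and a binary mask (1 = masked, 0 = visible) in the original order."""
    b, n, d = patches.shape
    num_keep = int(n * (1.0 - mask_ratio))

    # Per-sample random permutation: sort random noise to get shuffle indices.
    noise = torch.rand(b, n, device=patches.device)
    ids_shuffle = torch.argsort(noise, dim=1)        # lowest noise is kept first
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    # Keep the first num_keep tokens of the shuffled sequence.
    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))

    # Binary mask, mapped back to the original patch order.
    mask = torch.ones(b, n, device=patches.device)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return visible, ids_restore, mask

if __name__ == "__main__":
    x = torch.randn(2, 196, 768)            # 14x14 patches from a 224x224 image
    visible, ids_restore, mask = random_masking(x)
    print(visible.shape, mask.sum(dim=1))   # (2, 49, 768); 147 of 196 patches masked
```

An MAE-style encoder then processes only the visible tokens, which is where most of the pre-training speed-up comes from.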

---TRANSFORMER Paper Collection 05 - Transformer Robustness
        01-Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
        02-Are Transformers More Robust Than CNNs
        03-Vision Transformers are Robust Learners
        04-Towards Transferable Adversarial Attacks on Vision Transformers
        05-MIA-Former Efficient and Robust Vision Transformers  via Multi-grained Input Adaptation
        06-PATCH-FOOL  ARE VISION TRANSFORMERS ALWAYS ROBUST AGAINST ADVERSARIAL PERTURBATIONS
        07-Certified Patch Robustness via Smoothed Vision Transformers
        08-Towards Robust Vision Transformer
        09-Visual Attention Emerges from Recurrent Sparse Reconstruction
        10-Understanding The Robustness in Vision Transformers
        11-Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment
        12-Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem
        13-ViP Unified Certified Detection and Recovery for Patch Attack with Vision Transformers
        14-When Adversarial Training Meets Vision Transformers Recipes from Training to Architecture
        15-Optimizing Relevance Maps of Vision Transformers Improves Robustness
        16-CAN CNNS BE MORE ROBUST THAN TRANSFORMERS
        17-DENOISING MASKED AUTOENCODERS HELP ROBUST CLASSIFICATION
        18-Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization
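The robustness papers above compare how Vision Transformers and CNNs behave under adversarial perturbations. As a reference point, here is FGSM, a standard one-step attack that predates these papers and is commonly used in such evaluations; the toy model, data, and the eps = 8/255 budget (for images scaled to [0, 1]) are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, eps=8 / 255):
    """Fast Gradient Sign Method: one gradient-sign step that increases the loss,
    the kind of perturbation the robustness studies above evaluate against."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad = torch.autograd.grad(loss, images)[0]
    adv = images + eps * grad.sign()      # move each pixel in the loss-increasing direction
    return adv.clamp(0.0, 1.0).detach()   # stay in the valid [0, 1] image range

if __name__ == "__main__":
    # Toy classifier and data, purely for illustration.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
    x_adv = fgsm_attack(model, x, y)
    print((x_adv - x).abs().max().item())  # at most eps
```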

---TRANSFORMER Paper Collection 06 - Transformer Model Compression
        01-UNIFIED VISUAL TRANSFORMER COMPRESSION
        2-MiniViT Compressing Vision Transformers with Weight Multiplexing
        3-SPViT Enabling Faster Vision Transformers via Latency-aware Soft Token Pruning
        4-Patch Similarity Aware Data-Free Quantization for Vision Transformers
        5-Q-ViT Accurate and Fully Quantized Low-bit Vision Transformer
        6-VTC-LFC Vision Transformer Compression with Low-Frequency Components
        7-PSAQ-ViT V2 Towards Accurate and General Data-Free Quantization for Vision Transformers
        8-Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
        9-Pushing Binary Vision Transformers Towards Convolutional Models
        11-UPop Unified and Progressive Pruning for Compressing Vision-Language Transformers
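Several entries above study low-bit quantization of Vision Transformers. The sketch below shows the most basic building block such methods start from, symmetric per-tensor fake quantization of a weight matrix; real methods add per-channel scales, activation quantization, and calibration data (or synthesize it, as the data-free papers above do), so treat this only as an illustration.

```python
import torch

def fake_quantize_per_tensor(w: torch.Tensor, num_bits: int = 8):
    """Symmetric per-tensor fake quantization: round weights onto a num_bits integer
    grid and map them back to floats. Real ViT quantization methods build on this
    with per-channel scales, activation quantization, and calibration."""
    qmax = 2 ** (num_bits - 1) - 1                    # 127 for int8
    scale = w.abs().max() / qmax                      # one scale for the whole tensor
    w_int = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return w_int * scale, scale

if __name__ == "__main__":
    w = torch.randn(768, 768)                          # e.g. a ViT linear layer's weights
    w_q, scale = fake_quantize_per_tensor(w)
    print(scale.item(), (w - w_q).abs().max().item())  # rounding error is at most scale / 2
```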

---TRANSFORMER Paper Collection 07 - Others
        01-On Layer Normalization in the Transformer Architecture
        02-UPop Unified and Progressive Pruning for Compressing Vision-Language Transformers
        03-Linear Transformers Are Secretly Fast Weight Programmers
        04-Attention Is All You Need
        05-Transformer with Dual Residual Connections
        06-Universal Language Model Fine-tuning for Text Classification
        07-Pre-training of Deep Bidirectional Transformers for Language Understanding
        08-Improving Language Understanding by Generative Pre-Training
        09-A Survey on Efficient Training of Transformers
        10-FlashAttention Fast and Memory-Efficient Exact Attention with IO-Awareness
        11-Asynchronous Methods for Deep Reinforcement Learning
        12-Harnessing the Power of LLMs in Practice A Survey on ChatGPT and Beyond
        13-Efficient Transformers A Survey
        14-CRAMMING TRAINING A LANGUAGE MODEL ON A SINGLE GPU IN ONE DAY
        15-LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
        16-Scaling Down to Scale Up A Guide to Parameter-Efficient Fine-Tuning
        17-Pythia A Suite for Analyzing Large Language Models Across Training and Scaling
        18-Training Compute-Optimal Large Language Models
        19-Scaling Language Models  Methods, Analysis & Insights from Training Gopher
        20-Constitutional AI Harmlessness from AI Feedback
        20-Training language models to follow instructions with human feedback
        21-SELF-INSTRUCT Aligning Language Models with Self-Generated Instructions
        22-Fine-Tuning Language Models from Human Preferences
        23-Learning to summarize from human feedback
        24-BART Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
        25-Training language models to follow instructions with human feedback
        26-TOKEN MERGING YOUR VIT BUT FASTER
        27-A Fast Post-Training Pruning Framework for Transformers
        28-Swin Transformer Hierarchical Vision Transformer using Shifted Windows
        29-AN IMAGE IS WORTH 16X16 WORDS TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
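Entry 04 above is the original Transformer paper, and its scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, is the operation every other collection in this post builds on. A self-contained PyTorch sketch follows; the multi-head shapes in the example are only for illustration.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V,
    the core operation from "Attention Is All You Need"."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5    # (..., q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

if __name__ == "__main__":
    q = torch.randn(1, 4, 10, 64)   # (batch, heads, seq_len, head_dim), illustrative shapes
    k = torch.randn(1, 4, 10, 64)
    v = torch.randn(1, 4, 10, 64)
    out, attn = scaled_dot_product_attention(q, k, v)
    print(out.shape, attn.shape)    # torch.Size([1, 4, 10, 64]) torch.Size([1, 4, 10, 10])
```

Multi-head attention simply runs this in parallel over several head dimensions and concatenates the results; papers such as FlashAttention (entry 10) compute exactly the same function with a more memory-efficient schedule.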

---TRANSFORMER Paper Collection 1 - General Vision Transformer
        01-AN IMAGE IS WORTH 16X16 WORDS TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
        02- General Perception with Iterative Attention
        03-Rethinking Spatial Dimensions of Vision Transformers
        05-Pyramid Vision Transformer  A Versatile Backbone for Dense Prediction without Convolutions
        06-Rethinking and Improving Relative Position Encoding for Vision Transformer
        07-Going deeper with Image Transformers
        08-Swin Transformer  Hierarchical Vision Transformer using Shifted Windows
        09-Tokens-to-Token ViT Training Vision Transformers from Scratch on ImageNet
        10-DPT Deformable Patch-based Transformer for Visual Recognition
        11-Focal Self-attention for Local-Global Interactions in Vision Transformers
        12-Twins  Revisiting the Design of Spatial Attention in Vision Transformers
        13-Blending Anti-Aliasing into Vision Transformer
        14-Not All Images are Worth 16x16 Words Dynamic Transformers for Efficient Image Recognition
        15-Transformer in Transformer
        16-Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
        17-DeepViT  Towards Deeper Vision Transformer
        18-All Tokens Matter Token Labeling for Training Better Vision Transformers
        19-Less is More Pay Less Attention in Vision Transformers
        20-DYNAMIC TOKEN NORMALIZATION IMPROVES VISION TRANSFORMER
        21-REGIONVIT REGIONAL-TO-LOCAL ATTENTION FOR VISION TRANSFORMERS
        22-CROSSFORMER A VERSATILE VISION TRANSFORMER HINGING ON CROSS-SCALE ATTENTION
        23-CSWin Transformer A General Vision Transformer Backbone with Cross-Shaped Windows
        24-MPViT  Multi-Path Vision Transformer for Dense Prediction
        25-The Principle of Diversity Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
        26-Beyond Fixation  Dynamic Window Visual Transformer
        27-MixFormer  Mixing Features across Windows and Dimensions
        28-Vision Transformer with Deformable Attention
        29-Swin Transformer V2  Scaling Up Capacity and Resolution
        30-MSG-Transformer Exchanging Local Spatial Information by Manipulating Messenger Tokens
        31-Nominate Synergistic Context in Vision Transformer
        32-Shunted Self-Attention via Multi-Scale Token Aggregation
        33-Improved Transformer-in-Transformer Baselines with Pyramid Architecture
        34-Object-aware Mixing Layer for Vision Transformers
        35-Unified Normalization for Accelerating and Stabilizing Transformers
        36-Wave-ViT Unifying Wavelet and Transformers for Visual Representation Learning
        37- Dual Attention Vision Transformers
        38-Multi-Axis Vision Transformer
        39-Learning Varied-Size Window Attention in Vision Transformers
        40-Fast Vision Transformers with HiLo Attention
        41-GPVIT A HIGH RESOLUTION NON-HIERARCHICAL VISION TRANSFORMER WITH GROUP PROPAGATION
        42-CONDITIONAL POSITIONAL ENCODINGS FOR VISION TRANSFORMERS
        43-LIPSFORMER  INTRODUCING LIPSCHITZ CONTINUITY TO VISION TRANSFORMERS
        44-BiFormer Vision Transformer with Bi-Level Routing Attention
        45-Top-Down Visual Attention from Analysis by Synthesis
        46-Visual Dependency Transformers Dependency Tree Emerges from Reversed Attention
        47-ResFormer  Scaling ViTs with Multi-Resolution Training
        48-Vision Transformer with Super Token Sampling
        49-PaCa-ViT Learning Patch-to-Cluster Attention in Vision Transformers
        50-Global Context Vision Transformers
        51-Foundation Transformers
        52-Scale-Aware Modulation Meet Transformer
        53-CrossFormer    A Versatile Vision Transformer Hinging on Cross-scale Attention
        54-Vision Transformer with Quadrangle Attention
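Entry 01 above (the ViT paper) turns an image into a token sequence by cutting it into fixed 16x16 patches and linearly projecting each one. The sketch below shows the usual way this is implemented, a convolution whose kernel and stride equal the patch size; the 224 / 16 / 768 hyperparameters match the common ViT-Base setting but are otherwise arbitrary here.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into fixed-size patches and linearly project each one,
    implemented (as is common) with a strided convolution."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, embed_dim)

if __name__ == "__main__":
    tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
    print(tokens.shape)   # torch.Size([2, 196, 768]), i.e. 14x14 patches of 16x16 pixels
```

A learnable [CLS] token and position embeddings are then added before the standard Transformer encoder, which is the part most of the papers in this collection vary.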

---TRANSFORMER Paper Collection 2 - Efficient Vision Transformer
        1-Training data-efficient image transformers & distillation through attention
        2-ConViT Improving Vision Transformers with Soft Convolutional Inductive Biases
        3-Scalable Vision Transformers with Hierarchical Pooling
        4-CrossViT Cross-Attention Multi-Scale Vision Transformer for Image Classification
        5-Multi-Scale Vision Longformer A New Vision Transformer for High-Resolution Image Encoding
        6-Visformer The Vision-friendly Transformer
        7-Multi-Exit Vision Transformer for Dynamic Inference
        8-Chasing Sparsity in Vision Transformers An End-to-End Exploration
        9-Dynamic Grained Encoder for Vision Transformers
        10-Glance-and-Gaze Vision Transformer
        11-DynamicViT  Efficient Vision Transformers with Dynamic Token Sparsification
        12-ResT  An Efficient Transformer for Visual Recognition
        13-SOFT Softmax-free Transformer with Linear Complexity
        14-Evo-ViT Slow-Fast Token Evolution for Dynamic Vision Transformer
        15-Pale Transformer A General Vision Transformer Backbone with Pale-Shaped Attention
        16-When Shift Operation Meets Vision Transformer An Extremely Simple Alternative to Attention Mechanism
        17-NOT ALL PATCHES ARE WHAT YOU NEED EXPEDITING VISION TRANSFORMERS VIA TOKEN REORGANIZATIONS
        18-QUADTREE ATTENTION FOR VISION TRANSFORMERS
        19-ANTI-OVERSMOOTHING IN DEEP VISION TRANSFORMERS VIA THE FOURIER DOMAIN ANALYSIS - FROM THEORY TO PRACTICE
        20-Learned Queries for Efficient Local Attention
        21-Lite Vision Transformer with Enhanced Self-Attention
        22-A-ViT  Adaptive Tokens for Efficient Vision Transformer
        23-Reversible Vision Transformers
        24-Adaptive Token Sampling For Efficient Vision Transformers
        25-EdgeViTs  Competing Light-weight CNNs on Mobile Devices with Vision Transformers
        26-Sliced Recursive Transformer
        27-Self-slimmed Vision Transformer
        28-M3ViT  Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
        29-ResT V2  Simpler, Faster and Stronger
        30-EfficientFormer  Vision Transformers at MobileNet Speed
        31-GhostNetV2 Enhance Cheap Operation with Long-Range Attention
        32-Peeling the Onion Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
        33-TOKEN MERGING  YOUR VIT BUT FASTER
        34-HiViT  Hierarchical Vision Transformer Meets Masked Image Modeling
        35-Making Vision Transformers Efficient from A Token Sparsification View
        36-SparseViT  Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
        37-Slide-Transformer  Hierarchical Vision Transformer with Local Self-Attention
        38-RIFormer  Keep Your Vision Backbone Effective But Removing Token Mixer
        39-EfficientViT  Memory Efficient Vision Transformer with Cascaded Group Attention
        40-Castling-ViT Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
        41-RGB no more  Minimally-decoded JPEG Vision Transformers
        42-Learned Thresholds Token Merging and Pruning for Vision Transformers
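A recurring idea in the efficiency papers above (e.g. DynamicViT and "Not All Patches Are What You Need") is to drop uninformative patch tokens partway through the network. The sketch below shows one simple variant, keeping the patches that receive the most attention from the [CLS] token; the function name, shapes, and the 0.5 keep ratio are illustrative and not any listed paper's exact recipe.

```python
import torch

def prune_tokens_by_cls_attention(tokens: torch.Tensor,
                                  cls_attn: torch.Tensor,
                                  keep_ratio: float = 0.5):
    """Keep only the patch tokens the [CLS] token attends to most.

    tokens:   (batch, 1 + num_patches, dim), with the [CLS] token first.
    cls_attn: (batch, num_patches), attention from [CLS] to each patch,
              e.g. averaged over heads."""
    b, n_plus_1, d = tokens.shape
    num_keep = max(1, int((n_plus_1 - 1) * keep_ratio))

    # Indices of the highest-scoring patches, per sample.
    topk = cls_attn.topk(num_keep, dim=1).indices                  # (b, num_keep)
    patch_tokens = tokens[:, 1:]                                   # drop [CLS]
    kept = torch.gather(patch_tokens, 1,
                        topk.unsqueeze(-1).expand(-1, -1, d))      # (b, num_keep, d)

    # Re-attach the [CLS] token in front of the surviving patches.
    return torch.cat([tokens[:, :1], kept], dim=1)

if __name__ == "__main__":
    tokens = torch.randn(2, 197, 384)
    cls_attn = torch.rand(2, 196)
    print(prune_tokens_by_cls_attention(tokens, cls_attn).shape)   # (2, 99, 384)
```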


