[Paper] Adaptive Mixtures of Local Experts (1991)
One of the foundational papers behind MoE (Mixtures of Experts), which has recently been used as an efficient way to train Transformer models; the linked post translates and summarizes only its conceptual parts.
velog.io
Mixture of Experts Explained
huggingface.co
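To make the 1991 idea above concrete: a gating network produces mixing weights, and the model output is the weighted combination of several small "local expert" networks, each encouraged to specialize on part of the input space. Below is a minimal PyTorch sketch of that dense (soft) mixture; the class name, layer sizes, and choice of PyTorch are illustrative assumptions, not taken from either linked post.

```python
import torch
import torch.nn as nn


class SoftMixtureOfExperts(nn.Module):
    """Dense ("soft") mixture in the spirit of Jacobs et al. (1991):
    every expert sees every input, and a gating network blends their outputs.
    All names and sizes here are illustrative assumptions."""

    def __init__(self, d_in: int, d_out: int, n_experts: int = 4, d_hidden: int = 32):
        super().__init__()
        # Each "local expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))
             for _ in range(n_experts)]
        )
        # The gating network emits one logit per expert.
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)                    # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)      # (batch, d_out, n_experts)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)              # (batch, d_out)


if __name__ == "__main__":
    moe = SoftMixtureOfExperts(d_in=8, d_out=2)
    print(moe(torch.randn(16, 8)).shape)  # torch.Size([16, 2])
```

Because every expert runs on every input here, compute grows with the number of experts; the later papers in the list below make the gate sparse so that only a few experts run per token.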
Adaptive Mixtures of Local Experts (1991)
Learning Factored Representations in a Deep Mixture of Experts (2013)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Jun 2020)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (Dec 2021)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Jan 2022)
ST-MoE: Designing Stable and Transferable Sparse Expert Models (Feb 2022)
FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-Trained Models (Apr 2022)
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (Nov 2022)
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models (May 2023)
Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1.
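The sparsely-gated line of work above (from Outrageously Large Neural Networks through Switch Transformers and the Mixtral releases) keeps only the top-k gate values per token, so most experts are skipped entirely. Here is a rough sketch of that top-k routing step, again in PyTorch with illustrative sizes and without the auxiliary load-balancing losses those papers add.

```python
import torch
import torch.nn as nn


class TopKMoELayer(nn.Module):
    """Sparse routing sketch: each token is dispatched to its top-k experts only,
    and their outputs are combined with the renormalized gate probabilities.
    The sizes and the absence of a load-balancing loss are simplifying assumptions."""

    def __init__(self, d_model: int = 16, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        probs = torch.softmax(self.gate(x), dim=-1)          # (tokens, n_experts)
        topk_p, topk_idx = probs.topk(self.k, dim=-1)        # (tokens, k)
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = (topk_idx == e)                            # (tokens, k): where expert e was selected
            token_ids = hit.any(dim=-1).nonzero(as_tuple=True)[0]
            if token_ids.numel() == 0:
                continue                                     # this expert received no tokens
            w = (topk_p * hit).sum(dim=-1)[token_ids]        # gate weight per routed token
            out[token_ids] += w.unsqueeze(-1) * expert(x[token_ids])
        return out


if __name__ == "__main__":
    layer = TopKMoELayer()
    print(layer(torch.randn(10, 16)).shape)  # torch.Size([10, 16])
```

The per-expert Python loop is only the readable version; systems work in the list above (e.g. FasterMoE, MegaBlocks) replaces it with grouped or block-sparse kernels and handles expert load imbalance explicitly.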