[Paper] Adaptive Mixtures of Local Experts (1991)
One of the foundational papers behind MoE (Mixtures of Experts), which has recently been used as an efficient way to train Transformer models; the linked post translates and summarizes only its conceptual parts.
velog.io
Mixture of Experts Explained
huggingface.co
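To make the 1991 idea above concrete: a gating network produces mixing weights, and the model output is the weighted combination of several small "local expert" networks, each encouraged to specialize on part of the input space. Below is a minimal PyTorch sketch of that dense (soft) mixture; the class name, layer sizes, and choice of PyTorch are illustrative assumptions, not taken from either linked post.

```python
import torch
import torch.nn as nn


class SoftMixtureOfExperts(nn.Module):
    """Dense ("soft") mixture in the spirit of Jacobs et al. (1991):
    every expert sees every input, and a gating network blends their outputs.
    All names and sizes here are illustrative assumptions."""

    def __init__(self, d_in: int, d_out: int, n_experts: int = 4, d_hidden: int = 32):
        super().__init__()
        # Each "local expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))
             for _ in range(n_experts)]
        )
        # The gating network emits one logit per expert.
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)                    # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)      # (batch, d_out, n_experts)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)              # (batch, d_out)


if __name__ == "__main__":
    moe = SoftMixtureOfExperts(d_in=8, d_out=2)
    print(moe(torch.randn(16, 8)).shape)  # torch.Size([16, 2])
```

Because every expert runs on every input here, compute grows with the number of experts; the later papers in the list below make the gate sparse so that only a few experts run per token.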
Adaptive Mixtures of Local Experts (1991)
Learning Factored Representations in a Deep Mixture of Experts (2013)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Jun 2020)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (Dec 2021)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Jan 2022)
ST-MoE: Designing Stable and Transferable Sparse Expert Models (Feb 2022)
FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-Trained Models (Apr 2022)
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (Nov 2022)
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models (May 2023)
Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1.
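The sparsely-gated line of work above (from Outrageously Large Neural Networks through Switch Transformers and the Mixtral releases) keeps only the top-k gate values per token, so most experts are skipped entirely. Here is a rough sketch of that top-k routing step, again in PyTorch with illustrative sizes and without the auxiliary load-balancing losses those papers add.

```python
import torch
import torch.nn as nn


class TopKMoELayer(nn.Module):
    """Sparse routing sketch: each token is dispatched to its top-k experts only,
    and their outputs are combined with the renormalized gate probabilities.
    The sizes and the absence of a load-balancing loss are simplifying assumptions."""

    def __init__(self, d_model: int = 16, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        probs = torch.softmax(self.gate(x), dim=-1)          # (tokens, n_experts)
        topk_p, topk_idx = probs.topk(self.k, dim=-1)        # (tokens, k)
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = (topk_idx == e)                            # (tokens, k): where expert e was selected
            token_ids = hit.any(dim=-1).nonzero(as_tuple=True)[0]
            if token_ids.numel() == 0:
                continue                                     # this expert received no tokens
            w = (topk_p * hit).sum(dim=-1)[token_ids]        # gate weight per routed token
            out[token_ids] += w.unsqueeze(-1) * expert(x[token_ids])
        return out


if __name__ == "__main__":
    layer = TopKMoELayer()
    print(layer(torch.randn(10, 16)).shape)  # torch.Size([10, 16])
```

The per-expert Python loop is only the readable version; systems work in the list above (e.g. FasterMoE, MegaBlocks) replaces it with grouped or block-sparse kernels and handles expert load imbalance explicitly.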