Mixture of Experts

[Paper] Adaptive Mixtures of Local Experts (1991)

One of the foundational papers behind MoE (Mixture of Experts), which has recently been used as a way to train transformer models efficiently; the linked post translates and summarizes the paper up to its conceptual parts.

velog.io
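
The core idea of the 1991 paper is a "soft" mixture: a gating network produces a probability distribution over experts, and the layer output is the gate-weighted sum of every expert's output, so the gate and the experts are trained jointly and experts specialize on different parts of the input. Below is a minimal PyTorch sketch of that dense formulation; the class and parameter names are my own illustration, not taken from the paper or the linked post.

```python
# Dense (soft) mixture of experts in the spirit of the 1991 formulation:
# every expert runs on every input and the gate softly weights their outputs.
# All names here (SoftMixtureOfExperts, n_experts, ...) are illustrative.
import torch
import torch.nn as nn


class SoftMixtureOfExperts(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_in, d_out) for _ in range(n_experts)])
        self.gate = nn.Linear(d_in, n_experts)  # gating network over the input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_probs = self.gate(x).softmax(dim=-1)                    # (batch, n_experts)
        expert_outs = torch.stack([e(x) for e in self.experts], 1)   # (batch, n_experts, d_out)
        # Output is the gate-weighted sum over all experts.
        return (gate_probs.unsqueeze(-1) * expert_outs).sum(dim=1)
```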


Mixture of Experts Explained

huggingface.co

 

 

 

Adaptive Mixtures of Local Experts (1991)
Learning Factored Representations in a Deep Mixture of Experts (2013)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Jun 2020)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (Dec 2021)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Jan 2022)
ST-MoE: Designing Stable and Transferable Sparse Expert Models (Feb 2022)
FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-Trained Models (Apr 2022)
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (Nov 2022)
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models (May 2023)
Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1.
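
The later papers in this list (Sparsely-Gated MoE, GShard, Switch Transformers, Mixtral) keep the same gate-plus-experts structure but make it sparse: each token is routed only to its top-k scoring experts (top-2 in GShard and Mixtral, top-1 in Switch Transformers), so compute per token stays roughly constant as the number of experts grows. The sketch below illustrates that routing pattern under my own naming (ToyMoE, n_experts, top_k); it is not any paper's reference implementation and it omits the auxiliary load-balancing loss these papers rely on.

```python
# A minimal sketch of a sparsely-gated (top-k routed) MoE layer.
# Illustrative only; real systems add capacity limits and a load-balancing loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward "expert" per slot; in a transformer this replaces the FFN block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        # The router: a single linear layer scoring every expert for every token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to individual tokens
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.gate(tokens)                        # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Which (token, slot) pairs picked expert e?
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


# Usage with illustrative shapes: a drop-in replacement for a transformer FFN block.
layer = ToyMoE(d_model=64, d_hidden=256, n_experts=8, top_k=2)
y = layer(torch.randn(2, 10, 64))   # -> (2, 10, 64)
```

In practice a layer like this replaces the dense feed-forward block of a transformer, and an auxiliary load-balancing term (for example the one described in Switch Transformers) is added to the loss so the router does not collapse onto a few experts.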

 
