
A Survey on Model Compression and Acceleration for Pretrained Language Models

 

https://arxiv.org/abs/2202.07105

 


 

 

Canwen Xu, Julian McAuley

 

Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long inference delay have kept Transformer-based pretrained language models (PLMs) from broader adoption, including edge and mobile computing. Efficient NLP research comprehensively considers computation, time, and carbon emissions across the entire NLP life cycle (data preparation, model training, and inference). This survey focuses on the inference stage and reviews the current state of model compression and acceleration for pretrained language models, covering benchmarks, metrics, and methodology.


Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long inference delay prevent Transformer-based pretrained language models (PLMs) from seeing broader adoption including for edge and mobile computing. Efficient NLP research aims to comprehensively consider computation, time and carbon emission for the entire life-cycle of NLP, including data preparation, model training and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics and methodology.


Comments: Accepted to AAAI 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2202.07105 [cs.CL]
  (or arXiv:2202.07105v2 [cs.CL] for this version)
 
https://doi.org/10.48550/arXiv.2202.07105
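
Below is a minimal, illustrative sketch (not taken from the paper) of one compression technique that surveys like this cover: post-training dynamic quantization of a Transformer encoder with PyTorch. The model name "bert-base-uncased" and the Hugging Face Transformers usage are example choices on my part, not something prescribed by the authors.

```python
# Illustrative sketch: post-training dynamic quantization of a PLM.
# Assumes torch and transformers are installed; the model choice is an example.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # example model, not specified by the survey
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Convert nn.Linear weights to int8; activations are quantized on the fly at
# inference time, which mainly shrinks the model and speeds up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Model compression reduces inference cost.", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits.shape)
```

Dynamic quantization only rewrites the linear layers' weights, so it needs no calibration data; it is one of the simpler options among the quantization, pruning, and distillation methods the survey reviews.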
