07-03 QLoRA (T. Dettmers et al., 2023)
https://wikidocs.net/232761
https://www.facebook.com/groups/TensorFlowKR/posts/2043855619288819/?paipv=0&eav=AfbdY6h01SJwQ1mXtjJ8BKHu77BLtnbU6feXjEltFkI77UJjyj39bhwNZdK0HJc8Pfw&_rdr
Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
https://huggingface.co/blog/4bit-transformers-bitsandbytes?utm_source=pytorchkr
Introduction to 8-bit Matrix Multiplication for transformers at scale
https://dhpark1212.tistory.com/entry/Introduction-to-8-bit-Matrix-Multiplication-for-transformers-at-scale
A trial-and-error log of bitsandbytes issues
https://blog.lablup.com/posts/2023/07/28/bitsandbytes/
Accelerating production-level LLM inference with Text Generation Inference (TGI)
https://medium.com/@nuatmochoi/text-generation-inference-tgi-%EB%A5%BC-%ED%99%9C%EC%9A%A9%ED%95%9C-%ED%94%84%EB%A1%9C%EB%8D%95%EC%85%98-%EB%A0%88%EB%B2%A8-llm-%EC%B6%94%EB%A1%A0-%EA%B0%80%EC%86%8D%ED%99%94-2b5f0641c232
LLM Trend Note (6) LLM.int8(), LoRA, Prefix LM, Sparse transformer, Sparse attention, Model parallelism, Data parallelism
https://questionet.tistory.com/78
P_7. A Survey of Large Language Models (2/3)
https://wikidocs.net/222913
ml-papers/papers/2023/230523 QLoRA.md
https://github.com/rosinality/ml-papers/blob/main/papers/2023/230523%20QLoRA.md
ml-papers/papers/2022/220815 LLM.int8().md
https://github.com/rosinality/ml-papers/blob/main/papers/2022/220815%20LLM.int8%28%29.md
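The common thread in the links above is QLoRA's 4-bit NormalFloat (NF4) quantization: weights are split into small blocks, each block is scaled by its absolute maximum, and every value is snapped to the nearest of 16 fixed quantile levels. As a reading aid, here is a minimal pure-Python sketch of that idea; the level table below is an approximation of the NF4 code in bitsandbytes, not the library's exact implementation.

```python
# Illustrative NF4-style 4-bit block quantization (sketch, not bitsandbytes itself).
# The 16 levels approximate the NormalFloat4 quantiles; exact values are in the
# bitsandbytes source.
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_block(block):
    """Quantize one weight block: store an absmax scale plus 4-bit indices."""
    absmax = max(abs(x) for x in block) or 1.0
    idxs = [min(range(16), key=lambda i: abs(x / absmax - NF4_LEVELS[i]))
            for x in block]
    return absmax, idxs

def dequantize_block(absmax, idxs):
    """Recover approximate weights from the scale and 4-bit indices."""
    return [NF4_LEVELS[i] * absmax for i in idxs]

weights = [0.3, -0.7, 0.05, 1.2, -0.01, 0.9, -1.1, 0.4]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)     # each index fits in 4 bits (0..15)
print(max_err)   # small reconstruction error relative to absmax
```

QLoRA then keeps the base model frozen in this 4-bit form and trains only small LoRA adapters in 16-bit, which is why it fits large models on a single GPU.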