[Paper Review] / Natural Language Processing (6)

[Paper Review] DoRA: Weight-Decomposed Low-Rank Adaptation

Paper: https://arxiv.org/abs/2402.09353 Authors: Shih-Yang Liu 1,2, Chien-Yi Wang 1, Hongxu Yin 1, Pavlo Molchanov 1, Yu-Chiang Frank Wang 1, Kwang-Ting Cheng 2, Min-Hung Chen 1 (1: NVIDIA, 2: HKUST) Citation: Liu, Shih-Yang, et al. "Dora: Weight-decomposed low-rank adaptation." arXiv preprint arXiv:2402.09353 (2024). GitHub: https://github.com/nbasyl/DoRA Reference 1: https://discuss.pytorch.kr/t/dora-lora-weight-decomposed..

[Paper Review] Seq2Seq: Sequence to Sequence Learning with Neural Networks

Paper: https://arxiv.org/abs/1409.3215 Authors: Ilya Sutskever - ilyasu@google.com, Oriol Vinyals - vinyals@google.com, Quoc V. Le - qvl@google.com Citation: Sutskever, I. "Sequence to Sequence Learning with Neural Networks." arXiv preprint arXiv:1409.3215 (2014).   0. Abstract: Deep Neural Networks (DNNs) have shown strong performance on many tasks when large labeled datasets are available, but this has not been the case for sentence translation. The authors, making minimal assumptions about language structure, ..

[Paper Review] LSTM: Long Short-Term Memory

Paper: https://ieeexplore.ieee.org/abstract/document/6795963 Authors: Sepp Hochreiter (Fakultät für Informatik, Technische Universität München, 80290 München, Germany), Jürgen Schmidhuber (IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland) Citation: S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," in Neural Computation, vol. 9, no. 8, pp. 1735-1780, 15 Nov. 1997, doi: 10.1162/neco.1997.9.8.1735. Reference: 1..

[Paper Review] Attention: Neural Machine Translation by Jointly Learning to Align and Translate

Paper: https://arxiv.org/abs/1409.0473 Authors: Dzmitry Bahdanau (Jacobs University Bremen, Germany), KyungHyun Cho and Yoshua Bengio* (Université de Montréal) *CIFAR Senior Fellow Citation: Bahdanau, Dzmitry. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014). Tutorial code: https://tutorials.pytorch.kr/intermediate/seq2seq_translation_tutorial.html Data..

[Paper Review] LoRA: Low-Rank Adaptation of Large Language Models

Paper: https://arxiv.org/abs/2106.09685 Authors: Edward Hu*, Yelong Shen*, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen (Microsoft Corporation) *Equal contribution Citation: Hu, Edward J., et al. "Lora: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021). GitHub: https://github.com/microsoft/LoRA 0. Abstract: The dominant paradigm in NLP is to pre-train a large model on general-domain data and then adapt it to a particular ta..