TRPO, PPO, 그리고 DPO를 통한 거대언어모델 강화학습 방법론 동향 연구
Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO
- 김태현
- 박수현
Abstract
As reinforcement learning (RL) is increasingly used to train large language models (LLMs), there is a growing need to identify the RL methods best suited to LLMs. Both fields continue to evolve, with new techniques in each driving advances in the other. This paper surveys current trends in RL algorithms aimed at improving LLM performance.
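The abstract does not spell out the algorithms named in the title. For reference only, the sketch below gives the commonly cited per-sample objectives of PPO (the clipped surrogate, which replaces TRPO's explicit KL trust-region constraint with a simpler clip on the probability ratio) and DPO (a logistic loss on preference pairs that needs no separate reward model). It follows the standard formulations from the original PPO and DPO papers, not necessarily this article's presentation; the function names and the defaults `eps=0.2` and `beta=0.1` are illustrative assumptions.

```python
import math


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio = pi_theta(a|s) / pi_theta_old(a|s). Clipping the ratio to
    [1 - eps, 1 + eps] keeps each update close to the old policy, a cheap
    stand-in for TRPO's KL trust-region constraint.
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss (to be minimized): -log sigmoid(beta * margin).

    Inputs are sequence log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") responses under the trained policy and under
    a frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), written to avoid overflow for large |z|.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Ratio 1.5 is clipped to 1.2, so the objective is 1.2 * 2.0.
    print(ppo_clip_objective(1.5, 2.0))
    # With identical policy and reference log-probs, the DPO loss is log(2).
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

With a positive advantage, PPO stops rewarding ratio increases beyond `1 + eps`; in DPO, raising the policy's log-probability of the chosen response relative to the reference lowers the loss below log(2).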
Keywords
LLMs; RLHF
- Publication date: 2025-05
- Type: Article
- Journal: 한국통신학회논문지 (The Journal of Korean Institute of Communications and Information Sciences)
- Volume: 50
- Issue: 5
- Pages: 790-792