Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO

Abstract

As reinforcement learning (RL) is used more widely to train large language models (LLMs), it has become necessary to identify the RL methodologies best suited to LLMs. The fields of LLMs and RL continue to evolve, with novel techniques in each contributing to the other's advancement. This paper surveys current trends in reinforcement learning algorithms aimed at enhancing the performance of large language models.

Keywords

LLMs, RLHF
Authors
김태현, 박수현
DOI
10.7840/kics.2025.50.5.790
Publication date
2025-05
Type
Article
Journal
한국통신학회논문지 (The Journal of Korean Institute of Communications and Information Sciences)
Volume
50
Issue
5
Pages
790 ~ 792