Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO

김태현; 박수현

doi:10.7840/kics.2025.50.5.790

Abstract

As reinforcement learning (RL) is used more widely to train large language models (LLMs), there is a growing need to identify the RL methods best suited to LLMs. The fields of LLMs and RL continue to evolve together, as new techniques in each contribute to the advancement of the other. This paper surveys current trends in RL algorithms aimed at improving the performance of large language models.

Keywords

LLMs; RLHF

Title (Korean): TRPO, PPO, 그리고 DPO를 통한 거대언어모델 강화학습 방법론 동향 연구

Title (English): Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO

Authors: 김태현; 박수현

DOI: 10.7840/kics.2025.50.5.790

Publication date: 2025-05

Type: Article

Journal: The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지)

Volume: 50

Issue: 5

Pages: 790-792

Repository: ScholarWorks@Sookmyung Women's University (숙명여자대학교)
