Abstract
This study quantitatively evaluates the multilingual writing abilities of four large language models (Claude 3.5, Gemini 2.5, GPT-4.1, and Mistral 3.1) by analyzing Korean, English, and Chinese texts generated under controlled conditions, using multiple lexical diversity indices: MATTR, HDD, VOCD, MTLD, and a revised TTR. Analysis of 7,200 generated texts yields three main findings: (1) lexical diversity differed significantly across models, with Gemini 2.5 and GPT-4.1 showing the highest variation and Mistral 3.1 consistently showing the lowest; (2) clear language-dependent patterns emerged, with English showing the greatest lexical diversity and Korean and Chinese showing more constrained variation owing to their structural and morphological properties; and (3) although temperature adjustments increased variation in specific models, their influence was modest compared with that of language type and model architecture. These results indicate that the multilingual writing performance of LLMs arises from the interaction of model-specific design and language-specific characteristics, which suggests that single-metric or single-language evaluations are insufficient for a comprehensive assessment. By providing a large-scale empirical comparison across three languages, this study contributes a refined analytical framework for evaluating the lexical behavior of LLMs and offers implications for the development of multilingual writing support systems and for future LLM evaluation methodologies.
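For readers unfamiliar with the indices named in the abstract, the following is a minimal sketch of two of them: the plain type-token ratio (TTR) and the moving-average TTR (MATTR). It is illustrative only and is not the study's implementation. The function names, the window size of 50, the fallback to plain TTR for short texts, and the assumption that texts arrive already tokenized are all choices made here; the abstract does not state how Korean or Chinese text was segmented or which window was used.

```python
from typing import List


def ttr(tokens: List[str]) -> float:
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens: List[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every sliding window of `window` tokens.

    Assumption: texts no longer than the window fall back to plain TTR.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) <= window:
        return ttr(tokens)
    # One TTR score per window position, sliding one token at a time.
    scores = [
        ttr(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Whitespace splitting is only a stand-in tokenizer for this English sample;
    # Korean and Chinese would need a morphological analyzer or word segmenter.
    sample = "the cat sat on the mat".split()
    print(ttr(sample))
    print(mattr(sample, window=3))
```

Plain TTR falls as a text grows longer, because common words repeat; averaging over fixed-size windows is one way to make scores comparable across texts of different lengths, which is a likely reason for reporting several indices side by side.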
- Title
- 초거대언어모델의 한·영·중 논술형 글쓰기 능력 평가 -어휘 다양성 비교 중심으로-
- Title (other language)
- Multilingual LLMs' Essay Writing Evaluation Across Korean, English, and Chinese: A Comparative Analysis of Lexical Diversity
- Author
- 비립
- Publication date
- 2025-12
- Type
- Y
- Journal
- 텍스트언어학
- Volume
- 59
- Pages
- 155-188