World Journal of Advanced Research and Reviews

eISSN: 2581-9615 | CODEN: WJARAI | Impact Factor 8.2

Korean Subword vocabulary optimization by removing compositional words in neural machine translation

Kim Ryonghyok 1, *, Kim Kwanghyok 1, An Songil 2, Ryang Cholho 1 and Choe Jinhyok 1

1 Department of Artificial Intelligence, Artificial Intelligence Technology Institute, Kim Il Sung University, Pyongyang, Democratic People’s Republic of Korea.

2 Department of Foreign Language, Kim Il Sung University, Pyongyang, Democratic People’s Republic of Korea.

Research Article

World Journal of Advanced Research and Reviews, 2026, 29(03), 1008-1015

Article DOI: 10.30574/wjarr.2026.29.3.0477

DOI url: https://doi.org/10.30574/wjarr.2026.29.3.0477

Received on 17 January 2026; revised on 25 February 2026; accepted on 27 February 2026

Subword segmentation is an effective solution to the vocabulary problem in neural machine translation (NMT), and Byte Pair Encoding (BPE) is widely recognized as an effective approach for machine translation across multiple languages. However, in morphologically rich languages such as Korean, BPE can segment words excessively, which damages word semantics and creates semantic confusion during training. This confusion ultimately degrades overall translation quality. This paper proposes a method to optimize the Korean subword vocabulary for NMT, based on the observation that a Korean subword vocabulary created with the BPE training algorithm contains many compositional subwords. Experiments show that the optimized vocabulary stabilizes translation performance by removing unnecessary compositional subwords while maintaining a balanced distribution.
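
As a rough illustration of the idea summarized above, the following Python sketch learns BPE merges on a toy Korean word list and then re-segments a word after one subword has been removed from the vocabulary. It is not the authors' implementation: the function names `learn_bpe` and `encode`, the toy corpus, and the hand-picked set of removed subwords are all assumptions made for this example. The abstract does not state how compositional subwords are identified, so that step is not modeled here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of BPE merges from a {word: count} mapping."""
    # Every word starts as a sequence of single characters (Korean syllables here).
    corpus = {tuple(word): count for word, count in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_corpus = {}
        for symbols, count in corpus.items():
            key = tuple(merge_pair(list(symbols), best))
            new_corpus[key] = new_corpus.get(key, 0) + count
        corpus = new_corpus
    return merges


def encode(word, merges, removed=frozenset()):
    """Segment `word` with the learned merges, never building a removed subword."""
    symbols = list(word)
    for a, b in merges:
        if a + b in removed:
            continue  # hypothetical pruning step: skip merges that create a removed subword
        symbols = merge_pair(symbols, (a, b))
    return symbols


# Toy corpus (assumed for illustration): a noun followed by different particles.
toy_corpus = {"학교에": 5, "학교는": 3, "친구에": 2}
merges = learn_bpe(toy_corpus, num_merges=3)

full = encode("학교에", merges)                      # whole word kept as one subword
pruned = encode("학교에", merges, removed={"학교에"})  # falls back to noun + particle
print(full, pruned)
```

In this toy setting the subword "학교에" is simply the concatenation of two other learned units, "학교" and "에". Removing it from the vocabulary makes the encoder fall back to those two shorter units, which is the kind of effect the abstract attributes to removing compositional subwords.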

Korean Translation; NMT; Subword Vocabulary; BPE Learning Algorithm; Vocabulary Optimization

https://wjarr.com/sites/default/files/fulltext_pdf/WJARR-2026-0477.pdf

Kim Ryonghyok, Kim Kwanghyok, An Songil, Ryang Cholho and Choe Jinhyok. Korean Subword vocabulary optimization by removing compositional words in neural machine translation. World Journal of Advanced Research and Reviews, 2026, 29(03), 1008-1015. Article DOI: https://doi.org/10.30574/wjarr.2026.29.3.0477.

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.


All statements, opinions, and data contained in this publication are solely those of the individual author(s) and contributor(s). The journal, editors, reviewers, and publisher disclaim any responsibility or liability for the content, including accuracy, completeness, or any consequences arising from its use.
