KSW: Khmer Stop Word based Dictionary for Keyword Extraction

This paper introduces KSW, a Khmer-specific approach to keyword extraction that leverages a specialized stop word dictionary. Because natural language processing resources for the Khmer language are limited, effective keyword extraction has been a significant challenge. KSW addresses this by developing a tailored stop word dictionary and implementing a preprocessing methodology that removes stop words, thereby improving the extraction of meaningful keywords. Our experiments demonstrate that KSW achieves substantial improvements in accuracy and relevance compared to previous methods, highlighting its potential to advance Khmer text processing and information retrieval. The KSW resources, including the stop word dictionary, are available in the GitHub repository linked from the paper's arXiv page.
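The preprocessing idea in the abstract, removing stop words before picking keywords, can be illustrated with a minimal Python sketch. This is not the KSW implementation: the stop words below are illustrative placeholders rather than entries from the KSW dictionary, the function names are hypothetical, the input is assumed to be already segmented into word tokens (Khmer is written without spaces between words, so a separate segmenter would be needed), and the ranking is a plain frequency count swapped in for whatever scoring the paper uses.

```python
from collections import Counter

# Illustrative placeholder stop words; NOT taken from the KSW dictionary.
STOP_WORDS = {"និង", "នៃ", "នេះ"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop word set, in original order."""
    return [t for t in tokens if t not in stop_words]


def top_keywords(tokens, k=3, stop_words=STOP_WORDS):
    """Rank the remaining tokens by raw frequency (a simple stand-in scorer)."""
    counts = Counter(remove_stop_words(tokens, stop_words))
    return [word for word, _ in counts.most_common(k)]
```

In practice the stop word set would be loaded from the published dictionary file, and `tokens` would come from a Khmer word segmenter; both are outside the scope of this sketch.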

https://arxiv.org/abs/2405.17390

https://arxiv.org/pdf/2405.17390.pdf