한국해양대학교

Detailed Information

Korean Eojeol Classification Using Instance-based Learning

Title
사례기반 학습을 이용한 한국어 어절 분류
Alternative Title
Korean Eojeol Classification Using Instance-based Learning
Author(s)
朴浩珍
Issued Date
2002
Publisher
韓國海洋大學校
URI
http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002173883
http://repository.kmou.ac.kr/handle/2014.oak/9270
Abstract
Internet users generally rely on search engines to find the information they need. Such search engines require fast processing and, for Korean in particular, morphological analysis. A notorious problem in Korean morphological analysis is over-generation, which is caused by the lack of morphotactics. This paper describes eojeol classification as a way to lighten the burden of over-generation. In other words, we want to reduce the search space for morphological analysis using eojeol categories. To this end, we propose a method for eojeol classification using an instance-based learning technique.
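As a rough illustration of what instance-based classification of eojeols could look like, the following minimal sketch stores labeled instances and assigns a new eojeol the label of its nearest stored neighbours. The abstract does not specify the distance measure, the value of k, the feature set, or the category inventory, so the overlap distance, the `classify` function, the two toy features, and the category names below are all assumptions made for illustration only.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(instance, memory, k=1):
    """Return the majority label among the k stored instances nearest to `instance`.

    `memory` is a list of (features, label) pairs kept verbatim,
    which is the defining trait of instance-based (memory-based) learning.
    """
    nearest = sorted(memory, key=lambda item: overlap_distance(instance, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (last syllable, eojeol length) -> made-up category name.
memory = [
    (("는", "2"), "NOUN+PARTICLE"),
    (("를", "3"), "NOUN+PARTICLE"),
    (("다", "3"), "VERB+ENDING"),
    (("다", "2"), "VERB+ENDING"),
]

print(classify(("다", "4"), memory))
```

In such a scheme, the predicted category would then be used to restrict which morphological analyses are attempted for the eojeol.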


To evaluate the proposed system, we use two Korean part-of-speech tagged corpora (KAIST and ETRI). Because the test corpora are not large enough, we use cross-validation for training and evaluation. With 22 features, the average accuracies on the two corpora are 97% and 96.6%, respectively, but the average accuracy drops to 95.5% when the two corpora are combined. We believe this drop results from inconsistent tagging methods, despite the larger amount of training data. To select optimal features for our system, we employ backward sequential selection. As a result, we choose 16 features as the optimal set, and the performance of our system improves by about 0.2%. Furthermore, the reduction rate is 35% on average when our system is applied to Korean morphological analysis.
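The combination of cross-validation and backward sequential selection mentioned above can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis's implementation: the 1-nearest-neighbour classifier, the overlap distance, the fold assignment, the tie-handling rule (a feature is dropped whenever accuracy does not decrease), and the synthetic three-feature data set are all hypothetical.

```python
def overlap(a, b, feats):
    """Count mismatches between two instances, looking only at the indices in `feats`."""
    return sum(1 for i in feats if a[i] != b[i])


def predict(x, train, feats):
    """1-nearest-neighbour prediction using the overlap distance (an assumed classifier)."""
    return min(train, key=lambda item: overlap(x, item[0], feats))[1]


def cv_accuracy(data, feats, folds):
    """k-fold cross-validation accuracy; instance i is assigned to fold i % folds."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [d for i, d in enumerate(data) if i % folds != f]
        correct += sum(predict(x, train, feats) == y for x, y in test)
    return correct / len(data)


def backward_selection(data, n_features, folds):
    """Start from all features and repeatedly drop the one whose removal hurts least.

    Stops as soon as every possible removal would lower the cross-validated accuracy.
    """
    feats = list(range(n_features))
    best = cv_accuracy(data, feats, folds)
    while len(feats) > 1:
        candidates = [
            (cv_accuracy(data, [g for g in feats if g != f], folds), f) for f in feats
        ]
        score, drop = max(candidates, key=lambda c: c[0])
        if score < best:
            break
        best = score
        feats.remove(drop)
    return feats, best


# Synthetic data: the label equals feature 0; features 1 and 2 carry no information.
data = [((a, b, c), a) for a in "xy" for b in "pq" for c in "rs"]

selected, accuracy = backward_selection(data, n_features=3, folds=4)
print(selected, accuracy)
```

On this synthetic data the procedure discards the two uninformative features. On real corpora the selected subset and the size of the gain would depend on the features, the classifier, and the fold split.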
Appears in Collections:
Department of Computer Engineering > Thesis
Files in This Item:
000002173883.pdf
