- 사례기반 학습을 이용한 한국어 어절 분류
- Alternative Title
- Korean Eojeol Classification Using Instance-based Learning
- Publication Year
- Internet users generally rely on search engines to find the information they need. Such search engines require fast processing and, for Korean in particular, morphological analysis. A notorious problem in Korean morphological analysis is over-generation, which is caused by the lack of morphotactics. This paper describes eojeol classification as a way to lighten the burden of over-generation. In other words, we aim to reduce the search space for morphological analysis by using eojeol categories. To this end, we propose a method for eojeol classification based on an instance-based learning technique.
To evaluate the proposed system, we use two Korean part-of-speech-tagged corpora (KAIST and ETRI). Because the corpora are not large enough, we use cross-validation for training and evaluation. With 22 features, the average accuracies on the two corpora are 97% and 96.6%, respectively, but the average accuracy drops to 95.5% when the two corpora are combined. We believe this drop results from inconsistent tagging across the two corpora, despite the larger amount of training data. To select optimal features for our system, we employ backward sequential selection. As a result, we choose 16 features as the optimal set, which improves the performance of our system by about 0.2%. Furthermore, the reduction rate is 35% on average when our system is applied to Korean morphological analysis.
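The evaluation loop described in the abstract (instance-based classification, cross-validation, and backward sequential feature selection) can be illustrated with a minimal sketch. This is not the thesis's implementation: the 1-nearest-neighbour classifier, the overlap distance, the fold assignment, the stopping rule, and the feature names `suffix` and `noise` are all assumptions made for illustration, and the toy data is invented.

```python
from collections import Counter

def knn_predict(train, query, features, k=1):
    """Majority vote among the k nearest stored instances.
    Distance (assumed): number of selected features whose values differ."""
    def dist(x):
        return sum(x[f] != query[f] for f in features)
    nearest = sorted(train, key=lambda inst: dist(inst[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, features, folds=5, k=1):
    """Overall accuracy under cross-validation; fold i holds out items i, i+folds, ..."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [d for j, d in enumerate(data) if j % folds != i]
        correct += sum(knn_predict(train, x, features, k) == y for x, y in test)
    return correct / len(data)

def backward_selection(data, features, folds=5, k=1):
    """Greedy backward sequential selection: drop the feature whose removal
    gives the best cross-validated accuracy, as long as accuracy does not fall."""
    current = list(features)
    best = cv_accuracy(data, current, folds, k)
    while len(current) > 1:
        candidates = [
            (cv_accuracy(data, [f for f in current if f != drop], folds, k), drop)
            for drop in current
        ]
        score, drop = max(candidates)
        if score < best:
            break
        best = score
        current.remove(drop)
    return current, best

# Invented toy data: the label depends only on the hypothetical "suffix" feature;
# "noise" is irrelevant to the label.
data = [
    ({"suffix": "a" if i % 2 == 0 else "b", "noise": i % 3},
     "noun" if i % 2 == 0 else "verb")
    for i in range(20)
]
selected, acc = backward_selection(data, ["suffix", "noise"])
print(selected, acc)
```

On this toy data the irrelevant feature is removed and only `suffix` remains, which mirrors, at a much smaller scale, the reduction from 22 to 16 features reported above.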
Appears in Collections:
- Department of Computer Engineering > Thesis