Korean Eojeol Classification Using Instance-based Learning (사례기반 학습을 이용한 한국어 어절 분류)
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 朴浩珍 | - |
dc.date.accessioned | 2017-02-22T06:17:00Z | - |
dc.date.available | 2017-02-22T06:17:00Z | - |
dc.date.issued | 2002 | - |
dc.date.submitted | 56797-10-27 | - |
dc.identifier.uri | http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002173883 | ko_KR |
dc.identifier.uri | http://repository.kmou.ac.kr/handle/2014.oak/9270 | - |
dc.description.abstract | Internet users generally rely on search engines to find the information they need. Such search engines require fast processing and, for Korean in particular, morphological analysis. A notorious problem in Korean morphological analysis is over-generation, which is caused by the lack of morphotactics. This paper describes eojeol classification as a way to lighten the burden of over-generation. In other words, we aim to reduce the search space of morphological analysis by using eojeol categories. We propose a method for eojeol classification based on an instance-based learning technique. To evaluate the proposed system, we use two Korean part-of-speech-tagged test corpora (KAIST and ETRI). Because the test corpora are small, we use cross-validation for training and evaluation. With 22 features, the average accuracies on the two corpora are 97% and 96.6%, respectively, but the average accuracy drops to 95.5% when the two corpora are combined. We believe that this drop results from inconsistent tagging methods, despite the larger amount of training data. To select optimal features for our system, we employ backward sequential selection. As a result, we choose 16 features as the optimal set, which improves the performance of our system by about 0.2%. Furthermore, the reduction rate is 35% on average when our system is applied to Korean morphological analysis. | - |
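dc.description.tableofcontents | Table of Contents: Chapter 1 Introduction; Chapter 2 Related Work; 2.1 Classification; 2.2 Instance-based Learning; 2.3 Decision Trees; 2.4 Transformation-based Learning; Chapter 3 Korean Eojeol Categories; 3.1 Korean Eojeol; 3.2 Korean Eojeol Categories; 3.2.1 Nominal Eojeol; 3.2.2 Predicate Eojeol; 3.2.3 Modifier Eojeol; 3.2.4 Interjections; 3.2.5 Symbols; Chapter 4 Korean Eojeol Classification System; 4.1 System Architecture; 4.2 Training Phase; 4.2.1 Preprocessor; 4.2.2 Eojeol Category Tagger; 4.2.3 Feature Extractor; 4.2.4 Instance-based Learning; 4.3 Execution Phase; 4.3.1 Feature Extractor and Eojeol Classifier; 4.3.2 Postprocessor; Chapter 5 Experiments and Evaluation; 5.1 Experimental Corpora; 5.2 Evaluation Method; 5.3 Eojeol Classifier Performance; 5.4 Feature Optimization; 5.5 Optimal Performance; 5.6 Error Analysis; 5.7 Morphological Analysis Reduction Rate; Chapter 6 Conclusions and Future Work; References | - |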
dc.publisher | 韓國海洋大學校 | - |
dc.title | 사례기반 학습을 이용한 한국어 어절 분류 | - |
dc.title.alternative | Korean Eojeol Classification Using Instance-based Learning | - |
dc.type | Thesis | - |
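
The abstract names two techniques: instance-based learning for eojeol classification, and backward sequential selection for choosing a feature subset. The following is a minimal sketch of how the two can fit together, not the thesis's implementation. This record does not give the actual 22 features, the distance metric, or the number of neighbours, so everything here is an assumption made for illustration: the overlap (mismatch-count) distance, `k=1`, leave-one-out evaluation in place of the cross-validation the abstract mentions, and the toy `DATA` set with its hypothetical features (an eojeol-final syllable and an uninformative noise feature) and labels (`N` for nominal, `V` for predicate).

```python
from collections import Counter

# Toy examples (hypothetical): (feature tuple, eojeol category).
# Feature 0 = final syllable of the eojeol, feature 1 = uninformative noise.
DATA = [
    (("는", "x"), "N"),
    (("는", "y"), "N"),
    (("다", "x"), "V"),
    (("다", "y"), "V"),
]

def overlap_distance(a, b, active):
    """Count mismatching symbolic feature values over the active feature indices."""
    return sum(1 for i in active if a[i] != b[i])

def classify(instance, memory, active, k=1):
    """Return the majority label among the k stored examples nearest to instance."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0], active))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(data, active, k=1):
    """Leave-one-out accuracy: classify each example against all the others."""
    correct = 0
    for i, (features, label) in enumerate(data):
        memory = data[:i] + data[i + 1:]
        if classify(features, memory, active, k) == label:
            correct += 1
    return correct / len(data)

def backward_selection(data, n_features, k=1):
    """Start from all features; greedily drop one at a time while accuracy does not fall."""
    active = list(range(n_features))
    best = loo_accuracy(data, active, k)
    while len(active) > 1:
        # Score every subset obtained by removing exactly one active feature.
        candidates = [
            (loo_accuracy(data, [f for f in active if f != drop], k), drop)
            for drop in active
        ]
        score, drop = max(candidates)
        if score < best:
            break  # every single removal hurts accuracy, so stop
        best = score
        active.remove(drop)
    return active, best

print(backward_selection(DATA, n_features=2))  # → ([0], 1.0)
```

On this toy set, keeping both features gives a leave-one-out accuracy of 0.5, and dropping the noise feature raises it to 1.0. This shows in miniature how removing features can improve a memory-based classifier. The thesis itself reports going from 22 to 16 features for a gain of about 0.2%.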