Feature Extraction in Document Classification Using Ontology (온톨로지를 적용한 문서 분류에서의 자질추출)
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 조희영 | - |
dc.date.accessioned | 2017-02-22T06:44:40Z | - |
dc.date.available | 2017-02-22T06:44:40Z | - |
dc.date.issued | 2008 | - |
dc.date.submitted | 56877-07-05 | - |
dc.identifier.uri | http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002175515 | ko_KR |
dc.identifier.uri | http://repository.kmou.ac.kr/handle/2014.oak/9753 | - |
dc.description.abstract | With the rapid development of the Internet and information service techniques, a huge number of electronic documents are steadily produced on the Web. Documents such as newspaper articles are classified daily, without delay, by trained persons, but this is very labor-intensive work that requires a lot of time and cost. Several studies on automatic document classification have been performed in order to lessen this burden. Studies using machine learning and natural language processing techniques have shown successful results in the Web mining field. The performance of document classification systems depends heavily on feature sets, although many other factors can also affect it. In this thesis, we propose methods for extracting good feature sets using an ontology. Terms in documents are transformed into terms in the ontology in order to reduce the size of feature sets and to compress the information in the documents, at the expense of some loss of meaning. This transformation can be performed after or before general feature selection. We use only the synonym and hypernym relations in the Korean ontology U-WIN, which has been developed by Ulsan University. We have experimented with the proposed methods on four classifiers and nine feature selectors in order to evaluate their performance objectively. The experiments have shown that the proposed methods using the ontology outperform existing feature selectors with most classifiers, the exception being a naïve Bayesian classifier, and that applying the ontology after feature selection outperforms applying it before feature selection with every classifier. We have observed that the performance of feature selectors is very sensitive to the classifier, especially the Rocchio classifier. In the future, we will experiment with large-scale document collections from various fields and in many languages, such as English and Japanese, to show more objective results. The ambiguity arising from multiple hypernyms of a term will be tackled as a word sense disambiguation problem. | - |
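dc.description.tableofcontents | Table of Contents; List of Tables; List of Figures; Abstract; Chapter 1; Chapter 2 Related: 2.1 Document Representation, 2.2 Feature Selection, 2.3 Document Classification, 2.4 Ontology; Chapter 3 Feature Extraction Using Ontology: 3.1 Preprocessing and Feature Generation, 3.2 Applying the Ontology, 3.3 Feature Selection, 3.4 Vector Representation, 3.5 Document Classification; Chapter 4 Experiments and Evaluation: 4.1 Experimental Environment, 4.2 Evaluation Method, 4.3 Performance Evaluation and Analysis; Chapter 5 Conclusion; References | - |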
dc.language | kor | - |
dc.publisher | 한국해양대학교 대학원 | - |
dc.title | Feature Extraction in Document Classification Using Ontology (온톨로지를 적용한 문서 분류에서의 자질추출) | - |
dc.type | Thesis | - |
dc.date.awarded | 2008-02 | - |
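
The term transformation described in the abstract (replacing document terms with ontology terms through synonym and hypernym relations so that the feature set shrinks) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `SYNONYMS` and `HYPERNYMS` dictionaries are not taken from U-WIN, the helper names `to_ontology_term` and `extract_features` are hypothetical, and each term is given exactly one hypernym, which sidesteps the multiple-hypernym ambiguity the abstract leaves for future work. It is not the thesis's implementation.

```python
from collections import Counter

# Hypothetical toy ontology; these entries are illustrative, NOT from U-WIN.
SYNONYMS = {"automobile": "car", "auto": "car"}  # term -> canonical synonym
HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "bus": "vehicle"}  # term -> single hypernym


def to_ontology_term(term, synonyms=SYNONYMS, hypernyms=HYPERNYMS, levels=1):
    """Map a term to its canonical synonym, then follow up to `levels` hypernym links."""
    term = synonyms.get(term, term)
    for _ in range(levels):
        if term not in hypernyms:
            break  # no more general concept is known for this term
        term = hypernyms[term]
    return term


def extract_features(tokens, levels=1):
    """Build a bag-of-words count over ontology terms instead of surface terms."""
    return Counter(to_ontology_term(t, levels=levels) for t in tokens)


tokens = ["automobile", "truck", "bus", "auto", "road"]
raw_features = Counter(tokens)            # one feature per distinct surface term
onto_features = extract_features(tokens)  # features after the ontology mapping

print(len(raw_features), "raw features ->", len(onto_features), "ontology features")
print(onto_features)
```

In this toy input the five distinct surface terms collapse into two ontology terms, which shows both the reduction in feature-set size and the loss of meaning the abstract mentions. Running the same mapping on the terms that survive a feature selector, rather than on all tokens, would correspond to the "after feature selection" variant.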