Korea Maritime and Ocean University (KMOU)

Feature Extraction in Document Classification Using Ontology

Author(s): 조희영

Issued Date: 2008

Publisher: Graduate School, Korea Maritime and Ocean University

URI: http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002175515
http://repository.kmou.ac.kr/handle/2014.oak/9753

Abstract: With the rapid development of the Internet and information service technologies, a huge number of electronic documents is steadily being produced on the Web. Documents such as newspaper articles are classified by trained staff every day without delay, but this work is very labor-intensive and requires a great deal of time and cost.

Several studies on automatic document classification have been carried out to lessen this burden. Studies using machine learning and natural language processing techniques have shown successful results in the Web mining field. Although many factors affect the performance of a document classification system, that performance depends heavily on the feature set.

In this thesis, we propose methods for extracting good feature sets using an ontology.

Terms in documents are transformed into ontology terms to reduce the size of the feature set and to compress the information in the documents, at the expense of some loss of meaning. This transformation can be performed either after or before general feature selection. We use only the synonym and hypernym relations of U-WIN, a Korean ontology developed by Ulsan University. To evaluate the proposed methods objectively, we tested them with four classifiers and nine feature selectors.
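The before/after transformation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The ontology is a hypothetical hand-written dictionary standing in for U-WIN, whose actual data format and access interface are not described in this record. Each term maps to exactly one concept, which sidesteps the multiple-hypernym ambiguity mentioned below. Document frequency stands in for the nine feature selectors, which the abstract does not name. All function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology standing in for U-WIN (not its real data or API).
# Synonyms such as car/automobile collapse to one concept, which is then
# represented by a hypernym ("vehicle", "sport"). One concept per term.
TOY_ONTOLOGY = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "soccer": "sport",
    "football": "sport",
}


def map_to_concepts(tokens, ontology):
    """Replace each term by its ontology concept; unknown terms are kept as-is."""
    return [ontology.get(t, t) for t in tokens]


def select_by_df(docs, k):
    """Toy feature selector: keep the k terms with the highest document
    frequency, breaking ties alphabetically so the result is deterministic."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return set(ranked[:k])


def ontology_before_selection(docs, ontology, k):
    """Map terms to concepts first, then select features over the concepts."""
    mapped = [map_to_concepts(d, ontology) for d in docs]
    vocab = select_by_df(mapped, k)
    return [[t for t in d if t in vocab] for d in mapped], vocab


def ontology_after_selection(docs, ontology, k):
    """Select features over the raw terms first, then map survivors to concepts."""
    vocab = select_by_df(docs, k)
    kept = [[t for t in d if t in vocab] for d in docs]
    mapped = [map_to_concepts(d, ontology) for d in kept]
    return mapped, {ontology.get(t, t) for t in vocab}


if __name__ == "__main__":
    docs = [
        ["car", "engine"],
        ["automobile", "price"],
        ["truck", "engine"],
        ["soccer", "goal"],
        ["football", "goal"],
    ]
    raw_vocab = {t for d in docs for t in d}
    concept_vocab = {t for d in docs for t in map_to_concepts(d, TOY_ONTOLOGY)}
    print(len(raw_vocab), len(concept_vocab))  # 8 5
    _, before = ontology_before_selection(docs, TOY_ONTOLOGY, k=2)
    _, after = ontology_after_selection(docs, TOY_ONTOLOGY, k=2)
    print(sorted(before))  # ['engine', 'vehicle']
    print(sorted(after))   # ['engine', 'goal']
```

In this toy run, mapping to concepts shrinks the vocabulary from eight terms to five. Applying the ontology first merges car, automobile, and truck into vehicle, which lets that concept outrank every individual term during selection. The sketch shows only the mechanics and says nothing about classification accuracy. On accuracy, the thesis reports that applying the ontology after feature selection performed better with every classifier.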

Several experiments have shown two results. First, the proposed ontology-based methods outperform existing feature selectors with most classifiers, the exception being a naïve Bayesian classifier. Second, applying the ontology after feature selection outperforms applying it before feature selection with every classifier. We have also observed that the performance of feature selectors is very sensitive to the choice of classifier, especially with the Rocchio classifier.

In future work, we will experiment with a large collection of documents from various fields and in other languages such as English and Japanese to obtain more objective results. The ambiguity that arises when a term has multiple hypernyms will be tackled as a word sense disambiguation problem.

Appears in Collections: Department of Computer Engineering > Thesis

Files in This Item: 000002175515.pdf

KMOU Repository was built under the National Library of Korea's OAK Repository dissemination program.