Korea Maritime and Ocean University (한국해양대학교)

Detailed Information


Feature Extraction in Document Classification Using Ontology (온톨롤지를 적용한 문서 분류에서의 자질추출)

Title
Feature Extraction in Document Classification Using Ontology (온톨롤지를 적용한 문서 분류에서의 자질추출)
Author(s)
조희영
Issued Date
2008
Publisher
Graduate School, Korea Maritime and Ocean University (한국해양대학교 대학원)
URI
http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002175515
http://repository.kmou.ac.kr/handle/2014.oak/9753
Abstract
With the rapid development of the Internet and information service technologies, a huge number of electronic documents are steadily being produced on the Web. Documents such as newspaper articles are classified day to day, without delay, by trained people, but this is very labor-intensive work that requires a great deal of time and cost.

Several studies on automatic document classification have been carried out to lessen this burden. Studies using machine learning and natural language processing techniques have shown successful results in the Web mining field. Although many other factors can affect it, the performance of a document classification system depends heavily on its feature sets.

In this thesis, we propose methods for extracting good feature sets using ontology.

Terms in documents are transformed into terms in the ontology in order to reduce the size of the feature sets and to compress the information in the documents, at the expense of some loss of meaning. This transformation can be performed either after or before general feature selection. We use only the synonym and hypernym relations in U-WIN, a Korean ontology developed by Ulsan University. To evaluate the proposed methods objectively, we have experimented with them on four classifiers and nine feature selectors.
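The term-to-ontology transformation described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy English `SYNONYMS` and `HYPERNYMS` tables, the function names, and the `levels` parameter are all hypothetical, and the real U-WIN data and its format are not shown in this abstract. The sketch also assumes that each term has at most one hypernym.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption for illustration only).
# Each term maps to one canonical synonym and to at most one hypernym.
SYNONYMS = {"automobile": "car", "auto": "car", "lorry": "truck"}
HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "vehicle": "artifact"}


def to_concept(term, synonyms, hypernyms, levels=1):
    """Map a term to its canonical synonym, then climb up to `levels` hypernym links."""
    concept = synonyms.get(term, term)
    for _ in range(levels):
        if concept not in hypernyms:
            break  # no more general concept is known; stop climbing
        concept = hypernyms[concept]
    return concept


def concept_features(tokens, synonyms=SYNONYMS, hypernyms=HYPERNYMS, levels=1):
    """Count features after replacing every token with its ontology concept."""
    return Counter(to_concept(t, synonyms, hypernyms, levels) for t in tokens)


tokens = ["car", "automobile", "lorry", "truck", "road"]
features = concept_features(tokens)
print(len(set(tokens)), "distinct terms ->", len(features), "distinct concepts")
```

In this toy input, five distinct terms collapse into two concepts, which shows both the reduction in feature-set size and the loss of meaning (a car and a truck become indistinguishable). The same mapping could be applied to raw tokens before a feature selector runs, or to the terms that the selector has already kept.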

The experiments show that the proposed ontology-based methods outperform existing feature selectors on most classifiers, the exception being a naïve Bayesian classifier, and that applying the ontology after feature selection outperforms applying it before feature selection on every classifier. We have observed that the performance of feature selectors is very sensitive to the classifier used, especially the Rocchio classifier. In the future, we will experiment with a large collection of documents from various fields and in other languages, such as English and Japanese, to obtain more objective results. The ambiguity that arises when a term has multiple hypernyms will be tackled as a word sense disambiguation problem.
Appears in Collections:
Department of Computer Engineering (컴퓨터공학과) > Thesis
Files in This Item:
000002175515.pdf Download

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
