Korea Maritime and Ocean University (한국해양대학교)

Detailed Information


An Indexing Method Based on the Mixed n-Gram for Korean Information Retrieval (한글 정보 검색을 위한 혼합 n-그램 기반의 색인 방법)

DC Field Value Language
dc.contributor.author 정창용 -
dc.date.accessioned 2017-02-22T07:16:55Z -
dc.date.available 2017-02-22T07:16:55Z -
dc.date.issued 2004 -
dc.date.submitted 56823-11-10 -
dc.identifier.uri http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002176193 ko_KR
dc.identifier.uri http://repository.kmou.ac.kr/handle/2014.oak/10568 -
dc.description.abstract In Korean information retrieval systems, several indexing methods have been proposed, such as morpheme-based, word-phrase-based, and n-gram-based methods. Among them, n-gram-based indexing with n = 2 or 3 is widely used. The method is very simple, yet it outperforms the others in precision and recall, the basic measures for evaluating information retrieval systems. However, it generates too many index terms, many of which are meaningless, so the index files become very large. To alleviate this problem, this thesis proposes a new indexing method, called the mixed n-gram indexing method, which chooses between 2-grams and 3-grams according to a probabilistic criterion in order to remove meaningless terms. The t-score is used as the criterion for choosing between 2-grams and 3-grams. This thesis also describes a new stemming method, based on a greedy algorithm, that speeds up Korean indexing systems. In the experiments, KT-SET and KEMONG-SET are used as Korean reference test collections, together with the storage and retrieval components of the Lemur information retrieval toolkit 2.2. The experiments show that the proposed method is not inferior to the other methods in recall and precision, while producing fewer index terms. -
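dc.description.tableofcontents Chapter 1 Introduction; Chapter 2 Information Retrieval Systems and Korean Document Indexing Methods (2.1 Information Retrieval Systems; 2.2 Korean Document Indexing Methods); Chapter 3 A Mixed n-Gram Indexing Method for Korean Documents (3.1 Common-Root Extraction Method; 3.2 Indexing Method Using Mixed n-Grams); Chapter 4 Experiments and Evaluation (4.1 Experimental Environment; 4.2 Evaluation Method; 4.3 Performance Evaluation; 4.4 Discussion); Chapter 5 Conclusion; References -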
dc.language kor -
dc.publisher Graduate School, Korea Maritime and Ocean University (한국해양대학교 대학원) -
dc.title 한글 정보 검색을 위한 혼합 n-그램 기반의 색인 방법 -
dc.title.alternative An Indexing Method Based on the Mixed n-Gram for Korean Information Retrieval -
dc.type Thesis -
dc.date.awarded 2004-08 -
dc.contributor.alternativeName Chang-yong Jung -
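The abstract describes the mixed n-gram idea only at a high level: a word is broken into syllable n-grams, and a t-score decides whether 3-grams or 2-grams become index terms. The Python sketch below illustrates one possible reading under stated assumptions. A trigram "abc" is kept when the collocation t-score, (observed − expected) / √observed, between its leading bigram "ab" and its final syllable "c" reaches a threshold; otherwise its two covering bigrams are indexed. The selection rule, the threshold of 1.0, the use of the total syllable count as N, and the function names (`ngrams`, `build_counts`, `t_score`, `mixed_ngram_terms`) are hypothetical and are not taken from the thesis. The greedy stemming step is not modeled.

```python
from collections import Counter
import math


def ngrams(word: str, n: int) -> list[str]:
    """Return the overlapping character (syllable) n-grams of a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_counts(corpus: list[str]) -> tuple[Counter, int]:
    """Count syllable 1-, 2- and 3-grams over every word in the corpus.

    Returns the combined counter and the total number of syllables,
    which is used here (an assumption) as the sample size N.
    """
    counts: Counter = Counter()
    total = 0
    for word in corpus:
        total += len(word)
        for n in (1, 2, 3):
            counts.update(ngrams(word, n))
    return counts, total


def t_score(f_xy: int, f_x: int, f_y: int, total: int) -> float:
    """Collocation t-score: (observed - expected) / sqrt(observed)."""
    if f_xy == 0:
        return 0.0
    expected = f_x * f_y / total  # co-occurrence expected under independence
    return (f_xy - expected) / math.sqrt(f_xy)


def mixed_ngram_terms(word: str, counts: Counter, total: int,
                      threshold: float = 1.0) -> list[str]:
    """Hypothetical mixed 2/3-gram selection rule (illustration only).

    A trigram "abc" is kept when the t-score between its leading bigram "ab"
    and its final syllable "c" reaches `threshold`; otherwise the two
    bigrams it covers ("ab", "bc") are indexed instead.
    """
    if len(word) < 3:
        # Words of one or two syllables are indexed as they are.
        return [word]
    terms: list[str] = []
    for tri in ngrams(word, 3):
        score = t_score(counts[tri], counts[tri[:2]], counts[tri[2]], total)
        if score >= threshold:
            terms.append(tri)
        else:
            terms.extend([tri[:2], tri[1:]])
    # Drop duplicate terms while keeping first-seen order.
    return list(dict.fromkeys(terms))


if __name__ == "__main__":
    corpus = ["정보검색", "정보검색", "검색어", "정보"]  # toy corpus, not thesis data
    counts, total = build_counts(corpus)
    print(mixed_ngram_terms("정보검색", counts, total))
```

With this toy corpus, 보검색 clears the threshold (t ≈ 1.09) and 정보검 does not (t ≈ 0.92), so the script prints `['정보', '보검', '보검색']`. Raising the threshold pushes the output toward plain bigram indexing, and lowering it pushes the output toward plain trigram indexing. This mirrors the trade-off between index size and term quality that the abstract describes, though not necessarily the thesis's exact mechanism.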
Appears in Collections:
Department of Computer Engineering (컴퓨터공학과) > Thesis
Files in This Item:
000002176193.pdf

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
