Korea Maritime and Ocean University

KMOU Repository, Graduate School of Korea Maritime and Ocean University, Department of Computer Engineering, Thesis

An Indexing Method Based on the Mixed n-Gram for Korean Information Retrieval

Original Title: 한글 정보 검색을 위한 혼합 n-그램 기반의 색인 방법

URI: http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002176193
http://repository.kmou.ac.kr/handle/2014.oak/10568

Abstract: Several indexing methods have been proposed for Korean information retrieval systems, including morpheme-based, word-phrase-based, and n-gram-based methods. Of these, n-gram-based indexing with n = 2 or 3 is widely used. The method is very simple, yet it outperforms the others in precision and recall, the basic measures for evaluating information retrieval systems. Its drawback is that it generates too many index terms, many of them meaningless, so the index files become very large. To relieve this problem, this paper proposes a new indexing method, called mixed n-gram indexing, which chooses between 2-grams and 3-grams according to a probabilistic criterion in order to remove the meaningless terms. The t-score serves as the criterion for this choice. The paper also describes a new stemming method that uses a greedy algorithm to speed up Korean indexing systems.
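The abstract does not spell out how the t-score decides between a 2-gram and a 3-gram, so the following Python sketch is only one plausible reading, not the thesis's actual procedure. It uses the standard collocation t-score, (observed - expected) / sqrt(observed), and assumes a hypothetical rule: a 3-gram is emitted when its leading 2-gram and the following character are associated above a threshold, and the plain 2-gram is emitted otherwise. The function names, the toy corpus, and the threshold value are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def t_score(pair_count, left_count, right_count, total):
    """Standard collocation t-score: (observed - expected) / sqrt(observed)."""
    if pair_count == 0:
        return 0.0
    expected = left_count * right_count / total
    return (pair_count - expected) / math.sqrt(pair_count)


def mixed_ngrams(word, bigrams, trigrams, chars, total, threshold):
    """Hypothetical selection rule (an assumption, not taken from the thesis):
    emit the 3-gram when its leading 2-gram and the next character are
    associated at or above the threshold, otherwise emit the 2-gram."""
    terms = []
    for i in range(len(word) - 1):
        bi = word[i:i + 2]
        tri = word[i:i + 3]
        if len(tri) == 3:
            score = t_score(trigrams[tri], bigrams[bi], chars[tri[2]], total)
            if score >= threshold:
                terms.append(tri)
                continue
        terms.append(bi)
    return terms


# Toy corpus, purely illustrative; real statistics would come from a collection.
corpus = ["정보검색", "정보검색", "정보처리", "검색엔진"]
chars = Counter(c for w in corpus for c in w)
bigrams = Counter(g for w in corpus for g in char_ngrams(w, 2))
trigrams = Counter(g for w in corpus for g in char_ngrams(w, 3))
total = sum(chars.values())

index_terms = mixed_ngrams("정보검색", bigrams, trigrams, chars, total, threshold=1.0)
print(index_terms)
```

Raising the threshold pushes the output toward plain 2-gram indexing, and lowering it pushes toward 3-grams, which is the kind of trade-off a probabilistic criterion would have to balance against index size.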

For the experiments, KT-SET and KEMONG-SET are used as the Korean reference test collections, together with the storage and retrieval components of the Lemur information retrieval toolkit 2.2. The experiments show that the proposed method is not inferior to the others in recall and precision, and is superior to them in the number of index terms it generates.
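For readers unfamiliar with the two measures the evaluation compares, the sketch below computes precision and recall for a single query from a retrieved list and a set of relevance judgments. The document identifiers are made up for illustration and are not drawn from KT-SET or KEMONG-SET; the function name is likewise an assumption.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document IDs, not taken from any real test collection.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Removing index terms can only be called safe if both numbers stay close to those of the full 2-gram or 3-gram index, which is the sense in which the abstract reports the method as "not inferior".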
