한국해양대학교



Title: 형태소 생성을 통한 세종 형태분석 말뭉치의 오류 검출 및 수정 도구 개발

Alternative Title: Developing a Tool for Detecting and Correcting Errors in Sejong POS Tagged Corpus

Author(s): 최명길

Issued Date: 2011

Publisher: 한국해양대학교

URI: http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002176395
http://repository.kmou.ac.kr/handle/2014.oak/10796

Abstract: The Sejong Corpora are widely used for Korean language processing and contain a POS (part-of-speech) tagged corpus, a word-sense tagged corpus, a dependency-tree tagged corpus, and a Korean-English parallel corpus. However, they contain many kinds of errors even though they were built by well-trained annotators. In this thesis, we are particularly interested in the errors in the Sejong POS tagged corpus. These errors degrade the performance of systems trained on the corpus and should be minimized. It is not easy, however, to detect and correct the errors, because their proportion is large and their kinds are very diverse. Furthermore, detecting and correcting the errors is laborious, time-consuming, and costly.

In this thesis, we propose an error correction tool for efficiently detecting and correcting errors in the Sejong POS tagged corpus. We detect errors automatically using methods for morphological generation and automatic word spacing. The former is used for insertion, deletion, and spelling errors, and the latter for word spacing errors. We then correct the errors semi-automatically through a graphical user interface (GUI) implemented in Java. The GUI provides four major functions: spelling error correction, morpheme deletion correction, morpheme insertion correction, and morphological re-analysis. It is designed to reduce laborious tasks and repetitive behavior patterns.
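The idea of detecting errors through morphological generation can be illustrated with a minimal sketch: regenerate a surface form from the tagged morphemes and flag any word whose regenerated form differs from the original. This is not the thesis implementation (which is written in Java and is not shown in this record). The tab-separated line format, the function names, and the single generation rule (attaching a bare final consonant such as ㄴ to the preceding open syllable) are all assumptions made for illustration. A real generator would need many more rules, for example for contractions.

```python
# Hypothetical, simplified stand-in for morphological generation.
# Assumed input format: "<word>\t<form/TAG + form/TAG ...>".

HANGUL_BASE = 0xAC00        # first precomposed Hangul syllable (가)
HANGUL_COUNT = 11172        # number of precomposed Hangul syllables
# Compatibility jamo -> final-consonant index in the Unicode syllable block.
FINAL_INDEX = {"ㄴ": 4, "ㄹ": 8, "ㅁ": 16, "ㅂ": 17}


def parse_analysis(analysis):
    """Split 'form/TAG + form/TAG' into a list of (form, tag) pairs."""
    pairs = []
    for token in analysis.split(" + "):
        form, _, tag = token.rpartition("/")
        pairs.append((form, tag))
    return pairs


def generate_surface(morphemes):
    """Concatenate morpheme forms, merging a leading bare consonant
    into the preceding syllable when that syllable has no final."""
    surface = ""
    for form, _tag in morphemes:
        if form and form[0] in FINAL_INDEX and surface:
            offset = ord(surface[-1]) - HANGUL_BASE
            if 0 <= offset < HANGUL_COUNT and offset % 28 == 0:
                merged = chr(ord(surface[-1]) + FINAL_INDEX[form[0]])
                surface = surface[:-1] + merged
                form = form[1:]
        surface += form
    return surface


def find_suspects(lines):
    """Return (word, analysis, generated) for every mismatching line."""
    suspects = []
    for line in lines:
        word, analysis = line.rstrip("\n").split("\t")
        generated = generate_surface(parse_analysis(analysis))
        if generated != word:
            suspects.append((word, analysis, generated))
    return suspects


if __name__ == "__main__":
    sample = [
        "프랑스의\t프랑스/NNP + 의/JKG",
        "간\t가/VV + ㄴ/ETM",
        "학교에\t학교/NNG + 에서/JKB",  # made-up insertion error
    ]
    for suspect in find_suspects(sample):
        print(suspect)
```

In this sketch only the third sample line is reported, because its regenerated form has an extra syllable compared with the original word. Flagged lines are candidates for manual review, not confirmed errors, since a mismatch may also come from a generation rule the sketch does not cover.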

Applying the proposed tool to the Sejong POS tagged corpus, we observed at least a nine-fold reduction in the time required for error detection and correction. Our experiments also showed that error correction speed increased steadily. The proposed tool is therefore very promising for error detection and correction.

Appears in Collections: Department of Computer Engineering (컴퓨터공학과) > Thesis

Files in This Item: 000002176395.pdf
