Korea Maritime and Ocean University

Detailed Information

Development of a Tool for Detecting and Correcting Errors in the Sejong Morphologically Analyzed Corpus through Morpheme Generation

Title
Development of a Tool for Detecting and Correcting Errors in the Sejong Morphologically Analyzed Corpus through Morpheme Generation
Alternative Title
Developing a Tool for Detecting and Correcting Errors in Sejong POS Tagged Corpus
Author(s)
최명길
Issued Date
2011
Publisher
Korea Maritime and Ocean University
URI
http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002176395
http://repository.kmou.ac.kr/handle/2014.oak/10796
Abstract
The Sejong Corpora are widely used for Korean language processing and comprise a POS (part-of-speech) tagged corpus, a word-sense tagged corpus, a dependency-tree tagged corpus, and a Korean-English parallel corpus. Although the corpora were built by well-trained annotators, they contain many kinds of errors. This thesis focuses on the errors in the Sejong POS tagged corpus. These errors degrade the performance of systems trained on the corpus and should be minimized. Detecting and correcting them is not easy, however, because the proportion of errors is large and the kinds of errors are very diverse. Moreover, detecting and correcting the errors is laborious, time-consuming, and expensive.

In this thesis, we propose an error correction tool for efficiently detecting and correcting errors in the Sejong POS tagged corpus. Errors are detected automatically using morphological generation and automatic word spacing: the former targets insertion, deletion, and spelling errors, and the latter targets word spacing errors. The detected errors are then corrected semi-automatically through a graphical user interface (GUI) implemented in Java. The GUI provides four major functions: spelling error correction, morpheme deletion correction, morpheme insertion correction, and morphological re-analysis. It is designed to reduce laborious tasks and repetitive behavior patterns.
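
The abstract does not describe how morphological generation is carried out, so the following is only a minimal sketch of the general idea: regenerate a word from its tagged morphemes and flag the entry when the result differs from the word in the corpus. It assumes entries of the form `morph/TAG+morph/TAG`, replaces real morphological generation with plain concatenation, and guesses the error type from the length difference. The names `parse_analysis`, `generate`, and `check` are hypothetical, and the classification rule is an illustrative assumption, not the thesis's method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Finding:
    word: str       # word as it appears in the corpus
    generated: str  # word regenerated from the tagged morphemes
    kind: str       # "ok", "deletion?", "insertion?", or "spelling?"


def parse_analysis(analysis: str) -> List[Tuple[str, str]]:
    """Split 'morph/TAG+morph/TAG' into (morph, TAG) pairs.

    Assumption: '+' never occurs as a morpheme itself in the input.
    """
    pairs = []
    for item in analysis.split("+"):
        # Split on the last '/', so a '/' inside the morpheme is kept.
        morph, _, tag = item.rpartition("/")
        pairs.append((morph, tag))
    return pairs


def generate(pairs: List[Tuple[str, str]]) -> str:
    """Naive stand-in for morphological generation: concatenate morphemes.

    Real generation must also apply contraction rules
    (e.g. 가/VV+았/EP+다/EF -> 갔다), which this sketch ignores.
    """
    return "".join(morph for morph, _ in pairs)


def check(word: str, analysis: str) -> Finding:
    """Compare the corpus word with the regenerated form."""
    generated = generate(parse_analysis(analysis))
    if generated == word:
        kind = "ok"
    elif len(generated) < len(word):
        kind = "deletion?"   # the analysis seems to be missing material
    elif len(generated) > len(word):
        kind = "insertion?"  # the analysis seems to contain extra material
    else:
        kind = "spelling?"   # same length, different characters
    return Finding(word, generated, kind)
```

For example, `check("학교에서", "학교/NNG+에/JKB")` regenerates `학교에`, which is shorter than the corpus word, so the entry is flagged as a possible deletion. Because the sketch has no contraction rules, it would also flag correct analyses of contracted forms, and a real generator would have to handle those cases.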

Applying the proposed tool to the Sejong POS tagged corpus reduced the time needed for error detection and correction at least nine-fold. Experiments also showed that error correction speed increased steadily. These results suggest that the proposed tool is very promising for error detection and correction.
Appears in Collections:
Department of Computer Engineering > Thesis
Files in This Item:
000002176395.pdf

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.