Korea Maritime and Ocean University

KMOU Repository > Korea Maritime and Ocean University Graduate School > Department of Computer Engineering > Thesis


Web Structure Mining by Extracting Hyperlinks from Web Documents and Access Logs

Original Title: 웹 문서와 접근로그의 하이퍼링크 추출을 통한 웹 구조 마이닝

URI: http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002175608
http://repository.kmou.ac.kr/handle/2014.oak/9856

Abstract: Web structures are difficult to predict because they change rapidly as documents on the Web are frequently updated. Nevertheless, given the structures, information providers can discover users' behavior patterns and characteristics and supply better services, and users can find useful information easily and accurately. This paper proposes an improved method for extracting Web structures.

The method consists of two steps. The first constructs a directed graph, with Web documents as nodes and their hyperlinks as edges, using the depth-first search algorithm. The second supplements the directed graph by discovering hyperlinks that were not extracted in the first step, called hidden hyperlinks. They can be found by analyzing Web access logs, which contain click streams. However, the click streams do not include clicks on the 'Back' button, because the browser serves those pages from its local cache, and this makes it difficult to identify hidden hyperlinks correctly. To cope with this problem, this paper proposes an algorithm for searching for hidden hyperlinks. We have simulated the discovery of hidden hyperlinks to evaluate the proposed method experimentally.
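The two steps can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the thesis, whose details the abstract does not give. It assumes the site is available as an in-memory mapping from page paths to HTML, that a click stream is a list of page paths per session, and that an unlogged 'Back' click can be inferred whenever an earlier page in the session has a known link to the requested page. The names `build_graph`, `find_hidden_links`, `pages`, and `sessions`, and the sample paths, are hypothetical.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_graph(pages, start):
    """Step 1: depth-first traversal from `start`.

    Returns {page: set of pages it links to}, keeping only links
    whose target is a known page in `pages`.
    """
    graph = {}
    stack = [start]
    while stack:
        url = stack.pop()
        if url in graph or url not in pages:
            continue  # already visited, or not part of the site
        parser = LinkParser()
        parser.feed(pages[url])
        graph[url] = {link for link in parser.links if link in pages}
        stack.extend(graph[url])
    return graph


def find_hidden_links(graph, sessions):
    """Step 2: report (source, target) pairs not explained by `graph`.

    Assumption: if an earlier page in the session links to the requested
    page, the user went back to it via the browser cache (no log entry).
    """
    hidden = set()
    for clicks in sessions:
        path = []  # reconstructed navigation path for this session
        for page in clicks:
            if path and path[-1] == page:
                continue  # ignore reloads of the same page
            # Search from the most recent page backwards for a known link.
            for i in range(len(path) - 1, -1, -1):
                if page in graph.get(path[i], ()):
                    del path[i + 1:]  # inferred 'Back' clicks to path[i]
                    break
            else:
                if path:  # no known source: candidate hidden hyperlink
                    hidden.add((path[-1], page))
            path.append(page)
    return hidden


# Hypothetical site: "/search" is reachable only through a form on "/b",
# so the <a>-based traversal in step 1 never reaches it.
pages = {
    "/index": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/a1">A1</a>',
    "/a1": "",
    "/b": '<form action="/search"></form>',
    "/search": "",
}
sessions = [["/index", "/a", "/a1", "/b", "/search"]]

graph = build_graph(pages, "/index")
hidden = find_hidden_links(graph, sessions)
print(hidden)
```

In this sample session the transition from "/a1" to "/b" is not reported, because "/index" appears earlier in the session and links to "/b"; the sketch treats it as two unlogged 'Back' clicks. Only the transition from "/b" to "/search" remains as a candidate hidden hyperlink. A comparison of consecutive log entries without this backtracking would also flag "/a1" to "/b", which is the kind of incorrect result the abstract attributes to the cache problem.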

Through the simulations, we observed that the proposed method discovers most of the hidden hyperlinks that appear in the click streams.

In future work, we should develop tools for visualizing the discovered Web structures and study how to discover hidden hyperlinks more accurately by improving the proposed algorithm.


ywm85@kmou.ac.kr Tel: 051-410-4085

The KMOU Repository was built as part of the National Library of Korea's OAK Repository distribution project.