한국해양대학교

Detailed Information


웹 문서와 접근로그의 하이퍼링크 추출을 통한 웹 구조 마이닝

Title
웹 문서와 접근로그의 하이퍼링크 추출을 통한 웹 구조 마이닝
Alternative Title
Web Structure Mining by Extracting Hyperlinks from Web Documents and Access Logs
Author(s)
박철현
Issued Date
2007
Publisher
한국해양대학교 대학원
URI
http://kmou.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002175608
http://repository.kmou.ac.kr/handle/2014.oak/9856
Abstract
Web structures are difficult to predict because they change rapidly as documents on the Web are frequently updated. Nevertheless, given the structures, information providers can discover users' behavior patterns and characteristics and supply better services, and users can find useful information easily and accurately. This paper proposes an improved method for extracting Web structures.

The method consists of two steps. The first constructs a directed graph, with Web documents as nodes and their hyperlinks as edges, using the depth-first search algorithm. The second supplements the directed graph by discovering the hyperlinks that were not extracted in the first step, called hidden hyperlinks. These can be found by analyzing Web access logs, which contain click streams. The click streams do not include clicks on the 'Back' button, because Web browsers serve those pages from their local cache. As a result, hidden hyperlinks may be identified incorrectly. To cope with this problem, this paper proposes an algorithm for searching for hidden hyperlinks. We have simulated the discovery of hidden hyperlinks to evaluate the proposed method experimentally.
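The two steps can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details the abstract does not give. It assumes the site is available as an in-memory mapping from URL to HTML and that each session in the access log has already been reduced to an ordered list of requested URLs. The names `build_graph` and `find_hidden_links`, and the rule used to account for unlogged 'Back' clicks (walk back along the current path until a page that links to the requested one is found), are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def build_graph(pages, root):
    """Step 1 (sketch): depth-first traversal over static hyperlinks.

    `pages` is an assumed dict {url: html}; links that point outside
    this dict are ignored.
    """
    graph = {}
    stack = [root]
    while stack:
        url = stack.pop()
        if url in graph or url not in pages:
            continue
        graph[url] = {link for link in extract_links(pages[url]) if link in pages}
        stack.extend(sorted(graph[url]))
    return graph


def find_hidden_links(graph, sessions):
    """Step 2 (sketch): report transitions that no known hyperlink explains.

    Because 'Back' clicks are missing from the log, a request may follow
    from an earlier page of the session rather than from the previous one.
    Assumed heuristic: search the current path from newest to oldest for a
    page that links to the request; only if none exists is the transition
    recorded as a hidden hyperlink candidate.
    """
    hidden = set()
    for clicks in sessions:
        path = []  # pages on the current forward navigation path
        for page in clicks:
            if path:
                for i in range(len(path) - 1, -1, -1):
                    if page in graph.get(path[i], ()):
                        del path[i + 1:]  # treat as unlogged 'Back' clicks
                        break
                else:
                    hidden.add((path[-1], page))
            path.append(page)
    return hidden
```

In this sketch a page that is reachable only through a link the static parser cannot see (for example one generated by a script) never enters the graph in the first step, and surfaces in the second step as a candidate pair `(source, target)`.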

Through the simulations, we have observed that the proposed method discovers most of the hidden hyperlinks that appear in the click streams.

In future work, we should develop tools for visualizing the discovered Web structures and study how to discover hidden hyperlinks more accurately by improving the proposed algorithm.
Appears in Collections:
Department of Computer Engineering > Thesis
Files in This Item:
000002175608.pdf

