With the rapid development of Internet applications, e-mail has become one of the most popular methods for exchanging information because it is easy to use and inexpensive. E-mail has a serious problem, however: users receive many unwanted messages, known as SPAM mails, and their mailboxes can grow exponentially. Users must spend time picking out the SPAM mails, which causes considerable economic loss. To alleviate the problem, many researchers and companies have proposed filtering technologies.
Meanwhile, in e-mail client systems, users take different actions depending on how useful the information in a mail is, and some classification and recommendation systems such as GroupLens use these actions to improve performance. This paper presents a mail filtering system that uses user actions and incremental machine learning. E-mail data and user actions are collected through a user interface implemented in CGI/Perl. Our proposed system makes use of two models: one is an action inference model that infers a user action from an e-mail, and the other is a mail classification model that decides whether an e-mail is SPAM or not. Both models are derived using incremental learning, specifically the IB2 algorithm of TiMBL.
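The two-model arrangement can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: the inferred action is appended to the mail features as one extra feature for the classifier, mails are described by a few hypothetical symbolic features, and IB2 is written here as a plain 1-nearest-neighbour learner with an unweighted overlap distance that stores only the instances it misclassifies. TiMBL's own implementation (feature weighting, seed instances) is not reproduced, and all class, feature, and action names are illustrative.

```python
from typing import Hashable, List, Optional, Tuple

Features = Tuple[Hashable, ...]


class IB2:
    """Simplified IB2: 1-NN that keeps only instances it gets wrong."""

    def __init__(self) -> None:
        self.memory: List[Tuple[Features, str]] = []

    @staticmethod
    def _distance(a: Features, b: Features) -> int:
        # Overlap metric: number of positions whose symbolic values differ.
        return sum(x != y for x, y in zip(a, b))

    def predict(self, x: Features) -> Optional[str]:
        if not self.memory:
            return None  # nothing stored yet
        nearest = min(self.memory, key=lambda item: self._distance(item[0], x))
        return nearest[1]

    def learn(self, x: Features, label: str) -> None:
        # Incremental step: store the instance only if it is misclassified.
        if self.predict(x) != label:
            self.memory.append((x, label))


class ActionAwareFilter:
    """Hypothetical pipeline: infer an action, then classify with it."""

    def __init__(self) -> None:
        self.action_model = IB2()
        self.spam_model = IB2()

    def learn(self, features: Features, action: str, label: str) -> None:
        # During training the observed user action is available.
        self.action_model.learn(features, action)
        self.spam_model.learn(features + (action,), label)

    def classify(self, features: Features) -> Optional[str]:
        # For a new mail the action is unknown, so it is inferred first
        # (assumption: the inferred action is used as an extra feature).
        action = self.action_model.predict(features)
        return self.spam_model.predict(features + (action,))


# Hypothetical features: (sender status, contains link, subject has ad tag)
mail_filter = ActionAwareFilter()
mail_filter.learn(("unknown", "link", "ad"), "delete", "spam")
mail_filter.learn(("known", "nolink", "noad"), "reply", "ham")
mail_filter.learn(("unknown", "link", "noad"), "delete", "spam")
mail_filter.learn(("known", "link", "noad"), "read", "ham")

print(mail_filter.classify(("unknown", "nolink", "ad")))
```

Because only misclassified instances are stored, each model's memory grows more slowly than the number of mails seen, which is what makes this style of learner suitable for incremental updates as new mails and actions arrive.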
To evaluate our proposed system, we collected 10,000 mails from 12 users of Hanmail (www.hanmail.net), one of the most popular e-mail service providers in Korea. The accuracy ranges from 81% to 93%, depending on the person. Our proposed system outperforms a system that does not use any information about user actions. Consequently, we have shown that information about user actions is useful for e-mail filtering.