Abstract:
This paper studies a special case of semi-supervised text categorization: building a text classifier from only a set P of labeled positive documents from one class (the positive class) and a large set U of unlabeled documents drawn from both the positive class and other diverse classes (collectively, the negative class). This kind of semi-supervised text classification is called positive and unlabeled learning (PU-Learning). Although effective methods for PU-Learning exist, they do not perform well when very few labeled positive documents are available. In this paper, we propose a refined PU-Learning method based on the known technique that combines the Rocchio and K-means algorithms. Because the set P may be very small (<= 5%), we not only extract more reliable negative documents from U but also enlarge P by extracting the most reliable positive documents from U. Our experimental results show that the refined method can perform better when the set P is very small.
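The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only a Rocchio-style split of U and omits the K-means refinement entirely. The function names, the weights alpha=16 and beta=4 (a common Rocchio choice), the `top_k` parameter, and the rule used to pick reliable positives are all assumptions made for illustration; the record does not specify them. Documents are assumed to be already converted to equal-length numeric vectors (for example, TF-IDF weights).

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def rocchio_prototypes(P, U, alpha=16.0, beta=4.0):
    """Build positive and negative prototypes, treating all of U as negative.

    alpha and beta are assumed values, not taken from the paper.
    """
    mp = mean([normalize(d) for d in P])
    mu = mean([normalize(d) for d in U])
    c_pos = [alpha * p - beta * u for p, u in zip(mp, mu)]
    c_neg = [alpha * u - beta * p for p, u in zip(mp, mu)]
    return c_pos, c_neg

def split_unlabeled(P, U, top_k=1):
    """Return (reliable negative indices, reliable positive indices) into U."""
    c_pos, c_neg = rocchio_prototypes(P, U)
    # A document is a reliable negative if it is closer to the negative prototype.
    reliable_neg = [i for i, d in enumerate(U)
                    if cosine(d, c_neg) > cosine(d, c_pos)]
    # Hypothetical enlargement rule: among the remaining documents, take the
    # top_k most similar to the positive prototype as extra positives for P.
    neg_set = set(reliable_neg)
    candidates = [i for i in range(len(U)) if i not in neg_set]
    candidates.sort(key=lambda i: cosine(U[i], c_pos), reverse=True)
    return reliable_neg, candidates[:top_k]
```

Calling `split_unlabeled(P, U)` with a small P and a larger U yields two index lists into U: documents that could seed the negative class, and documents that could be added to P before a classifier is trained.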
Source:
2010 3rd International Conference on Biomedical Engineering and Informatics (BMEI 2010), Vols 1-7
ISSN: 1948-2914
Year: 2010
Page: 3075-3079
Language: English
Cited Count:
WoS CC Cited Count: 7
SCOPUS Cited Count: 8
ESI Highly Cited Papers on the List: 0