Abstract:
To help consumers make purchasing decisions more efficiently, this paper proposes a product attribute extraction method based on seed-constrained LDA (Latent Dirichlet Allocation), which classifies and extracts product attributes from reviews so that reviews can be displayed by product attribute. Specifically, the method uses the term frequency-inverse document frequency (TF-IDF) algorithm to automatically extract keywords as an attribute seed set. It then reorganizes the documents twice, so that each twice-reorganized document describes only one product attribute; this addresses the multi-attribute co-occurrence problem of long texts and the sparsity problem of short texts, and improves the document reorganization rate. Next, must-link and cannot-link seed constraints are applied to define probability expansion and contraction values, which influence topic allocation in the LDA model by constraining the Gibbs sampling process and make the training results more reasonable. Finally, the topics generated by the seed-constrained LDA are mapped to the prior attribute categories. Qualitative analysis (attribute categories, attribute words) and quantitative analysis (accuracy, entropy, purity) show that the proposed method achieves higher accuracy and purity and lower entropy than the comparison methods, indicating a better clustering effect. © 2022, Editorial Department, Journal of South China University of Technology. All rights reserved.
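The seed-extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tfidf_seed_words`, the whitespace tokenization, the unsmoothed IDF, and the choice to rank each word by its highest TF-IDF score across documents are all assumptions made for illustration. Reviews in other languages would need a proper word segmenter.

```python
import math
from collections import Counter


def tfidf_seed_words(docs, top_k=3):
    """Return top_k candidate seed words ranked by TF-IDF.

    Hypothetical helper: each word is scored by its highest TF-IDF
    value in any single document (an assumed aggregation rule).
    """
    # Assumption: lowercase whitespace tokenization is good enough here.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    best = {}
    for tokens in tokenized:
        if not tokens:
            continue
        counts = Counter(tokens)
        for word, count in counts.items():
            tf = count / len(tokens)           # relative term frequency
            idf = math.log(n_docs / df[word])  # unsmoothed IDF
            score = tf * idf
            # Words present in every document score 0 and are skipped.
            if score > best.get(word, 0.0):
                best[word] = score

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


reviews = [
    "the battery life is great battery",
    "the screen is bright",
    "the battery drains fast",
]
seeds = tfidf_seed_words(reviews, top_k=3)
```

Words that occur in every review (here "the") receive an IDF of zero and never enter the seed set, which is the property that makes TF-IDF a plausible filter for attribute keywords.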
Source:
Journal of South China University of Technology (Natural Science)
ISSN: 1000-565X
CN: 44-1251/T
Year: 2022
Issue: 6
Volume: 50
Page: 37-48 and 70
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0