Paper title: Cluster Analysis of Customer Reviews Extracted from Web Pages
Published in: Issue 3, (Vol. 4) / 2010
Publishing date: 2010-10-26
Pages: 56-62
Author(s): HIREMATH P.S., ALGUR Siddu P., SHIVASHANKAR S.
Abstract. As e-commerce gains popularity, the web has become an excellent source from which market researchers can gather customer reviews and opinions. The number of customer reviews that a product receives is growing very fast and can run into the hundreds or thousands. Customer reviews posted on websites vary greatly in quality, yet a potential customer has to read all of them, irrespective of their quality, to decide whether to purchase the product. In this paper, we attempt to assess a review based on its quality, to help the customer make a sound buying decision. The quality of a customer review is assessed as most significant, more significant, significant, or insignificant. A novel and effective web mining technique is proposed for assessing a customer review of a particular product based on feature clustering techniques, namely the k-means method and the fuzzy c-means method. This is performed in three steps: (1) identify review regions and extract reviews from them, (2) extract and cluster the features of the reviews by a clustering technique and then assign weights to the features belonging to each of the clusters (groups), and (3) assess the review by considering the feature weights and group belongingness. The k-means and fuzzy c-means clustering techniques are implemented and tested on customer reviews extracted from web pages, and the performance of these techniques is analyzed.
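Steps (2) and (3) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: reviews are taken as lists of already extracted feature words (step 1 is skipped), features are clustered with plain one-dimensional k-means on the number of reviews mentioning them, a feature's weight is the rank of its cluster, and the four quality labels are assigned from the quartiles of the review scores. The function names and the sample data are hypothetical, and fuzzy c-means is not shown.

```python
from collections import Counter
from statistics import quantiles


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalar values (k >= 2); returns labels and centroids."""
    # Deterministic start: spread the initial centroids across the sorted values.
    srt = sorted(values)
    centroids = [srt[(i * (len(srt) - 1)) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def feature_weights(reviews, k=3):
    """Assumed weighting: cluster features by mention count, weight = cluster rank."""
    freq = Counter(f for r in reviews for f in set(r))
    feats = sorted(freq)
    labels, centroids = kmeans_1d([freq[f] for f in feats], k)
    # The cluster with the highest centroid (most mentioned features) gets weight k.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: i + 1 for i, c in enumerate(order)}
    return {f: rank[lab] for f, lab in zip(feats, labels)}


def score_reviews(reviews, weights):
    """Score a review as the sum of the weights of the distinct features it mentions."""
    return [sum(weights[f] for f in set(r)) for r in reviews]


def quartile_labels(scores):
    """Assumed mapping of scores to the four quality labels via quartile cut points."""
    q1, q2, q3 = quantiles(scores, n=4)
    names = ["insignificant", "significant", "more significant", "most significant"]
    return [names[(s > q1) + (s > q2) + (s > q3)] for s in scores]


# Hypothetical toy data: each review is the list of product features it mentions.
reviews = [
    ["battery", "screen", "price", "camera"],
    ["battery", "screen"],
    ["battery"],
    ["strap"],
    ["battery", "price", "screen"],
]
weights = feature_weights(reviews)
scores = score_reviews(reviews, weights)
labels = quartile_labels(scores)
for review, score, label in zip(reviews, scores, labels):
    print(score, label, review)
```

In this sketch a review that covers many frequently discussed features scores high, while a review that mentions only a rarely discussed feature scores low. A fuzzy c-means variant would replace the hard cluster label with a membership degree per cluster, which is one way to read the "group belongingness" mentioned in step (3).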
Keywords: Customer Reviews, Quartile Measure, Summarization, Feature Extraction, Feature Weight, Web Mining, Clustering Technique
This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright JACSM 2007-2020