Paper title:

Arabic Text Categorization Using Improved k-Nearest neighbour Algorithm

Published in: Issue 3 (Vol. 8) / 2014
Publishing date: 2014-10-30
Pages: 9-12
Author(s): KHALED Wail Hamood, AL-SARRAYRIH Haytham Saleem, KNIPPING Lars
Abstract. The quantity of Arabic-language text published on the web calls for effective techniques for extracting and classifying the relevant information contained in large text corpora. In this paper we present an implementation of an enhanced k-NN Arabic text classifier. For comparison, we apply the traditional k-NN and Naive Bayes classifiers from the Weka toolkit. Our modified k-NN algorithm features an improved decision rule that skips the less similar classes and identifies the correct class among the k nearest neighbours, which increases accuracy. The study evaluates the improved decision rule using the standard measures of recall, precision, and F-measure as the basis of comparison. We conclude that the effectiveness of the proposed classifier is promising and that it outperforms the classical k-NN classifier.
Keywords: Improved K-NN, K-Nearest Neighbours, KNN, Text Classification.
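
The abstract does not spell out the improved decision rule, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It contrasts the classical majority vote with a hypothetical similarity-weighted vote that skips classes whose best neighbour is much less similar than the overall best neighbour, and it computes precision, recall, and F-measure for one class. The function names, the `min_ratio` parameter, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def k_nearest(query, training, k):
    """Return the k (similarity, label) pairs most similar to the query."""
    scored = [(cosine(query, doc), label) for doc, label in training]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def classic_vote(neighbours):
    """Classical k-NN: each neighbour casts one vote for its class."""
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def filtered_vote(neighbours, min_ratio=0.5):
    """Hypothetical rule: sum similarity per class, skipping weak classes.

    A class is skipped when its best neighbour is less than `min_ratio`
    times as similar as the overall best neighbour. `min_ratio` is an
    illustrative parameter, not a value taken from the paper.
    """
    best = neighbours[0][0]  # neighbours are sorted, most similar first
    totals = defaultdict(float)
    peak = defaultdict(float)
    for sim, label in neighbours:
        totals[label] += sim
        peak[label] = max(peak[label], sim)
    kept = {c: s for c, s in totals.items() if peak[c] >= min_ratio * best}
    return max(kept, key=kept.get)


def precision_recall_f1(gold, predicted, target):
    """Precision, recall, and F-measure (F1) for a single target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy corpus (made up): one close "sport" document, two distant "economy" ones.
training = [
    ({"goal": 2, "match": 1}, "sport"),
    ({"goal": 1, "market": 3}, "economy"),
    ({"match": 1, "market": 3}, "economy"),
]
query = {"goal": 2, "match": 1}
neighbours = k_nearest(query, training, k=3)
print(classic_vote(neighbours), filtered_vote(neighbours))
```

On this toy corpus the majority vote follows the two weakly similar "economy" neighbours, while the filtered vote keeps only the class of the single close neighbour. Real Arabic text would first need tokenization, stemming, and term weighting, none of which this sketch attempts.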

This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright JACSM 2007-2024