Enhancements to graph based methods for multi document summarization
|Published in:||Issue 2, (Vol. 3) / 2009|
|Author(s):||Hariharan Shanmugasundaram, Srinivasan Rengaramanujam|
|Abstract.||This paper focuses on extractive summarization using popular graph-based approaches. Graph-based methods can be broadly classified into two categories: non-PageRank type and PageRank type methods. Of the methods already proposed, the Centrality Degree method belongs to the former category, while the LexRank and Continuous LexRank methods belong to the latter. The paper suggests two enhancements that apply to both PageRank type and non-PageRank type methods. The first is recursive discounting of the selected sentences: once a sentence is selected, it is removed from further consideration, and the next sentence is selected based on the contributions of the remaining sentences only. The second is a method of incorporating position weight into these schemes. In all, 14 methods (six of non-PageRank type and eight of PageRank type) have been investigated. To clearly distinguish between the various schemes, we call the methods that add the discounting and position weight enhancements to the Lexical Rank schemes Sentence Rank (SR) methods. Intrinsic evaluation of all 14 graph-based methods was done using the conventional Precision metric and two metrics earlier proposed by us, Effectiveness1 (E1) and Effectiveness2 (E2). The experimental study shows that the proposed SR methods are superior to all the other methods.|
|Keywords:||Page Rank, Lexical Rank, Sentence Rank, Recommendation, Degree, Damping, Threshold, Effectiveness, Discounting|
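The recursive discounting idea from the abstract can be illustrated with a small sketch. This is only one possible reading, applied to a degree-style (non-PageRank) score: the function names, the toy similarity matrix, the threshold value of 0.1 and the lowest-index tie-breaking rule are all assumptions made for illustration and are not taken from the paper. Position weight is not modelled here because the abstract does not state its formula.

```python
from typing import List, Set, Tuple


def degree(sim: List[List[float]], i: int, pool: Set[int], threshold: float) -> int:
    """Count sentences in `pool` (other than i) whose similarity to i exceeds the threshold."""
    return sum(1 for j in pool if j != i and sim[i][j] > threshold)


def select_with_discounting(sim: List[List[float]], k: int, threshold: float = 0.1) -> List[int]:
    """Pick up to k sentence indices, recomputing degree over the unselected sentences only."""
    remaining = set(range(len(sim)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        # Iterate in index order so that ties go to the earliest sentence (an assumption).
        best = max(sorted(remaining), key=lambda i: degree(sim, i, remaining, threshold))
        selected.append(best)
        # Discounting step: a selected sentence no longer contributes to later scores.
        remaining.discard(best)
    return selected


def build_sim(n: int, edges: List[Tuple[int, int]], value: float = 0.5) -> List[List[float]]:
    """Hypothetical helper: symmetric toy similarity matrix with `value` on the listed pairs."""
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        sim[a][b] = sim[b][a] = value
    return sim


# Toy graph: sentences 0-3 form one cluster, sentences 4-5 form another.
toy = build_sim(6, [(0, 1), (0, 2), (0, 3), (1, 2), (4, 5)])
print(select_with_discounting(toy, 3))  # [0, 1, 4]
```

In this toy graph a one-shot ranking by degree would take sentences 0, 1 and 2, all from the same cluster. With discounting, sentence 2 loses its support once sentences 0 and 1 are selected, so the third pick comes from the second cluster instead.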
1. Erkan, G., Radev, D. (2004), “LexRank: Graph-based Lexical Centrality as Salience in Text Summarization”, Journal of Artificial Intelligence Research, Vol. 22, pp. 457-479.
2. Shanmugasundaram Hariharan and Rengaramanujam Srinivasan (2008a), “A Comparison of Similarity Measures for Text Documents”, Journal of Information & Knowledge Management, Vol. 7, No. 1, pp. 1–8.
3. Shanmugasundaram Hariharan and Rengaramanujam Srinivasan (2008b), “Investigations in Single document Summarization by Extraction Method”, In Proceedings of IEEE International Conference on Computing, Communication and Networking (ICCCN’08).
4. www.google.com/news, www.rediffnews.com, www.yahoonews.com, www.hindu.com, www.indianexpress.co.in, www.cnn.com.
5. M.F. Porter (1980), “An algorithm for suffix stripping”, Program, 14(3), pp. 130-137.
6. http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils
7. Shanmugasundaram Hariharan and Rengaramanujam Srinivasan, “Studies on Graph Based Approaches for Single and Multi Document Summarizations”, International Journal of Computer Theory and Engineering, Vol.1, Issue No. 5, December 2009.
8. Luhn, H. P. (1958) ‘The Automatic Creation of Literature Abstracts’, IBM Journal of Research and Development, 2(2): 159–165.
9. Edmundson, H. P. (1969) ‘New Methods in Automatic Extracting’, Journal of the ACM, Vol. 16, No. 2, pp. 264-285.
10. Bogdan Cranganu Cretu, Zhenmao Chen, Tetsuya Uchimoto and Kenzo Miya (2001/2002) ‘Automatic Summarizing based on sentence extraction: A statistical approach’, International Journal of Applied Electromagnetics and Mechanics, IOS Press, Vol. 13, pp. 19-23.
11. Dragomir R. Radev, Hongyan Jing, Malgorzata Stys and Daniel Tam (2004), “Centroid-based summarization of multiple documents”, Information Processing and Management, Issue 40, pp. 919-938.
12. Mihalcea, R., Tarau, P. (2005), “A language independent algorithm for single and multiple document summarization”, In: Proceedings of IJCNLP 2005.
This article is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License.