Paper title:

An Evaluation of Big Data Reduction Approaches

Published in: Issue 2, (Vol. 15) / 2021
Publishing date: 2021-11-14
Pages: 20-27
Author(s): KAREEM Shahab, HAMAKARIM Rebeen
Abstract. When data is massive, data reduction is an essential step that helps keep learning techniques computationally tractable. This is especially true for the massive datasets that have become common in recent years. The key issue facing both data preprocessors and learning techniques is that data is growing in both dimensionality and the number of instances. Big data analytics research is entering a new stage known as fast data, in which several gigabytes of data arrive in big data systems every second. Modern big data systems capture inherently complex data sources because of the volume, velocity, value, variety, variability, and veracity of the acquired data, which gives rise to the 6Vs of big data. A collection of reduced and correct data streams is more valuable than an aggregation of raw, noisy, unreliable, and repetitive data. Another viewpoint on big data reduction is that large datasets with millions of variables suffer from the curse of dimensionality, which demands unbounded computing resources to discover practical information trends. This review provides an overview of strategies for reducing large amounts of data, together with a taxonomic analysis of big data reduction, big data complexity, and big data collection.
Keywords: Reduction, Data Preprocessing, Analysis, Bigdata.

This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright JACSM 2007-2023