The Convergence of Multimedia and Artificial Intelligence: Innovations in Drones, Audio and Video Production

SFICHI Stefan; BALAN Ionut

doi:10.4316/JACSM.202401005

DOI:	https://doi.org/10.4316/JACSM.202401005
Published in:	Issue 1 (Vol. 18) / 2024
Publishing date:	2024-11-15
Pages:	37-43
Author(s):	SFICHI Stefan, BALAN Ionut
Abstract.	The rapid advancement of artificial intelligence (AI) has revolutionized the multimedia industry, enabling unprecedented capabilities in drones, audio, and video production. This paper explores the integration of AI-driven technologies within these domains, highlighting their impact on automation, content creation, and real-time processing. AI-powered drones are transforming aerial cinematography with autonomous flight and intelligent scene analysis, while AI algorithms are enhancing audio and video production through automated editing, noise reduction, and content personalization. The study provides an overview of current trends, challenges, and future prospects in leveraging AI to push the boundaries of multimedia production and distribution.
Keywords:	Artificial Intelligence (AI), Multimedia Technology, Drones, Audio Production, Video Production, Autonomous Systems, AI-driven Content Creation, Machine Learning in Multimedia, Real-time Processing, Intelligent Automation, Computer Vision, Deep Learning, Aerial Cinematography, Signal Processing, Content Personalization

This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.