EFFICIENT PARALLEL DATA ANALYSIS: INTEGRATING MAPREDUCE WITH HADOOP DISTRIBUTED FILE SYSTEM

Usmon Ramazonovich Shodiyev; Ziyodullo Abdurayim o‘g‘li Malikov

Authors

Usmon Ramazonovich Shodiyev, Samarkand State University named after Sharof Rashidov
Ziyodullo Abdurayim o‘g‘li Malikov, Samarkand State University named after Sharof Rashidov

Keywords:

Distributed computing, big data, parallel processing, MapReduce, Hadoop, algorithm, data analysis, efficiency, scalability.

Abstract

Efficient algorithms for processing data in parallel databases have become critical in the era of big data. This research aims to develop an efficient algorithm for data analysis in parallel databases. To analyze massive data sets rapidly in parallel, the proposed approach integrates the MapReduce programming model with the Hadoop Distributed File System (HDFS). The algorithm was tested on a real-world dataset, and the results indicated that it outperformed existing algorithms in execution speed and scalability.
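The abstract does not specify the algorithm itself, so the following is only a minimal sketch of the general MapReduce pattern it builds on, not the authors' method. It runs in a single Python process: the input is split into fixed-size blocks (loosely analogous to how HDFS stores a file as blocks), each block is mapped in parallel, and the intermediate pairs are shuffled and reduced. The word-count task, the function names, and the `block_size` and `workers` parameters are all assumptions chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks (a stand-in for HDFS blocks)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, block_size=2, workers=2):
    """Run map over the blocks in parallel, then shuffle and reduce."""
    blocks = split_into_blocks(lines, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, not data from the study.
    sample = ["big data big clusters", "data analysis", "parallel data"]
    print(word_count(sample))
```

In a real Hadoop deployment the blocks would live on different DataNodes and the map tasks would be scheduled close to the data; the thread pool here only imitates that parallelism on one machine.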
