Published April 30, 2023 | Version v1
Journal article
Open
EFFICIENT PARALLEL DATA ANALYSIS: INTEGRATING MAPREDUCE WITH HADOOP DISTRIBUTED FILE SYSTEM
- 1. Sharof Rashidov nomidagi Samarqand davlat universiteti
Description
The need for efficient data-processing algorithms in parallel databases has become critical in the era of big data. This research aims to develop an efficient algorithm for data analysis in parallel databases. To analyze massive data sets rapidly and in parallel, the proposed approach integrates the MapReduce programming model with the Hadoop Distributed File System (HDFS). The algorithm was tested on a real-world dataset, and the results indicate that it outperformed existing algorithms in execution speed and scalability.
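The description names the MapReduce programming model and block-based distributed storage but does not reproduce the algorithm itself. The following is a minimal, single-machine sketch of the general map, shuffle, and reduce pattern, not the authors' method. It assumes a word-count workload, simulates HDFS-style blocks by slicing a Python list, and uses a thread pool in place of cluster workers; all function names (`split_into_blocks`, `map_block`, `shuffle`, `reduce_group`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def split_into_blocks(records, block_size):
    """Cut the input into fixed-size chunks (an assumed stand-in for HDFS blocks)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(item):
    """Reduce phase: sum the values collected for one key."""
    key, values = item
    return key, sum(values)


def word_count(records, block_size=2, workers=2):
    """Run map, shuffle, and reduce over the input, processing blocks in parallel."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_block, blocks))
        grouped = shuffle(chain.from_iterable(mapped))
        reduced = list(pool.map(reduce_group, grouped.items()))
    return dict(reduced)


if __name__ == "__main__":
    lines = ["big data big clusters", "data on clusters", "big data"]
    print(word_count(lines))
```

In a real Hadoop deployment the framework, rather than user code, would handle block placement, the shuffle, and scheduling of map and reduce tasks close to the data; the sketch only shows how the three phases fit together.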
Files
Name | Size |
---|---|
840-842.pdf (md5:1cc894f1bf227af3110d1b6ec3c045b1) | 581.1 kB |
Additional details
Related works
- Is cited by
- Journal article: 10.5281/zenodo.7904887 (DOI)
References
- 1. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
- 2. White, T. (2015). Hadoop: The definitive guide (4th ed.). O'Reilly Media.
- 3. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., ... & Stoica, I. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (pp. 2-2).
- 4. Shvachko, K., Kuang, H., Radia, S., & Chansler, R. (2010). The Hadoop distributed file system. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST) (pp. 1-10). IEEE.
- 5. Borthakur, D. (2007). HDFS architecture guide. The Apache Hadoop project. Retrieved from https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
- 6. Jindal, N. (2016). Big data processing with Apache Spark. Packt Publishing Ltd.
- 7. Zaharia, M., et al. (2016). Apache Spark: A unified engine for big data processing. Communications of the ACM, 59(11), 56-65. ISSN 0001-0782.
- 8. Venkataraman, S., Boden, N., Muthukumaran, K., Stoica, I., & Zaharia, M. (2016). Scaling distributed machine learning with the parameter server. In Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI'16) (pp. 583-597).