

Optimizing Hadoop for MapReduce




Author: Khaled Tannir

Publisher: Packt Publishing

Publish Date: February 17, 2014

ISBN-10: 1783285656

Pages: 120

File Type: Epub, Mobi, Pdf

Language: English


Book Preface

MapReduce is an important parallel processing model for large-scale, data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used to support cluster computing jobs that require low response times.

Most MapReduce programs are written for data analysis, and they usually take a long time to finish. Many companies are embracing Hadoop for advanced analytics over large datasets that require guaranteed completion times. Efficiency, especially the I/O cost of MapReduce, still needs to be addressed for successful implementations. Experience shows that a misconfigured Hadoop cluster can significantly degrade the performance of MapReduce jobs.

In this book, we address the MapReduce optimization problem: how to identify shortcomings, and what to do to make full use of the Hadoop cluster's resources when processing input data. The book starts with an introduction to MapReduce that explains how it works internally and discusses the factors that can affect its performance. It then investigates Hadoop metrics and performance tools, and identifies resource weaknesses such as CPU contention, memory usage, massive I/O storage, and network traffic.

This book will teach you, step by step and based on real-world experience, how to eliminate your job bottlenecks and fully optimize your MapReduce jobs in a production environment. You will also learn how to calculate the right number of cluster nodes to process your data, how to define the right number of mapper and reducer tasks based on your hardware resources, and how to optimize mapper and reducer task performance using compression techniques and combiners.

Finally, you will learn best practices and recommendations for tuning your Hadoop cluster, and see what a MapReduce template class looks like.

What this book covers

Chapter 1, Understanding Hadoop MapReduce, explains how MapReduce works internally and the factors that affect MapReduce performance.

Chapter 2, An Overview of the Hadoop Parameters, introduces Hadoop configuration files and MapReduce performance-related parameters. It also explains Hadoop metrics and several performance monitoring tools that you can use to monitor Hadoop MapReduce activities.

Chapter 3, Detecting System Bottlenecks, explores the Hadoop MapReduce performance tuning cycle and explains how to create a performance baseline. You will then learn to identify resource bottlenecks and weaknesses based on Hadoop counters.

Chapter 4, Identifying Resource Weaknesses, explains how to check the Hadoop cluster's health and identify problems with CPU and memory usage, massive I/O storage, and network traffic. You will also learn how to scale correctly when configuring your Hadoop cluster.

Chapter 5, Enhancing Map and Reduce Tasks, shows you how to enhance map and reduce task execution. You will learn the impact of block size, how to reduce spilled records, how to determine map and reduce throughput, and how to tune MapReduce configuration parameters.

Chapter 6, Optimizing MapReduce Tasks, explains when you need to use combiners and compression techniques to optimize map and reduce tasks and introduces several techniques to optimize your application code.

Chapter 7, Best Practices and Recommendations, introduces miscellaneous hardware and software checklists, recommendations, and tuning properties so that you can use your Hadoop cluster optimally.
