MapR: Developing Hadoop Applications

Overview

This course teaches developers how to write Hadoop applications using MapReduce and YARN in Java. The course covers debugging, managing jobs, improving performance, working with custom data, managing workflows, and using other programming languages for MapReduce.

Duration

3 days

Who is the course for

Developers and programmers interested in developing applications on Hadoop (MapReduce). This is a programming course; attendees must have Java programming experience.

Prerequisites

Attendees should have:

  • Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE
  • Basic Hadoop knowledge (helpful but not required)
  • Experience connecting to a Hadoop cluster via SSH and a web browser
  • Familiarity with database concepts (helpful but not required)

What you will learn

Write a MapReduce Program

  • Summary of the programming problem
  • Design and implement the Mapper class, Reducer class and driver
  • Build and execute the code, then examine the output
  • Describe the data set for the programming problem
  • Hands-on Exercises

Use the MapReduce API

  • API overview
  • Data flow for Mapper input processing and Reducer output processing
  • Explore the Mapper, Reducer and Job class API
  • Hands-on Exercises

Managing, monitoring, and testing MapReduce jobs

  • Work with counters
  • Use the MapR Control System (MCS) to monitor jobs
  • Use the Hadoop CLI to manage jobs
  • Display job history and logs
  • Write unit tests for MapReduce programs
  • Hands-on Exercises

Characterizing and improving MapReduce job performance

  • Learn the components of MapReduce performance
  • Enhance performance in your MapReduce jobs
  • Overview of MapR performance enhancements
  • Hands-on Exercises

Working with different data sources in MapReduce

  • Work with sequence files
  • Work with the distributed cache
  • Work with HBase
  • Hands-on Exercises

Managing multiple MapReduce jobs

  • Different approaches to launching multiple MapReduce jobs
  • Implement programmatic job control in the driver
  • Use MapReduce chaining
  • Use Oozie to manage MapReduce workflows
  • Hands-on Exercises

Using MapReduce streaming

  • Overview of the MapReduce streaming paradigm
  • Configure MapReduce streaming parameters
  • Define the programming contract for mappers and reducers
  • Monitor and debug MapReduce streaming jobs
  • Hands-on Exercises

Format

  • 50% Lecture
  • 50% Labs

Related Training Courses

MapR: Hadoop Operations: Cluster Administration
This 3-day course is designed to teach Hadoop administrators how to install, configure and maintain a MapR Hadoop cluster.

MapR: HBase Applications and Design Build
This 3-day course introduces the concepts of NoSQL technologies, HBase architecture, schema design, performance tuning, bulk-loading of data and the storing of complex data structures.

MapR: Hive and Pig
This 2-day course covers how Hive emulates SQL in a Hadoop cluster, dataflow languages and how to create efficient data flows using Pig.

 
