MapR: HBase Applications and Design Build

Overview

Learn how to architect and write programs that use HBase as a distributed NoSQL datastore on Hadoop. This course introduces HBase architecture, the HBase data model, and the most important APIs for writing programs. It also covers schema design, performance tuning, bulk loading of data, and storing complex data structures.

Duration

3 days

Who is the course for

  • For developers interested in designing and developing HBase applications.
  • This is a programming course; you must have Java programming experience to do the exercises.

Prerequisites

Required:

  • Basic Linux knowledge, including familiarity with basic commands such as mv, cp, cd, ls, ssh, and scp
  • Access to, and the ability to use, a laptop with a terminal program installed (such as Terminal on the Mac, or PuTTY and WinSCP on Windows)
  • Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE such as Eclipse

Optional:

  • Basic Hadoop and database knowledge

What you will learn

Day 1

Introduction to HBase

  • Differentiate between RDBMS and HBase
  • Identify typical HBase use cases

HBase Data Model

  • Describe the HBase data model and data model components
  • Describe how the logical data model maps to physical storage on disk
  • Use data model operations
  • Create an HBase table

HBase Architecture

  • Identify the components of an HBase cluster
  • Describe how the HBase components work together
  • Describe how regions work and their benefits
  • Define the function of minor and major compactions
  • Describe how regions split across Region Servers
  • Describe how HBase handles fault tolerance
  • Differentiate MapR-DB from HBase

Basic Schema Design

  • List the elements of schema design
  • Design row keys for data access patterns
  • Design table shape and column families for data access patterns
  • Define column family properties
  • Design a schema for a given scenario

Design Schemas for Complex Data Structures

  • Transition from a relational model to HBase
  • Use intelligent keys
  • Use secondary indexes or lookup tables
  • Design for other complex data structures
  • Evolve schemas over time

Using Hive to Query HBase

  • Use Hive to query HBase/MapR tables

Day 2

Java Client API Part 1

  • Define the CRUD operations of the HBase Java API (Get, Put, Delete, Scan) and discuss when and how to use them
  • Describe the data flow between Client and Server when using these APIs
  • Define the various helper classes for these APIs: KeyValue, Result, ResultScanner (Scan)
  • Lab: Use the Get, Put, Delete, and Scan APIs to create an application

Java Client API Part 2

  • Client-side write buffer
  • HTable batch operations
  • checkAndPut: atomic put operation
  • KeyValue and Result objects
  • Lab: Use HTable batch APIs in an application
  • Lab: Use HTable checkAndPut APIs for row transactions in an application

Java Client API for Administrative Features

  • HTableDescriptor
  • HColumnDescriptor
  • HBaseAdmin
  • Lab: Create Tables and Define Properties using the HBaseAdmin Java interface

Day 3

Advanced HBase Java API

  • Filters
  • Counters
  • Lab: Use filters in an application
  • Lab: Use counter increments for row transactions in an application

Time Series Application with Flat Wide and Tall Narrow Implementations

  • Explanation of time series application implementation
  • Lab: Programming a Time Series Application

MapReduce on HBase

  • How is MapReduce used on HBase?
  • How to program MapReduce applications for HBase
  • Lab: Reading from HBase and Writing Back Daily Statistics

Social Application

  • Explanation of social application implementation
  • Lab: Programming a Social Application

Bulk Loading of Data

  • Using the ImportTsv bulk load tool
  • Using a MapReduce job to import data
  • Lab: Using ImportTsv and MapReduce to Load from a File into HBase

Performance

  • Performance considerations
  • Monitoring
  • Benchmarking
  • Lab: YCSB Benchmarking

Security

  • Authentication, authorization, auditing, encryption
  • Access Control Expressions, roles, permissions
  • Lab: Table Authorization

Format

  • 50% Lecture
  • 50% Labs

Related Training Courses

MapR: Hive and Pig
This 2-day course covers how Hive emulates SQL in a Hadoop cluster, dataflow languages, and how to create efficient data flows using Pig.

MapR: Developing Hadoop Applications
This 3-day course provides instruction on how to write Hadoop applications using MapReduce and YARN in Java.

MapR: HBase Applications and Design Build
This 3-day course introduces the concepts of NoSQL technologies, HBase architecture, schema design, performance tuning, bulk loading of data, and the storing of complex data structures.

 
