Databricks: Spark Development Bootcamp

Overview

This 3-day hands-on workshop introduces Apache Spark through lectures and coding exercises. Spark is a unified framework for big data analytics. It provides one integrated API that developers, data scientists, and analysts can use for diverse tasks, such as batch analytics, stream processing, and statistical modeling, that previously required separate processing engines. Spark supports a wide range of popular languages, including Python, R, Scala, SQL, and Java. It can read from diverse data sources and scale to thousands of nodes.

In this class, you will learn how to build and manage Spark applications using Spark’s core programming APIs and its standard libraries. You will receive a free Databricks account for the duration of the training.

Duration

3 days

Who is the course for

Engineers, Data Scientists, and Analysts

Prerequisites

Students should arrive at class with:

  • A basic understanding of software development
  • Some experience coding in Python, Java, SQL, or Scala
  • A laptop with a modern operating system (Windows, OS X, Linux), browser (Internet Explorer not supported), and Internet access

What you will learn

After taking this class you will be able to:

  • Build a data pipeline using Spark DataFrames and Spark SQL
  • Understand Spark concepts, architecture, and applications
  • Execute SQL queries on large-scale data using Spark
  • Explore and visualize your data by entering and running code in Notebooks
  • Train and use an ML model on real data with MLlib, Spark’s machine learning library
  • Tune Spark job performance and troubleshoot errors using logs and administration UIs
  • Find answers to common questions using Spark documentation and discussion forums
  • Write and monitor a Spark Streaming job to analyze data with sub-second latency
  • Understand common use cases and business applications of Spark
  • Recognize all of the topics tested by the Spark Developer Certification and know what further work is required to prepare to take and pass the exam

Course Outline

Day 1

  • History of Big Data & Apache Spark
  • Introduction to the Spark Shell and the training environment
  • [Optional] Just enough Scala for Spark + Intro to Spark DataFrames and Spark SQL
  • Introduction to RDDs
    • Lazy Evaluation
    • Transformations and Actions
    • Caching
    • Using the Spark UIs
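
As a rough illustration of the lazy evaluation, transformation, and action topics listed above, here is a minimal plain-Python sketch. It is an analogy only and does not use the Spark API: the `LazyPipeline` class and the `square` helper are hypothetical names invented for this example. The idea it shows is that transformations (`map`, `filter`) only record work, and nothing is computed until an action (`collect`) asks for a result.

```python
class LazyPipeline:
    """Hypothetical stand-in for an RDD (not Spark code): it records
    transformations and runs them only when an action is called."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: return a new pipeline with one more recorded step.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded, not executed.
        return LazyPipeline(self._data, self._steps + (("filter", pred),))

    def collect(self):
        # Action: apply every recorded step and materialize the result.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []  # records each element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


pipeline = LazyPipeline(range(5)).map(square).filter(lambda v: v % 2 == 0)
calls_before_action = list(calls)  # still empty: no work has been done yet

result = pipeline.collect()  # the action triggers the computation
print(calls_before_action, result)
```

In this sketch, building `pipeline` never calls `square`; the calls happen only inside `collect()`. Spark's real RDD API follows the same split between transformations and actions, with distribution, caching, and fault tolerance on top.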

Day 2

  • Data Sources: reading from Parquet, S3, Cassandra, HDFS, and your local file system
  • Spark’s Architecture
  • Programming with Accumulators and Broadcast variables
  • Debugging and tuning Spark jobs using Spark’s admin UIs
  • Memory & Persistence

Day 3

  • Advanced programming with RDDs (understanding the shuffle phase, partitioning, etc.)
  • Visualization: matplotlib, ggplot, dashboards, exploration and visualization in notebooks
  • Introduction to Spark Streaming
  • Introduction to MLlib and GraphX

Format

  • 50% Lecture
  • 50% Labs

Related Training Courses

Databricks: Spark Essentials. This 1-day overview course is a guided hands-on tour of Spark, a popular tool for big data analytics with a unified API for batch analytics, SQL queries, stream processing, machine learning, and graph analysis.

 

Spark Development Bootcamp: May 23 - 25, 9:00 am - 5:00 pm GMT, London, UK (Teradata UK)