This course introduces the concept of data lakes and shows how Apache NiFi and Kylo™ can be used to develop and administer a data lake. Students will learn how to construct data flows in Apache NiFi, and how to use Kylo™ to ingest data into, and wrangle data inside, a data lake.
Who is the Kylo™ training course for?
Software Engineers, Data Scientists and Analysts
The following prerequisites ensure that students gain the maximum benefit from the course.
- Programming experience: although this is an introductory course, a hands-on understanding of core technologies such as HDFS, Hive, and the Linux command line is helpful
- Linux shell and editor experience (recommended): basic Linux shell (bash) commands will be used extensively, and students will need to do some text editing using vi or emacs during the course
- Experience with SQL databases: SQL experience is useful, but not essential, for understanding the Hive queries issued from Kylo™
- Laptops: students should bring a Mac or Windows laptop with the Safari or Chrome web browser installed. Windows users should also download and install Git Bash to allow ssh access to the cluster.
What you will learn
Think Big Academy courses teach by doing, interspersing short lectures with hands-on exercises. By the end of the course, students will have covered the following:
- Introduction to Data Lake concepts
- Introduction to Apache NiFi and the problems it solves
- Introduction to Kylo™ and the problems it solves
- How to use the Apache NiFi user interface
- How to create Apache NiFi data flows, process groups and templates
- How to automate data ingestion using Apache NiFi and Kylo™
- How to wrangle data using Kylo™
- Common pitfalls and problems, and how to avoid them
Day 1 – Data Lakes and Apache NiFi
Data lakes are becoming critical components in enterprises that adopt Hadoop and other open source distributed computing solutions. Students will learn how Apache NiFi and Kylo™ relate to data lakes, and how to build data flows using Apache NiFi.
- What is a Data Lake? Why do I need one?
- Apache NiFi overview and capabilities, and demo
- Apache NiFi in practice; Hands-on lab:
  - Starting and stopping
  - GUI tour
  - Adding and configuring processors
  - Connecting processors
  - Starting and stopping processors
  - Available processors
  - Data ingestion
  - Attribute extraction
  - Data transformation
  - Building a sample flow
  - Expression Language
  - Process groups and ports
Day 2 – Kylo™
Kylo™ builds on Apache NiFi to provide a fully featured data lake solution. Students will learn how Apache NiFi and Kylo™ relate to each other, how to ingest data into a data lake using Kylo™, and how to wrangle data and build transformation feeds.
- Architecture: Apache NiFi and Kylo™
- Concept of creating models with Apache NiFi and registering them with Kylo™
- Kylo™ in practice; Hands-on lab:
  - Registering a NiFi template with Kylo™
  - Building a data ingestion feed
  - Data confidence feeds and SLAs
  - Wrangling data and transformation feeds
  - Importing and exporting templates
  - Design best practices
  - Access control
50% Hands-on Labs