This fast-paced, vendor-agnostic technical course covers the core internal and distributed architecture fundamentals of Apache Cassandra, including Spark/Cassandra integration, with a focus on the aspects of the technology that matter most to developers and system operations staff.
Who is the course for
Developers, System Operations staff, Software Engineers, Data Scientists, Network Engineers, Technology Managers.
No prior knowledge of databases or programming is assumed, although having some basic experience with relational/SQL databases and Java will help.
What you will learn
- How to identify the right use cases for Cassandra
- The core concepts of Cassandra's distributed architecture
- The internal architecture of Cassandra's read/write paths: bloom filters, block indexes, commit log, memtables, SSTables, compaction, etc.
- The fundamentals of writing Java code that interacts with Cassandra
- Data modelling with CQL, using the newest features of Cassandra 2.x, and how to apply these concepts to build real applications on top of Cassandra
- Spark/Cassandra integration
Introduction to Cassandra and its Architecture
- NoSQL ecosystem overview
- Review of Database families and data models
- Cassandra origins: Amazon Dynamo, Google BigTable and Cassandra at Facebook
- Cassandra use cases
- Cassandra ecosystems and distributions
- Cassandra distributed architecture fundamentals
- Cassandra configuration
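As a taste of the distributed architecture material, the following is a minimal Java sketch of how a token ring can map a partition key to replica nodes. It is a teaching toy under stated assumptions, not Cassandra's implementation: the class name `ToyTokenRing`, the tiny token space of 100, the use of `String.hashCode()` as the partitioner, and one token per node are all hypothetical simplifications (Cassandra 2.x defaults to a Murmur3-based partitioner and supports virtual nodes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy token ring for illustration only; all names and sizes are assumptions. */
class ToyTokenRing {
    static final int RING_SIZE = 100; // hypothetical, deliberately tiny token space
    private final TreeMap<Integer, String> nodesByToken = new TreeMap<>();

    /** Registers a node that owns the token range ending at the given token. */
    void addNode(String name, int token) {
        nodesByToken.put(token, name);
    }

    /** Stand-in partitioner: hashes a partition key onto the ring. */
    static int tokenFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), RING_SIZE);
    }

    /** Walks clockwise from the token and collects the first N distinct ring positions. */
    List<String> replicasFor(int token, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (nodesByToken.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, nodesByToken.size());
        // The primary replica is the first node whose token is >= the key's token,
        // wrapping to the start of the ring when none is found.
        Integer current = nodesByToken.ceilingKey(token);
        if (current == null) {
            current = nodesByToken.firstKey();
        }
        while (replicas.size() < wanted) {
            replicas.add(nodesByToken.get(current));
            current = nodesByToken.higherKey(current);
            if (current == null) {
                current = nodesByToken.firstKey(); // wrap around the ring
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        ToyTokenRing ring = new ToyTokenRing();
        ring.addNode("A", 10);
        ring.addNode("B", 40);
        ring.addNode("C", 70);
        System.out.println(ring.replicasFor(45, 2)); // prints [C, A]
        System.out.println(ring.replicasFor(tokenFor("user:42"), 3));
    }
}
```

Because placement depends only on the key's token and the ring layout, any node can work out which replicas hold a given row without consulting a central coordinator.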
Cassandra Storage Internals and Data Model
- Introduction to LSM-tree
- Implementation details: bloom filters, in-memory caching, compression, off-heap data structures
- Detailed study of the read/write path
- JVM tuning and troubleshooting
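To make the read/write path topics above more concrete, here is a small Java sketch of an LSM-style store: writes go to a commit log and a memtable, full memtables are flushed to immutable sorted runs (standing in for SSTables), reads consult a Bloom filter before touching each run, and compaction merges runs. Everything here is an illustrative assumption rather than Cassandra's actual code: the class name `ToyLsmStore`, the 1024-bit filter, and the two simple hash functions are hypothetical, and the real engine also handles tombstones, timestamps, and on-disk formats.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

/** Toy LSM store for illustration only; not Cassandra's implementation. */
class ToyLsmStore {
    static final int BLOOM_BITS = 1024; // hypothetical filter size

    /** Immutable sorted run (a stand-in for an SSTable) with its Bloom filter. */
    static final class Run {
        final TreeMap<String, String> rows;
        final BitSet bloom = new BitSet(BLOOM_BITS);

        Run(TreeMap<String, String> rows) {
            this.rows = rows;
            for (String key : rows.keySet()) {
                bloom.set(slot(key, 17));
                bloom.set(slot(key, 31));
            }
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(String key) {
            return bloom.get(slot(key, 17)) && bloom.get(slot(key, 31));
        }
    }

    private final int flushThreshold;
    private final List<String> commitLog = new ArrayList<>();
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<Run> runs = new ArrayList<>(); // oldest first
    int bloomSkips = 0; // how many runs the filter let us skip

    ToyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Simple seeded hash used only for this sketch. */
    static int slot(String key, int seed) {
        return Math.floorMod(key.hashCode() * seed + seed, BLOOM_BITS);
    }

    void put(String key, String value) {
        commitLog.add(key + "=" + value); // 1. append to the commit log
        memtable.put(key, value);         // 2. update the in-memory sorted table
        if (memtable.size() >= flushThreshold) {
            flush();                      // 3. flush a full memtable to a new run
        }
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        runs.add(new Run(memtable));
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed writes no longer need replay
    }

    String get(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = runs.size() - 1; i >= 0; i--) { // newest run first
            Run run = runs.get(i);
            if (!run.mightContain(key)) {
                bloomSkips++;
                continue;
            }
            value = run.rows.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Merges all runs into one; later (newer) runs overwrite older values. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Run run : runs) {
            merged.putAll(run.rows);
        }
        runs.clear();
        if (!merged.isEmpty()) {
            runs.add(new Run(merged));
        }
    }

    int runCount() {
        return runs.size();
    }
}
```

Writes never modify an existing run, which is why the write path stays sequential and fast, while reads may need to check several runs until compaction reduces their number.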
Data Modelling using CQL in Cassandra
- CQL language fundamentals
- CQL use cases
- Mapping between the logical CQL data model and the internal low-level storage engine
- CQL collections
- Atomic batches
- Spark overview and introduction: what is Spark?
- Spark compared to Hadoop
- Use Case Showcase (Teradata / Netflix)
- Architecture related to Cassandra (HA Cassandra via DSE etc.)
- Why? This topic introduces Spark and explains why it works well with Cassandra. A high-level overview of how the two technologies complement each other helps developers understand the design choices that allow both Spark and Cassandra to scale horizontally.
- Accessing Cassandra from a Spark application
- Configuration (Specific to Cassandra)
- Why? This topic presents code and a walkthrough showing how a Java or Scala application (language yet to be decided) can connect to Cassandra and access its scalable storage system.
Spark Best Practices
- When to use Spark
- Common Anti-patterns
- Why? This topic covers patterns that avoid common performance pitfalls, along with important configuration settings that let users get the maximum performance from their hardware.
- Each student will be given a 3-node Cassandra cluster in Rackspace for the hands-on labs
- Lab 1: Install Cassandra 2.x on a single node in the cloud
- Lab 2: Run Cassandra commands and explore operations management concepts
- Lab 3: Grow the cluster size to 3 nodes
- Lab 4: Advanced Cassandra commands
- Lab 5: Java API lab
- Lab 6: Advanced Java API lab
- Lab 7: Spark Integration lab
- 50% Lectures
- 50% Labs
The course content can be customised to cover any specialised material you may require for your specific training needs.
For more information, please contact us at [email protected]
Related Training Courses
HDP Analyst: Apache HBase Essentials
This 2-day workshop introduces HBase basics, structure and operations in an intensely hands-on experience.
Big Data Concepts
This one-day class is an executive briefing on big data, designed for senior management and business leaders to learn about big data concepts and familiarise themselves with the business and technology trends and opportunities.
Apache Hadoop Essentials
This course is designed to help attendees understand the concepts and benefits of Apache Hadoop and how it can help them meet their business goals.
Machine Learning with Apache Hadoop
This course is designed to help attendees understand the high-level concepts and classifications of machine learning systems, with a strong focus on building Recommender Systems.